-
Notifications
You must be signed in to change notification settings - Fork 551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/controller: Fix panic when creating cluster-scoped RBAC in OG controller #2349
pkg/controller: Fix panic when creating cluster-scoped RBAC in OG controller #2349
Conversation
09d1e58
to
bf99831
Compare
Evan had pointed out that this implementation may be another regression from fixing the original issue in slack. /hold |
bf99831
to
f2adb2d
Compare
/hold cancel |
/hold |
f2adb2d
to
579c7dd
Compare
if crb != nil && ownerutil.IsOwnedByLabel(crb, csv) { | ||
continue | ||
} | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this error be returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to return an error if we want !IsOwnedByLabel
to cause a resync (i.e. to retry this logic).
if crb != nil && ownerutil.IsOwnedByLabel(crb, csv) { | ||
continue | ||
} | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to return an error if we want !IsOwnedByLabel
to cause a resync (i.e. to retry this logic).
if !k8serrors.IsAlreadyExists(err) { | ||
return err | ||
} | ||
// if the CR already exists, but the label is correct, the cache is just behind | ||
if k8serrors.IsAlreadyExists(err) && ownerutil.IsOwnedByLabel(cr, csv) { | ||
if cr != nil && ownerutil.IsOwnedByLabel(cr, csv) { | ||
continue | ||
} else { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the @ecordell 's original patch, effectively returns any error if !IsOwnedByLabel
, which this code does not. Assuming errors returned from this function trigger a resync -- i.e. a retry -- then we'll fail to retry the operation when we detect the cluster state isn't as expected.
I suspect something like the following will produce the desired results:
if k8serrors.IsAlreadyExists(err) && cr != nil && ownerutil.IsOwnedByLabel(cr, csv) {
continue
}
return err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I had played around with this exact implementation, we had reverted back to the original testing flake behavior (e.g. "crb-role already exists") as there's likely a larger issue in the fake testing client where the resource returned from the Create call is always going to be nil. With that said, it's likely we don't need to nil check the variable returned by Create as this is a testing client problem. Any opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving some of this from slack:
I had tried adding a client-go testing reactor for returning an object for these situations:
+// WithOperatorGroupReactors returns a fakeK8sClientOption that configures a Clientset to return the ClusterRole
+// or ClusterRoleBinding object during a create call.
+// Note(tflannag): Fix for https://github.com/operator-framework/operator-lifecycle-manager/issues/2091.
+func WithOperatorGroupReactors(tb testing.TB) Option {
+ return func(c ClientsetDecorator) {
+ c.PrependReactor("create", "clusterrolebindings", func(action clitesting.Action) (bool, runtime.Object, error) {
+ createAction := action.(clitesting.CreateActionImpl)
+ return false, createAction.Object, nil
+ })
+ c.PrependReactor("create", "clusterroles", func(action clitesting.Action) (bool, runtime.Object, error) {
+ createAction := action.(clitesting.CreateActionImpl)
+ return false, createAction.Object, nil
+ })
+ }
+}
But it wasn't immediately clear how to best integrate those reactors into the current fake client setup such that we're not always appending these reactors and increasing the time it takes to run unit tests, which likely indicates it's more useful as a one-off functional type setup:
- k8sClientFake := k8sfake.NewSimpleClientset(config.k8sObjs...)
+ k8sClientFake := fake.NewClientSetDecoratorWithReactors(config.k8sObjs, config.fakeClientOptions...)
k8sClientFake.Resources = apiResourcesForObjects(append(config.extObjs, config.regObjs...))
- config.operatorClient = operatorclient.NewClient(k8sClientFake, apiextensionsfake.NewSimpleClientset(config.extObjs...), apiregistrationfake.NewSimpleClientset(config.regObjs...))
+ config.operatorClient = operatorclient.NewClient(k8sClientFake.Clientset, apiextensionsfake.NewSimpleClientset(config.extObjs...), apiregistrationfake.NewSimpleClientset(config.regObjs...))
And at the call site creating a new fake client:
...
+ withFakeClientOptions(clientfake.WithOperatorGroupReactors(t)),
...
I tried running down this route but got sidetracked with other work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it wasn't immediately clear how to best integrate those reactors into the current fake client setup such that we're not always appending these reactors and increasing the time it takes to run unit tests, which likely indicates it's more useful as a one-off functional type setup
imho this reactor is generally useful for all unit tests, but I'm not sure the error from the create action would be carried forward properly by the existing decorator even if the example you gave was added.
When I had played around with this exact implementation, we had reverted back to the original testing flake behavior (e.g. "crb-role already exists")
this is expected in the implementation with a bad client fake, but it should no longer panic at test time, and should do the correct thing at runtime; i.e. return an error -- even if it's not the "right" error -- when the cluster has a different view of the cluster.
…troller Fixes [operator-framework#2091](operator-framework#2091). This is a follow-up to [operator-framework#2309](operator-framework#2309) that attempted to fix the original issue. When checking whether the ClusterRole/ClusterRoleBinding resources already exist, we're also checking whether the existing labels are owned by the CSV we're currently handling. When accessing the "cr" or "crb" variables that the Create calls output, a panic is produced as we're attempting to access the meta.Labels key from those resources, except those resources themselves are nil. Update the check to verify that the cr/crb variables are not nil before attempting to access those object's labels. The testing fake client may need to be updated in the future to handle returning these resources properly. Signed-off-by: timflannagan <[email protected]>
579c7dd
to
ad27f33
Compare
/hold cancel |
/approve |
clockFake := utilclock.NewFakeClock(time.Date(2006, time.January, 2, 15, 4, 5, 0, time.FixedZone("MST", -7*3600))) | ||
now := metav1.NewTime(clockFake.Now().UTC()) | ||
const ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had this const somewhere in the code base already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dinhxuanvu, kevinrizza, timflannagan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description of the change:
Fixes #2091.
This is a follow-up to operator-framework#2309 that attempted to fix the original issue. When checking whether the ClusterRole/ClusterRoleBinding resources already exist, we're also checking whether the existing labels are owned by the CSV we're currently handling. When accessing the "cr" or "crb" variables that the Create calls output, a panic is produced as we're attempting to access the meta.Labels key from those resources, except those resources themselves are nil. Update the check to verify that the cr/crb variables are not nil before attempting to access those object's labels. The testing fake client may need to be updated in the future to handle returning these resources properly.
I ran
go test ./pkg/controller/operators/olm/... -v -run ^TestSyncOperatorGroups$ -count 250
a couple of times and wasn't able to reproduce. I was able to consistently reproduce without these changes using a much smaller count sizing before.Motivation for the change:
Reduce testing flakes.
Reviewer Checklist
/doc