CNTRLPLANE-78: Move Group informer configuration to RestrictSubjectBindings plugin initialization #2157

Open
wants to merge 1 commit into
base: master
@@ -13,6 +13,7 @@ import (
"k8s.io/apiserver/pkg/admission/initializer"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/cache"
"k8s.io/klog/v2"
"k8s.io/kubernetes/pkg/apis/rbac"

@@ -87,6 +88,12 @@ func (q *restrictUsersAdmission) SetRESTClientConfig(restClientConfig rest.Confi
}

func (q *restrictUsersAdmission) SetUserInformer(userInformers userinformer.SharedInformerFactory) {
if err := userInformers.User().V1().Groups().Informer().AddIndexers(cache.Indexers{
usercache.ByUserIndexName: usercache.ByUserIndexKeys,
}); err != nil {
utilruntime.HandleError(err)
return
Member

This matches how errors are handled elsewhere in this file. That seems correct, since this is an admission plugin and the error is not user-facing.

}
q.groupCache = usercache.NewGroupCache(userInformers.User().V1().Groups())
}
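
For readers unfamiliar with informer indexers, the following is a minimal, standard-library-only sketch of what registering a by-user index amounts to: an index function maps each object to a set of keys, and the store keeps a reverse map from key to objects. The `group` type, `byUser` function, and `indexedStore` are hypothetical stand-ins for illustration; they are not the real `usercache.ByUserIndexKeys` or the client-go `cache.Indexers` implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// group is a hypothetical stand-in for the OpenShift Group object.
type group struct {
	Name  string
	Users []string
}

// indexFunc maps an object to the index keys it should be filed under.
type indexFunc func(g group) []string

// byUser files a group under each of its member user names
// (an assumption about what a by-user index does).
func byUser(g group) []string { return g.Users }

// indexedStore keeps, per index name, a map from key to group names.
type indexedStore struct {
	indexers map[string]indexFunc
	indices  map[string]map[string][]string
}

func newIndexedStore(indexers map[string]indexFunc) *indexedStore {
	s := &indexedStore{indexers: indexers, indices: map[string]map[string][]string{}}
	for name := range indexers {
		s.indices[name] = map[string][]string{}
	}
	return s
}

// add files g under every key produced by every registered index function.
func (s *indexedStore) add(g group) {
	for name, fn := range s.indexers {
		for _, key := range fn(g) {
			s.indices[name][key] = append(s.indices[name][key], g.Name)
		}
	}
}

// byIndex returns the sorted names of groups filed under key in the named index.
func (s *indexedStore) byIndex(name, key string) []string {
	out := append([]string(nil), s.indices[name][key]...)
	sort.Strings(out)
	return out
}

func main() {
	s := newIndexedStore(map[string]indexFunc{"byUser": byUser})
	s.add(group{Name: "admins", Users: []string{"alice", "bob"}})
	s.add(group{Name: "devs", Users: []string{"alice"}})
	fmt.Println(s.byIndex("byUser", "alice")) // [admins devs]
}
```

With such an index in place, a lookup of "which groups contain this user" is a map access rather than a scan over every group.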

7 changes: 0 additions & 7 deletions openshift-kube-apiserver/openshiftkubeapiserver/patch.go
@@ -27,9 +27,7 @@ import (
clientgoinformers "k8s.io/client-go/informers"
corev1informers "k8s.io/client-go/informers/core/v1"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/cache"
"k8s.io/kubernetes/openshift-kube-apiserver/admission/authorization/restrictusers"
"k8s.io/kubernetes/openshift-kube-apiserver/admission/authorization/restrictusers/usercache"
"k8s.io/kubernetes/openshift-kube-apiserver/admission/autoscaling/managednode"
"k8s.io/kubernetes/openshift-kube-apiserver/admission/autoscaling/managementcpusoverride"
"k8s.io/kubernetes/openshift-kube-apiserver/admission/scheduler/nodeenv"
@@ -176,11 +174,6 @@ func newInformers(loopbackClientConfig *rest.Config) (*kubeAPIServerInformers, e
OpenshiftUserInformers: userinformer.NewSharedInformerFactory(userClient, defaultInformerResyncPeriod),
OpenshiftConfigInformers: configv1informer.NewSharedInformerFactory(configClient, defaultInformerResyncPeriod),
}
if err := ret.OpenshiftUserInformers.User().V1().Groups().Informer().AddIndexers(cache.Indexers{
usercache.ByUserIndexName: usercache.ByUserIndexKeys,
}); err != nil {
return nil, err
}
Member

In the enhancement, you have this:

- `Group` informer creation and configuration is moved into the `authorization.openshift.io/RestrictSubjectBindings` admission plugin initialization process

As I understand your PR, you are not moving the creation of the informer; only the indexer registration moves into the admission plugin. I assume that's what you meant by "configuration" (i.e., the indexer).

However, in the PR description, you said:

(...)
This is necessary to prevent the startup of an informer for the Group API when the plugin is disabled
(...)

So I'm wondering how this is preventing the startup of the informer. It seems to me that it's still being started as before. Am I missing anything?

@benluddy, Mar 11, 2025

The object passed to the admission plugin initializer is an informer factory, and it's the chained call to Informer() that both creates and registers a new informer if needed (and the informer is later started by SharedInformerFactory's Start). So moving that call to plugin initialization instead of run-always should actually work. This all assumes that the one call being moved is the only place a User Group informer is being requested.


// InformerFor returns the SharedIndexInformer for obj using an internal
// client.
func (f *sharedInformerFactory) InformerFor(obj runtime.Object, newFunc internalinterfaces.NewInformerFunc) cache.SharedIndexInformer {
	f.lock.Lock()
	defer f.lock.Unlock()

	informerType := reflect.TypeOf(obj)
	informer, exists := f.informers[informerType]
	if exists {
		return informer
	}

	resyncPeriod, exists := f.customResync[informerType]
	if !exists {
		resyncPeriod = f.defaultResync
	}

	informer = newFunc(f.client, resyncPeriod)
	informer.SetTransform(f.transform)
	f.informers[informerType] = informer

	return informer
}
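
The lazy registration in `InformerFor` is what makes the move effective: a factory only starts informers that something has asked for. The following standard-library-only sketch models that behavior under simplified assumptions. The `factory`, `informer`, `plugin`, and `run` names are hypothetical and are not the client-go or OpenShift types.

```go
package main

import "fmt"

// informer is a hypothetical stand-in for a shared informer.
type informer struct {
	name    string
	started bool
}

// factory lazily creates informers and only starts the ones that were requested.
type factory struct {
	informers map[string]*informer
}

func newFactory() *factory { return &factory{informers: map[string]*informer{}} }

// informerFor returns the existing informer for name, or creates and registers one.
func (f *factory) informerFor(name string) *informer {
	if inf, ok := f.informers[name]; ok {
		return inf
	}
	inf := &informer{name: name}
	f.informers[name] = inf
	return inf
}

// start starts every registered informer and returns how many it started.
func (f *factory) start() int {
	n := 0
	for _, inf := range f.informers {
		if !inf.started {
			inf.started = true
			n++
		}
	}
	return n
}

// plugin requests the Group informer only when it is initialized.
type plugin struct{ groups *informer }

func (p *plugin) setUserInformer(f *factory) { p.groups = f.informerFor("groups") }

// run builds a factory, initializes the plugin only if enabled, and starts the factory.
func run(pluginEnabled bool) int {
	f := newFactory()
	if pluginEnabled {
		(&plugin{}).setUserInformer(f)
	}
	return f.start()
}

func main() {
	fmt.Println(run(true), run(false)) // 1 0
}
```

In this model, a disabled plugin never calls `informerFor`, so `start` has nothing to start. This only holds if no other caller requests the same informer from the same factory.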

Author

That was the only place where it was being called for that particular SharedInformerFactory.

Member

Hm, I see. Thanks for the explanation. So the factory is created in the run-always path, and the informers that are requested are started there as well. However, the creation of the informer and its configuration need to be done in the plugin.

I'm not familiar with this, and maybe that's how it's typically done, but this "shared responsibility" over the informer sounds error-prone to me. Is there a way to keep it where it is, but run it conditionally?

Regardless, if we're going to do this, I'd suggest adding a comment in the plugin initialization stating that. Also, do we have a job proving this is working as intended?


Is there a way to keep it where it is, but run it conditionally?

This is typical. Take a look at all of the implementations of this:

SetExternalKubeInformerFactory(informers.SharedInformerFactory)

The part that is atypical is having only one thing that requires a Group informer, and that it happens to be an admission plugin that is not always enabled. I agree there should be some job demonstrating that this is doing what we expect (are we looking to see that there are group watch 404s from kube-apiserver in an E2E job that enables and disables external OIDC without this, and that those requests disappear with this patch?).
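
The injection pattern referred to above can be sketched roughly as follows: an initializer walks the enabled plugins and hands the factory to each one that implements a "wants" interface. This is a standard-library-only illustration under simplified assumptions. The `wantsInformerFactory` interface, `informerFactory` type, and plugin types are hypothetical, loosely modelled on the `SetExternalKubeInformerFactory` convention and not copied from the apiserver code.

```go
package main

import "fmt"

// informerFactory is a hypothetical stand-in that records which informers were requested.
type informerFactory struct{ requested []string }

func (f *informerFactory) informerFor(resource string) {
	f.requested = append(f.requested, resource)
}

// wantsInformerFactory is implemented by plugins that need the factory injected.
type wantsInformerFactory interface {
	setInformerFactory(f *informerFactory)
}

// groupPlugin requests a Group informer when the factory is injected.
type groupPlugin struct{}

func (groupPlugin) setInformerFactory(f *informerFactory) { f.informerFor("groups") }

// plainPlugin needs no informers and does not implement wantsInformerFactory.
type plainPlugin struct{}

// initialize injects the factory into every enabled plugin that asks for it.
func initialize(enabled []any, f *informerFactory) {
	for _, p := range enabled {
		if w, ok := p.(wantsInformerFactory); ok {
			w.setInformerFactory(f)
		}
	}
}

func main() {
	withGroup := &informerFactory{}
	initialize([]any{groupPlugin{}, plainPlugin{}}, withGroup)
	fmt.Println(withGroup.requested) // [groups]

	withoutGroup := &informerFactory{}
	initialize([]any{plainPlugin{}}, withoutGroup)
	fmt.Println(len(withoutGroup.requested)) // 0
}
```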

Is there something we could add to CI that would tell us when we have perma-unstarted informers? I know that it's one of the /readyz checks to wait for all the Kube shared informers to start (https://github.com/openshift/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/config.go#L960-L976). I guess we can't do the same for resources served by the aggregation layer without having a chicken-and-egg problem. Still, we should emit some indication that can be wired to a monitor test.

Author

There is no existing CI job that tests this behavior, but one is intended to be created. If this PR is not mergeable until such a job exists, I'm happy to place it on hold until then.

I have manually tested this based on https://issues.redhat.com/browse/OCPBUGS-45460 and verified that disabling the admission plugin does not result in the same reflector errors being logged. I know manual testing isn't a good substitute for a repeatable CI job, but I thought it was worth mentioning that I at least stood up a cluster and manually tested this.

Author

This isn't critical, and the BYO OIDC functionality is now slated for TP in 4.19, so this PR can wait until we've made more progress on testing for that feature.

Member

This is typical. Take a look at all of the implementations of this:

Thank you, now I see.

Another question: don't we need to wait for the cache to warm up?

diff --git i/openshift-kube-apiserver/admission/authorization/restrictusers/restrictusers.go w/openshift-kube-apiserver/admission/authorization/restrictusers/restrictusers.go
index 4dea00e61a4..e37ef8c0ff0 100644
--- i/openshift-kube-apiserver/admission/authorization/restrictusers/restrictusers.go
+++ w/openshift-kube-apiserver/admission/authorization/restrictusers/restrictusers.go
@@ -88,13 +88,16 @@ func (q *restrictUsersAdmission) SetRESTClientConfig(restClientConfig rest.Confi
 }
 
 func (q *restrictUsersAdmission) SetUserInformer(userInformers userinformer.SharedInformerFactory) {
-	if err := userInformers.User().V1().Groups().Informer().AddIndexers(cache.Indexers{
+	groupInformer := userInformers.User().V1().Groups()
+	if err := groupInformer.Informer().AddIndexers(cache.Indexers{
 		usercache.ByUserIndexName: usercache.ByUserIndexKeys,
 	}); err != nil {
 		utilruntime.HandleError(err)
 		return
 	}
-	q.groupCache = usercache.NewGroupCache(userInformers.User().V1().Groups())
+	q.groupCache = usercache.NewGroupCache(groupInformer)
+
+	q.SetReadyFunc(groupInformer.Informer().HasSynced)
 }
 
 // subjectsDelta returns the relative complement of elementsToIgnore in
@@ -129,6 +132,10 @@ func (q *restrictUsersAdmission) Validate(ctx context.Context, a admission.Attri
 		return nil
 	}
 
+	if !q.WaitForReady() {
+		return admission.NewForbidden(a, fmt.Errorf("not yet ready to handle request"))
+	}
+
 	// Ignore all operations that correspond to subresource actions.
 	if len(a.GetSubresource()) != 0 {
 		return nil


It's super indirect, but that appears to be already happening via the GroupCache... via newRoleBindingRestrictionContext. I wouldn't expect this PR to affect whether or not that's working as intended.


return ret, nil
}