Remove extra registry pod when it is created #2614
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: akihikokuroda. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold
Force-pushed from 67589c8 to 0f2fa81
/unhold
/hold
Force-pushed from 0f2fa81 to 8c96a44
/unhold
Thanks for submitting this. I have a few questions/comments before we merge:
@@ -166,6 +167,7 @@ type ConfigMapRegistryReconciler struct {
 	Lister   operatorlister.OperatorLister
 	OpClient operatorclient.ClientInterface
 	Image    string
+	MuPod    sync.RWMutex
I don't think this mutex needs to be exported.
It is taken out.
@@ -354,6 +361,8 @@ func (c *ConfigMapRegistryReconciler) ensureRoleBinding(source configMapCatalogS

 func (c *ConfigMapRegistryReconciler) ensurePod(source configMapCatalogSourceDecorator, overwrite bool) error {
 	pod := source.Pod(c.Image)
+	c.MuPod.Lock()
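For readers of the diff above, here is a minimal sketch of what the lock is meant to serialize. The field and helper names come from this diff; the overwrite/cleanup handling of the real method is omitted, the context and metav1 imports are assumed, and the create call is assumed to go through the same OpClient path shown further down:

func (c *ConfigMapRegistryReconciler) ensurePod(source configMapCatalogSourceDecorator, overwrite bool) error {
	pod := source.Pod(c.Image)

	c.MuPod.Lock()
	defer c.MuPod.Unlock() // release the lock on every return path

	// With the lock held, no second worker can run this check-then-create
	// block concurrently, so at most one registry pod is created per sync.
	if len(c.currentPods(source, c.Image)) > 0 && !overwrite {
		return nil
	}
	_, err := c.OpClient.KubernetesInterface().CoreV1().Pods(pod.GetNamespace()).Create(context.TODO(), pod, metav1.CreateOptions{})
	return err
}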
It's not clear to me that adding a mutex solves the problem at hand. The queues that are used key on name/namespace and should never allow the same resource to be processed by more than one worker concurrently.
IMO, an alternative explanation for the behavior you're seeing is that the resource was processed more than once, serially, before the cache that backs the Lister -- which you've swapped with a client call above -- can be updated. This could have led to cache misses, and the resource being created again.
I would feel much more comfortable if you could produce a test, unit or e2e, that fails for the original code but passes with these changes.
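For context on the point about the queues: client-go workqueues deduplicate on the key (typically namespace/name), and a key that is currently being processed is not handed to a second worker until Done is called. A minimal, standalone illustration (not OLM code; the key string is made up):

package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	q := workqueue.New()

	// The same namespace/name key is added twice before any worker picks it up.
	q.Add("olm/registry-pod")
	q.Add("olm/registry-pod")

	// The queue collapses the duplicates, so only one worker gets this key at
	// a time; a second Get would block until new work arrives.
	item, _ := q.Get()
	fmt.Println("processing:", item, "remaining in queue:", q.Len()) // remaining is 0
	q.Done(item)
}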
Thanks for the review. I saw 2 pods created, so I guessed at the cause. I'll look into it in more detail and capture what is happening. I'm not sure I can create a test that reproduces it consistently, but I'll try anyway.
Force-pushed from 8c96a44 to 706f4c1
It must be the cache misses. Here is the catalog operator pod log from when the CI test failed. The catsrc id=wEdnr sync and the catsrc id=iFe3m sync ran back to back, and the catsrc id=iFe3m sync saw the stale Lister cache, before the registry pod created by the first sync appeared in it.
I made a change to not use the Lister to check for the registry pod in currentPods and currentPodsWithCorrectResourceVersion. This should let the second sync see the registry pod that was already created.
Force-pushed from 706f4c1 to 1889f82
@@ -214,28 +214,36 @@ func (c *ConfigMapRegistryReconciler) currentRoleBinding(source configMapCatalog

 func (c *ConfigMapRegistryReconciler) currentPods(source configMapCatalogSourceDecorator, image string) []*v1.Pod {
 	podName := source.Pod(image).GetName()
-	pods, err := c.Lister.CoreV1().PodLister().Pods(source.GetNamespace()).List(labels.SelectorFromSet(source.Selector()))
+	pods, err := c.OpClient.KubernetesInterface().CoreV1().Pods(source.GetNamespace()).List(context.TODO(), metav1.ListOptions{LabelSelector: labels.SelectorFromSet(source.Selector()).String()})
This may technically solve the issue, but if at all possible, talking directly to the kube API for read requests should be avoided. The operator is already watching pods, so it will eventually know about all of these pods, and this just adds load to the apiserver.
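For reference, the cache-backed read path being described looks roughly like this with plain client-go. This is a sketch, not OLM's actual wiring; the namespace, label, and kubeconfig handling are illustrative only:

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(clientset, 5*time.Minute)
	podLister := factory.Core().V1().Pods().Lister() // registers the pod informer

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Reads are served from the local informer cache, which the pod watch
	// keeps up to date, so they put no extra load on the apiserver.
	pods, err := podLister.Pods("olm").List(labels.SelectorFromSet(labels.Set{
		"olm.catalogSource": "my-catalog", // hypothetical label for illustration
	}))
	fmt.Println(len(pods), err)
}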
Thanks for the comments. OK. The second pod stays up until the next change to the CatalogSource. I can probably change the code to stop the second pod sooner, without issuing the read requests, instead of preventing the second pod from being created. Or, if multiple pods are not a concern, I can change the e2e test to not check for a single pod. A rough sketch of that cleanup alternative follows.
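The sketch below shows one way the "stop the second pod sooner" idea could look, assuming the duplicate pods can be found with the same label selector used in currentPods. The function name and cleanup policy (keep the newest pod) are illustrative only, not the PR's implementation:

import (
	"context"
	"sort"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// cleanupDuplicateRegistryPods keeps only the newest registry pod matching the
// selector and deletes the rest, instead of guarding creation up front.
func cleanupDuplicateRegistryPods(ctx context.Context, client kubernetes.Interface, namespace string, selector labels.Selector) error {
	podList, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector.String()})
	if err != nil {
		return err
	}
	pods := podList.Items
	if len(pods) <= 1 {
		return nil
	}
	// Sort newest first, then delete everything after the first entry.
	sort.Slice(pods, func(i, j int) bool {
		return pods[j].CreationTimestamp.Before(&pods[i].CreationTimestamp)
	})
	for _, p := range pods[1:] {
		if err := client.CoreV1().Pods(namespace).Delete(ctx, p.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}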
Force-pushed from 1889f82 to f86b939
Force-pushed from dab650b to bacd936
Force-pushed from bacd936 to e5ab894
Force-pushed from e5ab894 to 1dc3d4c
Signed-off-by: akihikokuroda <[email protected]>
Force-pushed from 1dc3d4c to 4c1003a
Closing PR as stale. Please re-open if it's still important.
Signed-off-by: akihikokuroda [email protected]
Description of the change:
This PR adds a mutex block around the creation of the ConfigMap registry pod. I reproduced this error locally a couple of times: I saw 2 registry pods created with the same configMapResourceVersion. Two queue workers created the pod at the same time. The mutex block prevents 2 workers from checking for and creating the registry pod at the same time (see the toy reproduction below).
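As a toy reproduction of the race described above, two workers check whether a "registry pod" exists and create one if it does not. This is a standalone illustration, not OLM code; without the mutex both goroutines can pass the check before either creates, yielding two pods:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu      sync.Mutex
		created int
		wg      sync.WaitGroup
	)

	ensurePod := func() {
		defer wg.Done()
		mu.Lock()
		defer mu.Unlock()
		if created == 0 { // check ...
			created++ // ... then create; atomic only because the lock is held
		}
	}

	wg.Add(2)
	go ensurePod()
	go ensurePod()
	wg.Wait()
	fmt.Println("pods created:", created) // always 1 with the lock held
}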
Motivation for the change:
Closes #2613
Reviewer Checklist