Add a check for head pod imagePullSecrets #601
Conversation
@Ygnas Can you provide unit tests for the change?

/lgtm I've tested it multiple times and never encountered the issue.

/lgtm
```go
if len(sa.ImagePullSecrets) == 0 {
	sa.ImagePullSecrets = []corev1.LocalObjectReference{
		{Name: "test-image-pull-secret"},
	}
	_, err = k8sClient.CoreV1().ServiceAccounts(namespaceName).Update(ctx, sa, metav1.UpdateOptions{})
	if err != nil {
		return err
	}
}
```
This code doesn't belong in the Eventually check.
This code should be executed once the SA is available.
Just a general note: try to avoid conditions in tests if possible. In this case you can, for example, check that ImagePullSecrets is empty, and as a next step set a value there.
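A minimal sketch of that restructuring, assuming a Gomega-based test, a client-go clientset named `k8sClient`, and the `namespaceName` variable from the snippet above; the service account name `"default"` is a placeholder, not taken from the PR:

```go
// Fetch the SA once it is known to exist (e.g. after an Eventually that only waits for it).
sa, err := k8sClient.CoreV1().ServiceAccounts(namespaceName).Get(ctx, "default", metav1.GetOptions{})
Expect(err).NotTo(HaveOccurred())

// Assert the precondition instead of branching on it.
Expect(sa.ImagePullSecrets).To(BeEmpty())

// Then add the secret unconditionally.
sa.ImagePullSecrets = []corev1.LocalObjectReference{
	{Name: "test-image-pull-secret"},
}
_, err = k8sClient.CoreV1().ServiceAccounts(namespaceName).Update(ctx, sa, metav1.UpdateOptions{})
Expect(err).NotTo(HaveOccurred())
```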
Thanks, I think I've made it much better now.
Looks good to me, one point of clarification:
```go
if err := r.deleteHeadPodIfMissingImagePullSecrets(ctx, cluster); err != nil {
	return ctrl.Result{RequeueAfter: requeueTime}, err
}
```
Can this not occur in WorkerPods too?
Theoretically yes. Practically, the worker Pods' init container that waits for the head Pod to become ready makes it so it does not happen. I'd be inclined to keep the logic simple for the moment and iterate if necessary.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Merged commit 133db94 into project-codeflare:main
```go
missingSecrets := map[string]bool{}
for _, secret := range serviceAccount.ImagePullSecrets {
	missingSecrets[secret.Name] = true
}
for _, secret := range headPod.Spec.ImagePullSecrets {
	delete(missingSecrets, secret.Name)
}
if len(missingSecrets) > 0 {
	if err := r.kubeClient.CoreV1().Pods(headPod.Namespace).Delete(ctx, headPod.Name, metav1.DeleteOptions{}); err != nil {
		return fmt.Errorf("failed to delete head pod: %w", err)
	}
}
return nil
```
I do not think I fully understand this part: what would happen if the serviceAccount has multiple ImagePullSecrets? Could it then get into an infinite reconcile loop by repeatedly deleting the headPod?
@zdtsw I don't think it should. I just tried adding extra ImagePullSecrets to the SA; the head Pod was deleted and recreated with the extra ones. I think all the ImagePullSecrets always end up in the headPod.
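For illustration, a small standalone sketch of that set-difference logic, with the loop factored into a hypothetical helper `missingImagePullSecrets`; it shows the comparison is one-directional, so extra secrets on the head Pod never count as missing and do not retrigger the deletion:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// missingImagePullSecrets is a hypothetical extraction of the loop above:
// it returns the SA secrets that the head Pod does not reference.
func missingImagePullSecrets(sa, pod []corev1.LocalObjectReference) []string {
	missing := map[string]bool{}
	for _, s := range sa {
		missing[s.Name] = true
	}
	for _, s := range pod {
		delete(missing, s.Name)
	}
	names := make([]string, 0, len(missing))
	for n := range missing {
		names = append(names, n)
	}
	return names
}

func main() {
	sa := []corev1.LocalObjectReference{{Name: "a"}, {Name: "b"}}

	// Head Pod carries both SA secrets plus an extra one: nothing is missing,
	// so the Pod would not be deleted and reconciliation settles.
	pod := []corev1.LocalObjectReference{{Name: "a"}, {Name: "b"}, {Name: "extra"}}
	fmt.Println(missingImagePullSecrets(sa, pod)) // []

	// Head Pod lacks "b": the controller would delete it so it gets recreated.
	fmt.Println(missingImagePullSecrets(sa, []corev1.LocalObjectReference{{Name: "a"}})) // [b]
}
```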
Issue link
https://issues.redhat.com/browse/RHOAIENG-9247
What changes have been made
Added a check for imagePullSecrets in the head pod; if they are missing, the head pod is deleted so it can be redeployed.
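For context, a rough sketch of how such a check could be wired together, based on the snippets in the review above; the reconciler type, the `rayv1.RayCluster` parameter, and the `getHeadPod` lookup helper are assumptions, not the PR's actual code:

```go
// deleteHeadPodIfMissingImagePullSecrets deletes the head Pod when its service
// account carries imagePullSecrets that the Pod itself does not reference, so
// the Pod gets recreated with the full set.
func (r *RayClusterReconciler) deleteHeadPodIfMissingImagePullSecrets(ctx context.Context, cluster *rayv1.RayCluster) error {
	headPod, err := r.getHeadPod(ctx, cluster) // hypothetical head Pod lookup
	if err != nil || headPod == nil {
		return err
	}

	saName := headPod.Spec.ServiceAccountName
	if saName == "" {
		saName = "default" // Pods without an explicit SA use the namespace default
	}
	serviceAccount, err := r.kubeClient.CoreV1().ServiceAccounts(headPod.Namespace).Get(ctx, saName, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Secrets present on the SA but absent from the head Pod.
	missingSecrets := map[string]bool{}
	for _, secret := range serviceAccount.ImagePullSecrets {
		missingSecrets[secret.Name] = true
	}
	for _, secret := range headPod.Spec.ImagePullSecrets {
		delete(missingSecrets, secret.Name)
	}

	if len(missingSecrets) > 0 {
		if err := r.kubeClient.CoreV1().Pods(headPod.Namespace).Delete(ctx, headPod.Name, metav1.DeleteOptions{}); err != nil {
			return fmt.Errorf("failed to delete head pod: %w", err)
		}
	}
	return nil
}
```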
Verification steps
Set up an integrated OpenShift container registry and upload an image to it:
https://github.com/opendatahub-io/distributed-workloads/tree/main/images/runtime/examples#readme
Use the uploaded image inside the notebooks; the head pod should not enter the ImagePullBackOff state (the issue happens frequently, but not always).
Checks