WIP: change config webhook policy #3583
base: release-1.35
Conversation
/test all
Well, the webhook does not start without the configmap...
I guess we may instead either delete the `config.webhook.serving.knative.dev` validatingwebhookconfiguration if it already exists during KnativeServing reconciliation but the webhook deployment (or the ConfigMaps?) doesn't, or force it back into the initial state as defined in the manifest (again, only if the webhook deployment or the ConfigMaps don't exist?). Actually, it's not just about them not existing: basically, whenever we need to touch the webhook deployment (which triggers a new pod to be created), we need to reset the `config.webhook.serving.knative.dev` validatingwebhookconfiguration, as the webhook will update it when it starts. And if we are also touching the configmaps, we need to reset it before we touch them.
Alternatively, we may just drop the `config.webhook.serving.knative.dev` validatingwebhookconfiguration altogether, since we manage the ConfigMaps in the operator anyway. (And if we really need that validation, the SO operator should be the webhook that guards them. Or, the logic of those validations could run directly in the code that transforms KnativeServing `.spec.config` into configmaps, so the error could be handled directly in the operator.)
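A minimal sketch of the reset-before-touch ordering discussed above. The function names and the string step labels are illustrative assumptions, not the operator's actual API; the point is only the ordering: reset the validatingwebhookconfiguration first, then the configmaps, then the webhook deployment.

```go
package main

import "fmt"

// reconcileSteps returns the ordered actions for one reconcile pass.
// If the webhook deployment must be touched (a new pod will be created),
// the validatingwebhookconfiguration is reset to its manifest state first,
// since the restarted webhook will re-populate it on startup. ConfigMap
// updates happen after the reset but before the deployment is touched.
func reconcileSteps(needsWebhookRedeploy, touchingConfigMaps bool) []string {
	var steps []string
	if needsWebhookRedeploy {
		// Reset config.webhook.serving.knative.dev before anything else.
		steps = append(steps, "reset validatingwebhookconfiguration")
	}
	if touchingConfigMaps {
		steps = append(steps, "update configmaps")
	}
	if needsWebhookRedeploy {
		steps = append(steps, "update webhook deployment")
	}
	return steps
}

func main() {
	fmt.Println(reconcileSteps(true, true))
}
```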
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dsimansk. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed from f26ec6d to 464607d
```diff
@@ -9164,7 +9164,7 @@ webhooks:
       - key: app.kubernetes.io/component
         operator: In
         values: ["autoscaler", "controller", "logging", "networking", "observability", "tracing", "net-certmanager"]
-      timeoutSeconds: 10
+      timeoutSeconds: 30
```
I thought we wanted to decrease it (so that the delay is not that long if the issue occurs?)
(10s is probably enough in normal cases where the webhook is working...)
Oops, my bad then, I misinterpreted the meaning of this timeout.
@maschmid per slack thread. I've changed the
Force-pushed from 464607d to 9d293ae
@dsimansk: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
@maschmid @dsimansk In the upstream operator we can deploy the webhook deployments first and then wait for them to be up. Then apply the webhook cfgs, and only after that the rest of the resources. This way, even in corner cases, there will be no resources that can cause the "webhook not found" error. Ideally you don't want to create the CMs without validation. Right now we only cover certificates, but as I discussed with @dsimansk, if CMs are required for the deployments to work, then we need some strict order. The reason I didn't cover CMs upstream is that I thought people would remove stuff properly before re-installing, even when S-O is used by other operators. The certificate issue was independent because it was happening upstream with a fresh install, causing reconciliation to return early with an error (although transient). I will work on that fix upstream first.
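The strict ordering proposed above could be sketched roughly as follows. The phase labels and the `apply`/`waitReady` hooks are illustrative assumptions standing in for the real reconciler stages, not the upstream operator's API:

```go
package main

import (
	"errors"
	"fmt"
)

// installPhases applies resources in the proposed strict order: webhook
// deployments first, then wait for them to become ready, then the webhook
// configurations, and only then everything else. If the webhook never
// becomes ready, nothing that depends on it is created, so the
// "webhook not found" error cannot occur.
func installPhases(apply func(phase string) error, waitReady func() error) error {
	if err := apply("webhook-deployments"); err != nil {
		return err
	}
	if err := waitReady(); err != nil {
		return errors.New("webhook deployments not ready: " + err.Error())
	}
	if err := apply("webhook-configurations"); err != nil {
		return err
	}
	return apply("remaining-resources")
}

func main() {
	var order []string
	_ = installPhases(
		func(p string) error { order = append(order, p); return nil },
		func() error { order = append(order, "wait"); return nil },
	)
	fmt.Println(order)
}
```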
@skonto isn't it already done in
@dsimansk Before we call checkDeployments here https://github.com/knative/operator/blob/main/pkg/reconciler/knativeserving/knativeserving.go#L126 we call manifest.Install https://github.com/knative/operator/blob/main/pkg/reconciler/common/install.go#L43 which installs everything except the mutating and validating webhook cfgs and the certificate. My proposal would be: in step a) we can remove the wbh cfg before we start installing things, to avoid the webhook failure mentioned by @maschmid.
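In that spirit, the remove-before-install idea can be sketched as one reconcile pass where the stale webhook cfg is dropped first. The stage names (`deleteWebhookConfig`, `install`, `checkDeployments`) mirror the discussion above but are hypothetical stand-ins, not the actual reconciler functions:

```go
package main

import "fmt"

// reconcile drops the stale validatingwebhookconfiguration before
// manifest.Install-style resource application runs, so a not-yet-running
// webhook cannot block the resources being applied; deployment health is
// checked last. Each stage is an injected callback for illustration.
func reconcile(deleteWebhookConfig, install, checkDeployments func() error) error {
	for _, step := range []func() error{deleteWebhookConfig, install, checkDeployments} {
		if err := step(); err != nil {
			return err // abort the pass on the first failing stage
		}
	}
	return nil
}

func main() {
	var trace []string
	record := func(name string) func() error {
		return func() error { trace = append(trace, name); return nil }
	}
	_ = reconcile(record("delete-webhook-config"), record("install"), record("check-deployments"))
	fmt.Println(trace)
}
```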
So far we are facing the following issues: I suggest we fix c) as commented above.
Fixes JIRA #
Proposed Changes