
WIP: change config webhook policy #3583

Open
wants to merge 1 commit into base: release-1.35
Conversation


@dsimansk dsimansk commented Apr 2, 2025

Fixes JIRA #

Proposed Changes


dsimansk commented Apr 2, 2025

/test all


maschmid commented Apr 2, 2025

Well, the webhook does not start without the configmap...

{"severity":"EMERGENCY","timestamp":"2025-04-02T18:05:38.461816322Z","logger":"webhook","caller":"sharedmain/main.go:284","message":"Failed to start configuration manager","commit":"7523ce2","knative.dev/pod":"webhook-6b9cbc9fc6-c8rqq","error":"configmap \"config-features\" not found","stacktrace":"knative.dev/pkg/injection/sharedmain.MainWithConfig\n\t/workspace/vendor/knative.dev/pkg/injection/sharedmain/main.go:284\nknative.dev/pkg/injection/sharedmain.MainWithContext\n\t/workspace/vendor/knative.dev/pkg/injection/sharedmain/main.go:209\nmain.main\n\t/workspace/cmd/webhook/main.go:172\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:271"}

I guess we may instead either delete the config.webhook.serving.knative.dev validatingwebhookconfiguration during KnativeServing reconciliation if it already exists but the webhook deployment (or the CMs?) doesn't,

or force it back into the initial state as defined in the manifest (again, only if the webhook deployment or the CMs don't exist?).

(Maybe not only when they don't exist... basically, whenever we need to touch the webhook deployment (which triggers a new pod to be created), we need to reset the config.webhook.serving.knative.dev validatingwebhookconfiguration, since the webhook updates it when it starts, and we need to reset it before we touch the configmaps, if we're touching them too.)
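A minimal sketch of that reset step, assuming client-go and a hypothetical helper in the reconciler (not the operator's actual code):

```go
package reconciler // illustrative placement only

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// resetConfigWebhook deletes config.webhook.serving.knative.dev so that a
// restarted webhook pod recreates it from the manifest defaults. Per the
// comment above, this would run before the reconciler touches the webhook
// deployment or the config-* ConfigMaps.
func resetConfigWebhook(ctx context.Context, kube kubernetes.Interface) error {
	err := kube.AdmissionregistrationV1().ValidatingWebhookConfigurations().
		Delete(ctx, "config.webhook.serving.knative.dev", metav1.DeleteOptions{})
	if apierrors.IsNotFound(err) {
		// Nothing to reset; the webhook creates the configuration on startup.
		return nil
	}
	return err
}
```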


maschmid commented Apr 2, 2025

Alternatively, we may just drop the config.webhook.serving.knative.dev validatingwebhookconfiguration altogether, as we manage the CMs in the operator anyway? (And if we really need to control that, the SO operator should then be the webhook that guards them...)

(Or the logic of those validations could be done directly in the code that transforms KnativeServing .spec.config to configmaps, so the error could be handled directly in the operator...)
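A rough sketch of doing that validation in the operator itself; the validateFeatures helper is a hypothetical stand-in for whatever parser Serving provides for config-features:

```go
package operator // illustrative only; the real logic would live in the operator's transform code

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// validateFeatures is a hypothetical stand-in for the real per-ConfigMap
// validation of config-features values.
func validateFeatures(data map[string]string) error {
	for k, v := range data {
		if v != "enabled" && v != "disabled" && v != "allowed" {
			return fmt.Errorf("feature %q has unsupported value %q", k, v)
		}
	}
	return nil
}

// transformConfig turns a KnativeServing .spec.config entry into a ConfigMap,
// rejecting invalid values in the operator instead of relying on the webhook.
func transformConfig(name string, data map[string]string) (*corev1.ConfigMap, error) {
	if name == "config-features" {
		if err := validateFeatures(data); err != nil {
			return nil, fmt.Errorf("invalid .spec.config for %s: %w", name, err)
		}
	}
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: "knative-serving"},
		Data:       data,
	}, nil
}
```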


openshift-ci bot commented Apr 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dsimansk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Apr 3, 2025
@dsimansk dsimansk force-pushed the pr/release-1.35-tests-patch branch from f26ec6d to 464607d on April 3, 2025 11:59
@dsimansk dsimansk changed the title from "WIP: add ConfigMap to webhook dependant resources" to "WIP: change config webhook policy" on Apr 3, 2025
@@ -9164,7 +9164,7 @@ webhooks:
- key: app.kubernetes.io/component
operator: In
values: ["autoscaler", "controller", "logging", "networking", "observability", "tracing", "net-certmanager"]
-timeoutSeconds: 10
+timeoutSeconds: 30
Contributor

I thought we wanted to decrease it (so that the delay is not that long if the issue occurs? )

Contributor

(10s is probably enough in normal cases where the webhook is working.. )

Contributor Author

Oops, my bad then, I misinterpreted the meaning of this timeout.


dsimansk commented Apr 3, 2025

@maschmid per the Slack thread, I've changed the failurePolicy and timeout on the validating webhook.
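For reference, both knobs live on the validating webhook configuration; a hedged illustration follows (failurePolicy: Ignore is only an assumption based on the PR title, and the final timeout value is not visible in this thread):

```go
package example // illustrative only

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// configWebhookTuning shows where failurePolicy and timeoutSeconds sit on the
// config validating webhook. Ignore is an assumption based on the PR title;
// the diff above only shows the timeout being changed.
func configWebhookTuning() *admissionregistrationv1.ValidatingWebhookConfiguration {
	policy := admissionregistrationv1.Ignore
	timeout := int32(10) // the review above suggests keeping a short timeout
	return &admissionregistrationv1.ValidatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "config.webhook.serving.knative.dev"},
		Webhooks: []admissionregistrationv1.ValidatingWebhook{{
			Name:           "config.webhook.serving.knative.dev",
			FailurePolicy:  &policy,
			TimeoutSeconds: &timeout,
			// ClientConfig, rules, etc. omitted; they come from the Serving manifest.
		}},
	}
}
```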

@dsimansk dsimansk force-pushed the pr/release-1.35-tests-patch branch from 464607d to 9d293ae on April 3, 2025 12:03

openshift-ci bot commented Apr 3, 2025

@dsimansk: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/417-test-upgrade-aws-417
Commit: 9d293ae
Required: true
Rerun command: /test 417-test-upgrade-aws-417

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


skonto commented Apr 7, 2025

@maschmid @dsimansk In the upstream operator we can deploy the webhook deployments first and wait for them to be up, then apply the webhook cfgs, and only after that the rest of the resources. That way, even in corner cases, there will be no resources that can trigger the "webhook not found" error. Ideally you don't want to create the CMs without validation. Right now we only cover certificates, but as I discussed with @dsimansk, if CMs are required for the deployments to work then we need some strict ordering. The reason I didn't cover CMs upstream is that I thought people would remove stuff properly before re-installing, even when S-O is used by other operators. The certificate issue was independent because it was happening upstream with a fresh install, making reconciliation return early with an error (although transient). I will work on that fix upstream first.


dsimansk commented Apr 7, 2025

@skonto isn't that already done in CheckDeployments? At least that was my impression from https://github.com/knative/operator/blob/main/pkg/reconciler/common/deployments.go#L48


skonto commented Apr 8, 2025

@dsimansk Before we call checkDeployments here https://github.com/knative/operator/blob/main/pkg/reconciler/knativeserving/knativeserving.go#L126 we call manifest.Install https://github.com/knative/operator/blob/main/pkg/reconciler/common/install.go#L43, which installs everything except the mutating/validating webhook cfgs and the certificate. My proposal would be:
a) Install the webhook deployments together with the related webhook CMs.
b) Wait for the webhook deployments to be ready.
c) Install the rest of the manifest.
d) Install the webhook cfgs and the dependent resources.

In step a) we can remove the webhook cfg before we start installing things, to avoid the webhook failure mentioned by @maschmid.
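A minimal sketch of that ordering; all helper names are hypothetical stand-ins for the real operator functions:

```go
package reconciler // illustrative only; the real logic lives in the operator reconciler

import "context"

// installer abstracts the operator actions used below; every method name is a
// hypothetical stand-in, not an existing operator API.
type installer interface {
	deleteValidatingWebhookConfig(ctx context.Context, name string) error
	installWebhookDeploymentsAndConfigMaps(ctx context.Context) error
	waitForDeployments(ctx context.Context, names ...string) error
	installRestOfManifest(ctx context.Context) error
	installWebhookConfigsAndDependents(ctx context.Context) error
}

// reconcileOrdered sketches the proposed order: a) webhook deployments plus
// their ConfigMaps (clearing any stale webhook cfg first), b) wait for the
// webhook deployments, c) the rest of the manifest, d) the webhook cfgs and
// dependent resources.
func reconcileOrdered(ctx context.Context, c installer) error {
	// Step a) Reset any leftover cfg so a half-installed state cannot block admission,
	// then install the webhook deployments together with the webhook ConfigMaps.
	if err := c.deleteValidatingWebhookConfig(ctx, "config.webhook.serving.knative.dev"); err != nil {
		return err
	}
	if err := c.installWebhookDeploymentsAndConfigMaps(ctx); err != nil {
		return err
	}
	// Step b) Wait until the webhook pods are actually serving.
	if err := c.waitForDeployments(ctx, "webhook"); err != nil {
		return err
	}
	// Step c) Install the remainder of the manifest.
	if err := c.installRestOfManifest(ctx); err != nil {
		return err
	}
	// Step d) Finally install the webhook cfgs and their dependent resources.
	return c.installWebhookConfigsAndDependents(ctx)
}
```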


skonto commented Apr 8, 2025

So far we are facing the following issues:
a) A fresh install emits a warning due to the certificate being deployed too early. I fixed this upstream.
b) The user does a non-clean uninstall and webhook cfgs are left around. A re-install would then fail because the certificate is deployed too early. Fixed with the PR for a).
c) The user does a non-clean uninstall and webhook cfgs are left around. A re-install would then fail because the CMs are deployed too early.

I suggest we fix c) as commented above.
