Skip to content

Fix improper controller-runtime cache configuration #640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 16, 2024

Conversation

dgrove-oss
Copy link
Collaborator

@dgrove-oss dgrove-oss commented Dec 14, 2024

When AppWrappers are enabled, it is not correct to configure the controller-runtime cache with a filter that only allows services, secrets, etc with the RayCluster label to be cached. This breaks any AppWrapper that contains one of these resource kinds.

@dgrove-oss
Copy link
Collaborator Author

My understanding is that Kubernetes does not support OR for label selectors, so the only option is to disable the filtering entirely. If it is possible to do a filter with an OR of the RayCluster label and the AppWrapper label that would also be an acceptable fix.

@dgrove-oss
Copy link
Collaborator Author

@varshaprasad96 -- this misconfiguration is the root cause of the problem with AppWrappers containing Services we were describing to you a couple weeks back. At the time, we thought it was an issue in the AppWrapper operator when the AppWrapper contained multiple resources, but that turned out to be incorrect. The problem was the filtering of the controller-runtime cache by the codeflare operator which led the AppWrapper controller to get back a NotFound error when it tried to get a Service it had created that didn't have the RayCluster label.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
When AppWrappers are enabled, it is not correct to configure
the controller-runtime cache with a filter that only allows
services, secrets, etc with the RayCluster label to be cached.
This breaks any AppWrapper that contains one of these resource kinds.
@dgrove-oss
Copy link
Collaborator Author

rebased to pick up codeflare common fixes in #641

Copy link
Contributor

@varshaprasad96 varshaprasad96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to add tests to ensure controller behaves as expected with this cache configuration. Can do it later in follow up.

/lgtm

Copy link

openshift-ci bot commented Dec 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: varshaprasad96

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dgrove-oss
Copy link
Collaborator Author

dgrove-oss commented Dec 16, 2024

It would be helpful to add tests to ensure controller behaves as expected with this cache configuration. Can do it later in follow up.

💯 -- one possibility would be to try to run the existing appwrapper controller end-to-end test suite. I can take a look at that if the idea seems reasonable.

@openshift-merge-bot openshift-merge-bot bot merged commit 9c96f44 into project-codeflare:main Dec 16, 2024
8 checks passed
@dgrove-oss dgrove-oss deleted the fix-filter branch December 16, 2024 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants