
Validation to ensure that the CRD exists – Handling the non-existent CRD scenario. #2141



Open
rahtr opened this issue Aug 21, 2024 · 2 comments
Labels
area/runtime Issues or PRs as related to controller runtime, common reconciliation logic, etc kind/enhancement Categorizes issue or PR as related to existing feature enhancements.

Comments

@rahtr

rahtr commented Aug 21, 2024

Describe the bug
When a CRD does not exist, the controller errors out. From the application's point of view, this behavior is expected. From a platform's perspective, however, CRDs are often installed by platform operators while the controllers are deployed by ACK controller admins. Though it is an edge case, the CRDs installed by platform operators can lag behind the controller version, or platform operators may deliberately enable only certain versions on the platform. In either situation the controller stays in an error state, because it cannot find a CRD it requires.

For example, the CRDs from ElastiCache controller release v0.0.29 do not include CacheCluster, which was only added in v0.1.0. A controller that watches CacheCluster therefore fails against those CRDs with:

```json
{
  "level": "error",
  "ts": "2024-08-21T13:57:27.722Z",
  "logger": "controller-runtime.source.EventHandler",
  "msg": "if kind is a CRD, it should be installed before calling Start",
  "kind": "CacheCluster.elasticache.services.k8s.aws",
  "error": "no matches for kind \"CacheCluster\" in version \"elasticache.services.k8s.aws/v1alpha1\"",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/source/kind.go:63\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func2\n\t/go/pkg/mod/k8s.io/apimachinery@<version>/pkg/util/wait/loop.go:87\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\t/go/pkg/mod/k8s.io/apimachinery@<version>/pkg/util/wait/loop.go:88\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel\n\t/go/pkg/mod/k8s.io/apimachinery@<version>/pkg/util/wait/poll.go:33\nsigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@<version>/pkg/internal/source/kind.go:56"
}
```

Steps to reproduce

Expected outcome
Is there a possibility to add an ignore CRD check in the runtime?

Environment

  • Kubernetes version: 1.28
  • Using EKS: yes
  • AWS service targeted: ElastiCache (observed with the ElastiCache controller, but the issue applies to other controllers as well)
@rahtr changed the title from "Controller stops working due to retro fit changes - Example Elasticache" to "Validation to ensure that the CRD exists – Handling the non-existent CRD scenario." on Aug 23, 2024
@a-hilaly added the kind/enhancement and area/runtime labels on Aug 27, 2024
@gecube

gecube commented Sep 4, 2024

Hi!

What is the difference from #2007? As I understand it, this ticket extends that one.

Is there a possibility to add an ignore CRD check in the runtime?

That would mean we need a way to explicitly disable handling of particular CRDs in the controller. Otherwise the controller would silently ignore some CRDs, and the cluster operator would not know that some functionality is, let's say, not enabled.

I would also add that, from my perspective, it should be very rare for someone other than a platform operator to install the ACK toolkit. Only platform operators should install ACK controllers in a cluster, following a strict procedure: CRDs first, then the controller itself.

@a-hilaly self-assigned this on Jan 16, 2025
@rushmash91
Member

Hi @rahtr,

The controller-runtime package creates watches for the resources a controller manages. When it sets up these watches during controller startup, it queries the Kubernetes API server (through API discovery) to resolve each resource kind before subscribing to it. If the CRD for a kind does not exist, this resolution step fails and the controller cannot start properly.

if kind is a CRD, it should be installed before calling Start","kind":"CacheCluster.elasticache.services.k8s.aws","error":"no matches for kind \"CacheCluster\
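To make the failure mode concrete, here is a toy model of that startup step. It is not controller-runtime code: `gvk`, `noKindMatchError`, and `startWatches` are hypothetical names, and the in-memory `servedKinds` set stands in for the discovery data the API server provides. The error text only mimics the message in the log above.

```go
package main

import "fmt"

// gvk identifies a resource kind by group, version, and kind.
type gvk struct {
	Group, Version, Kind string
}

// noKindMatchError mimics the "no matches for kind" error seen in the log.
type noKindMatchError struct{ target gvk }

func (e noKindMatchError) Error() string {
	return fmt.Sprintf("no matches for kind %q in version %q",
		e.target.Kind, e.target.Group+"/"+e.target.Version)
}

// startWatches resolves every kind before any watch is established.
// servedKinds stands in for what API discovery reports; a single
// unresolvable kind aborts startup, as described in this thread.
func startWatches(kinds []gvk, servedKinds map[gvk]bool) error {
	for _, k := range kinds {
		if !servedKinds[k] {
			return noKindMatchError{target: k}
		}
	}
	// In a real controller, each resolved kind would now get a watch.
	return nil
}
```

Running it with a `CacheCluster` kind in `elasticache.services.k8s.aws/v1alpha1` and an empty served set returns the same "no matches for kind" wording as the reported error.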

controller-runtime assumes that every CRD a controller watches is installed before the controller starts. This is intentional, as a maintainer confirmed in a GitHub issue about controllers recovering from missing CRDs:

"This timeout was deliberately added, because before that, it would just silently not work"

Previously, controllers would silently fail without indicating the actual problem, making troubleshooting difficult. The current behavior with explicit errors and timeouts makes the problem obvious.

We plan to support scoping a controller so that it only runs reconcilers for the CRDs an admin chooses to enable, and we would welcome community feedback on that. However, this would not address this specific case, because the problem arises when admins are not aware of which CRDs the platform team has enabled. This edge case would remain.

What do you think, @a-hilaly, @michaelhtm, @gecube?

4 participants