Dynamic controller scaling #2576
Comments
Can you describe it in more detail? |
I think there are two tasks
|
/kind feature |
I'm skeptical whether changing the number of workers during runtime is a good idea. Instead, I suggest looking into horizontally scaling controllers including some form of sharding. |
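To make the sharding suggestion concrete, here is a minimal sketch of one possible scheme. It assumes each controller replica is given a shard index and a total shard count, and only reconciles objects whose namespace/name key hashes to its shard. The names `shardFor` and `ownedBy` are hypothetical and not part of controller-runtime.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps an object key (for example "namespace/name") to a shard
// index in [0, shards). It is a hypothetical helper, not a controller-runtime API.
func shardFor(key string, shards int) int {
	if shards <= 0 {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on an FNV hash never returns an error
	return int(h.Sum32() % uint32(shards))
}

// ownedBy reports whether the replica with the given shard index is
// responsible for reconciling the object identified by key.
func ownedBy(key string, shardIndex, shards int) bool {
	return shardFor(key, shards) == shardIndex
}

func main() {
	const shards = 3
	for _, key := range []string{"default/cluster-a", "default/cluster-b", "prod/cluster-c"} {
		fmt.Printf("%s -> shard %d\n", key, shardFor(key, shards))
	}
}
```

A replica could call `ownedBy` in an event filter and drop objects that belong to other shards. Note that a plain modulo hash reassigns most keys when the shard count changes, so a real design would likely need something like consistent hashing.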
Like any other change we usually propose, this would be opt-in.
Controller Runtime is focused on a single controller scenario acting as a leader for the time being; but this is probably good to document outside of this project. |
/lifecycle frozen |
Hi @vincepri, we’re seeing similar issues with Spark Operator: event spikes overwhelm the controller, causing high latencies and timeouts for our time-sensitive batch workloads. Are you proposing dynamically adjusting MaxConcurrentReconciles based on queue depth and reconciliation latency, or modifying controller thread scaling more broadly? We would love to understand the approach and potentially contribute in this area. |
I would just use #2374 |
Thanks @sbueringer. To use that feature, is it just a boolean that we set while initializing the controller? Or are we also expected to define priority levels and handle assigning priorities to events? |
Currently we allow specifying only a fixed number of workers for each controller.
After attending the talk at KubeCon on how to scale Cluster API to 2k clusters (link tba), it would be good to allow controller runtime to spin workers up and down dynamically, based on the number of objects in the queue and on the 90th percentile of the overall reconciler duration.