Support for DaemonSet eviction when draining nodes #6158
Kind of sounds like kubernetes/kubernetes#75482 :/
Yes, I think if kubernetes/kubernetes#75482 were implemented it could potentially be used to implement this feature request.
/milestone Next
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle stale
/lifecycle frozen
/help
@fabriziopandini:
Guidelines: Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/triage accepted
This feature might eventually be supported by Declarative Node Maintenance: kubernetes/enhancements#4213
/priority backlog
Do we know how cluster-autoscaler implemented this feature? In general, the DaemonSet controller adds a toleration for the Unschedulable taint (node.kubernetes.io/unschedulable) to all DaemonSet Pods (https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#taints-and-tolerations). So while it's possible to evict DaemonSet Pods, they will just be re-created immediately ("cordon" basically doesn't work because of the toleration). I would guess they added a cluster-autoscaler-specific taint to the Node? In general, it would be better if evicting DaemonSet Pods were cleanly supported in core Kubernetes first.
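A minimal Python sketch of the toleration argument above, assuming the usual Kubernetes rules for matching one toleration against one taint: an empty effect matches any effect, an empty key with operator Exists matches any key, and otherwise Equal compares values. The dataclasses, function names, and the example.com/draining taint key are hypothetical illustrations, not a client-library API, and only the unschedulable toleration is modeled out of the several the DaemonSet controller adds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key (with Exists) matches any taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Does a single toleration match a single taint?"""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return tol.operator == "Exists"


def blocks_scheduling(taints: list, tolerations: list) -> bool:
    """True if any NoSchedule/NoExecute taint is left untolerated."""
    return any(
        taint.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
    )


# The toleration the DaemonSet controller adds for cordoned nodes
# (it adds several others, omitted here).
DS_TOLERATIONS = [
    Toleration(key="node.kubernetes.io/unschedulable",
               operator="Exists", effect="NoSchedule"),
]

# Cordoning a node results in this taint.
cordoned = [Taint("node.kubernetes.io/unschedulable", "", "NoSchedule")]
# Hypothetical extra taint, standing in for an autoscaler-specific one.
draining = cordoned + [Taint("example.com/draining", "true", "NoSchedule")]

print(blocks_scheduling(cordoned, DS_TOLERATIONS))  # False: pods come back
print(blocks_scheduling(draining, DS_TOLERATIONS))  # True: pods stay away
```

A DaemonSet that declares a blanket toleration (operator Exists with no key) would still tolerate such a custom taint, so a taint-based approach only helps for DaemonSets that do not tolerate everything.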
I took a quick look at the autoscaler's code. To me it looks like it doesn't handle the DaemonSet controller scheduling a new pod.
(I'm not sure if this feature request is large enough to require the CAEP process. If it is, please let me know.)
User Story
As a user, I would like a mechanism to have my DaemonSet pods gracefully terminated when nodes are drained for deletion, so that those pods can complete their shutdown process.
Detailed Description
Currently, Cluster API uses the standard kubectl drain logic with all DaemonSets ignored (link). I would like some way to also have my DaemonSet pods gracefully terminated as part of the node deletion process.
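For illustration, a simplified Python model of how a drain selects pods when DaemonSets are ignored. This is a sketch under stated assumptions, not the kubectl drain code: the Pod class and pods_to_evict are hypothetical, other drain filters are omitted, and include_daemonsets stands in for the behavior this request asks for. Note that the real kubectl drain does not evict DaemonSet pods when --ignore-daemonsets is omitted; it refuses to proceed instead.

```python
from dataclasses import dataclass, field

# Annotation the kubelet sets on mirror pods of static pods.
MIRROR_ANNOTATION = "kubernetes.io/config.mirror"


@dataclass
class Pod:
    name: str
    owner_kinds: list = field(default_factory=list)  # kinds from ownerReferences
    annotations: dict = field(default_factory=dict)


def pods_to_evict(pods: list, include_daemonsets: bool = False) -> list:
    """Pods a drain would evict (simplified; other drain filters omitted).

    include_daemonsets=False models today's "ignore all DaemonSets" drain;
    include_daemonsets=True models the behavior requested in this issue.
    """
    selected = []
    for pod in pods:
        if MIRROR_ANNOTATION in pod.annotations:
            continue  # static pods are managed by the kubelet, not evicted
        if "DaemonSet" in pod.owner_kinds and not include_daemonsets:
            continue  # skipped, as with --ignore-daemonsets
        selected.append(pod)
    return selected


pods = [
    Pod("web-7d9f"),
    Pod("log-agent-x2k", owner_kinds=["DaemonSet"]),
    Pod("kube-apiserver-node1", annotations={MIRROR_ANNOTATION: "abc"}),
]
print([p.name for p in pods_to_evict(pods)])  # ['web-7d9f']
print([p.name for p in pods_to_evict(pods, include_daemonsets=True)])
# ['web-7d9f', 'log-agent-x2k']
```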
Anything else you would like to add:
While investigating whether this is currently possible, I saw that Cluster Autoscaler provides a mechanism to control DaemonSet draining. I'm planning to use it in the interim, but it would be nice to also have DaemonSet draining happen when nodes are not drained by Cluster Autoscaler (e.g., during cluster upgrades).
I also looked into the graceful node shutdown feature, but in my case the pod drain time is quite long (30 minutes or more), and I'm not sure the feature would work for such long termination times, especially in EC2. I don't think EC2 will let you stall instance termination for that long. It's hard to find any documentation on how long an EC2 instance can inhibit shutdown, but I did see this saying that 10 minutes is typically the maximum.
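A back-of-the-envelope check of the timing concern, written as a small Python sketch. It assumes the kubelet's graceful node shutdown settings shutdownGracePeriod and shutdownGracePeriodCriticalPods, where the latter is the portion at the end of the former reserved for critical pods, and it treats the roughly 10-minute EC2 figure mentioned above as an unverified platform cap. The function names are hypothetical. On systemd hosts, logind's InhibitDelayMaxSec is another limit on how long shutdown can be delayed.

```python
from datetime import timedelta


def regular_pod_window(shutdown_grace_period: timedelta,
                       critical_pods_period: timedelta,
                       platform_limit: timedelta) -> timedelta:
    """Time non-critical pods get to terminate during a graceful node shutdown.

    The kubelet gives regular pods shutdown_grace_period minus the
    critical_pods_period reserved for critical pods; platform_limit is an
    assumed hard cap after which the instance is powered off anyway.
    """
    regular = shutdown_grace_period - critical_pods_period
    return max(min(regular, platform_limit), timedelta(0))


def drain_fits(pod_drain_time: timedelta, shutdown_grace_period: timedelta,
               critical_pods_period: timedelta, platform_limit: timedelta) -> bool:
    """True if the pods' shutdown could finish before being cut off."""
    window = regular_pod_window(shutdown_grace_period, critical_pods_period,
                                platform_limit)
    return pod_drain_time <= window


# 30-minute drain, kubelet configured generously, ~10-minute platform cap.
print(drain_fits(timedelta(minutes=30),
                 shutdown_grace_period=timedelta(minutes=45),
                 critical_pods_period=timedelta(minutes=5),
                 platform_limit=timedelta(minutes=10)))  # False
```

Under these assumptions, raising shutdownGracePeriod cannot help once the platform cap is the binding limit, which is why draining before termination looks like the more robust route.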
The other thing I noticed while investigating is that Cluster API machine deletion has a pre-terminate hook. It might be possible to implement DaemonSet pod eviction with a custom Hook Implementing Controller (HIC). Is that the preferred way to implement something like this? If so, I can close this feature request and look into writing the HIC.
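A rough sketch of what such a HIC could do, modeled as a pure Python function that returns the actions for one reconcile pass. It assumes Cluster API's pre-terminate hook convention, where an annotation with the prefix pre-terminate.delete.hook.machine.cluster.x-k8s.io/ blocks infrastructure deletion until it is removed; the hook name, the example.com/ds-draining taint key, and the dict shapes are hypothetical. Because DaemonSet Pods tolerate the Unschedulable taint, the sketch adds its own NoSchedule taint before evicting so that the DaemonSet controller does not simply re-create them.

```python
# Cluster API's pre-terminate hook annotation prefix.
PRE_TERMINATE_PREFIX = "pre-terminate.delete.hook.machine.cluster.x-k8s.io/"
HOOK = PRE_TERMINATE_PREFIX + "ds-drain"     # hypothetical hook name
DRAIN_TAINT_KEY = "example.com/ds-draining"  # hypothetical taint key


def reconcile(machine: dict, node_taint_keys: set, ds_pods: list) -> list:
    """Actions a hypothetical HIC would take on one reconcile pass.

    machine: {"annotations": {...}, "deleting": bool} (illustrative shape)
    node_taint_keys: taint keys currently on the Machine's Node
    ds_pods: names of DaemonSet pods still running on that Node
    """
    annotations = machine.get("annotations", {})
    if not machine.get("deleting", False):
        # Register the hook so Machine deletion waits for this controller.
        return [] if HOOK in annotations else [("add_annotation", HOOK)]
    if HOOK not in annotations:
        return []  # hook already released; nothing left to do
    actions = []
    if DRAIN_TAINT_KEY not in node_taint_keys:
        # DaemonSet pods tolerate the cordon taint, so add our own taint
        # to stop the DaemonSet controller from re-creating them.
        actions.append(("taint_node", DRAIN_TAINT_KEY, "NoSchedule"))
    if ds_pods:
        actions.extend(("evict", name) for name in ds_pods)
        return actions  # requeue until the pods have terminated
    # All DaemonSet pods are gone: let Cluster API delete the infrastructure.
    actions.append(("remove_annotation", HOOK))
    return actions


doomed = {"annotations": {HOOK: "ds-drain-controller"}, "deleting": True}
print(reconcile(doomed, set(), ["log-agent-x2k"]))
print(reconcile(doomed, {DRAIN_TAINT_KEY}, []))
```

Keeping the decision logic pure makes it easy to test; a real controller would apply these actions through the API server (for example, via the Eviction subresource) and requeue until the pods are gone. DaemonSets with a blanket toleration would still defeat the taint step.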
/kind feature