WIP: Introduce Node Lifecycle WG #8396
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: atiratree. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
/hold
Looks like I'm not a member of the kubernetes org anymore. I was a few years back, but didn't keep up with contributions recently. You can remove me as a lead and I can reapply after some contributions to this WG.
(force-pushed from 75e1096 to a19a192)
We have had impactful conversations with Ryan about this group and its goals. He has experience with cluster maintenance and I look forward to his participation in the WG.
/cc
(force-pushed from a19a192 to 2d6ac13)
> projects independently addressing similar issues. The goal of this working group is to develop
> unified APIs that the entire ecosystem can depend on, reducing the maintenance burden across
> projects and addressing scenarios that impede node drain or cause improper pod termination. Our
> objective is to create easily configurable, out-of-the-box solutions that seamlessly integrate with
So is the goal APIs to be used by solutions, or to implement a solution? These two sentences seem to be at odds. Maybe mention that Kubernetes has no plans to block customers from implementing advanced use cases.
We want to do both :) I have added another sentence there to explain it better.
> ### In scope
>
> - Explore a unified way of draining the nodes and managing node maintenance by introducing new APIs
Can you please add a goal of migrating existing scenarios to the new API, so the group is tasked with not breaking users when they upgrade?
We do have that in scope under
> Migrate users of the eviction based kubectl-like drain (kubectl, cluster autoscaler, karpenter), and other scenarios to use the new approach.

So far it is pretty generic until we have a clearer vision. Please let me know if you would like to see something more specific.
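For context, the eviction-based kubectl-like drain discussed here starts by filtering the pods on a node: DaemonSet-managed pods and static (mirror) pods are skipped because evicting them is pointless, as the DaemonSet controller or the kubelet recreates them. A minimal, self-contained sketch of that filtering step, using a simplified stand-in for the Pod type (field names are illustrative; this is not kubectl's actual implementation):

```go
package main

import "fmt"

// Pod is a simplified stand-in for corev1.Pod; only the fields the
// drain filter needs are modeled here.
type Pod struct {
	Name        string
	OwnerKind   string // e.g. "DaemonSet", "ReplicaSet"
	Annotations map[string]string
}

// evictable mimics the kubectl-drain style filters: DaemonSet pods and
// static (mirror) pods are skipped because they would simply be
// recreated by their controller or the kubelet.
func evictable(p Pod) bool {
	if p.OwnerKind == "DaemonSet" {
		return false
	}
	// "kubernetes.io/config.mirror" is the real annotation the kubelet
	// sets on mirror pods for static pods.
	if _, isMirror := p.Annotations["kubernetes.io/config.mirror"]; isMirror {
		return false
	}
	return true
}

func main() {
	pods := []Pod{
		{Name: "web-1", OwnerKind: "ReplicaSet"},
		{Name: "logagent-x", OwnerKind: "DaemonSet"},
		{Name: "kube-apiserver", Annotations: map[string]string{"kubernetes.io/config.mirror": "abc"}},
	}
	for _, p := range pods {
		fmt.Println(p.Name, evictable(p))
	}
}
```

In a real drain, each pod passing this filter is then evicted through the `policy/v1` Eviction subresource, which is what makes PodDisruptionBudgets apply.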
> shutdown. This then impacts both the node and pod autoscaling, load balancing, and the applications
> running in the cluster. All of these areas have issues and would benefit from a unified approach.
>
> ### In scope
Another goal should include making Pods work reliably while terminating. This is important since, with the proliferation of non-live-migratable VMs with accelerators, we see more and more situations where maintenance-caused termination can take hours if not days.
Good idea, I have added two more stories. All in all, the In scope section covers this in general, I hope.
> public and private cloud providers, Kubernetes distribution providers,
> and cloud provider end-users. Here are some user stories:
>
> - As a cluster admin I want to have a simple interface to initiate a node drain/maintenance without
Can we add a goal to explore the scenario of getting a historical perspective on why a node was terminated/drained/killed? This comes up very often and maybe we can help those scenarios in this WG. Various ideas like Node object "tombstones" were discussed in the past.
I am not really sure I fully understand this. I have added a new point to the In scope section that mentions this. Feel free to write a GitHub suggestion.
wg-node-lifecycle/charter.md (Outdated)
> accelerators; it's far too expensive. It is more cost-effective to coordinate a drain and then
> upgrade.
This is a strong statement. Maybe we can say it more generically: investigate the most cost-effective ways to upgrade nodes with expensive accelerators deployed on them.
The blue-green upgrade with accelerators use case is, I think, an important one to mention in some way. Drain and upgrade always have a cost: money, time, complexity, etc. We're trying to say that the current ecosystem of APIs and tools, or lack thereof, causes solutions to be more "expensive" than they should be. We can rephrase to emphasize this.
Perhaps it would be better to focus on some specific examples showing the "cost". E.g. some people want to keep the specific accelerators they have been using because of wear and tear.
My concern is mostly about wording. Even more cost-effective is to force-kill and recreate. Drain is not always the best path, depending on the workload.
Ack, we have changed the wording.
> Area we expect to explore:
>
> - An API to express node drain/maintenance.
One problem I feel we will need to address is how to transition the existing drain logic in various components to this new API. Having a new API without migrating the old ways to it creates "yet another" way to do it and requires end users to understand even more draining logic.
There is still no upgrade path, since we have not even agreed on the solution. We don't want to break people using the current approaches. The main incentive to switch should be painless upgrades/maintenance and other benefits.
I expect that the main components/users that use the kubectl(-like) drain should not have a hard time using the new solution(s). However, I am not sure what it will look like for the GNS, for example.
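No such drain/maintenance API exists yet, and the WG has not settled on any shape for it. Purely as a hypothetical illustration of what "an API to express node drain/maintenance" could involve, here is a sketch of a declarative maintenance object with a simple linear phase lifecycle; the type name, phases, and transitions are all invented for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// NodeMaintenancePhase is a hypothetical status phase for a declarative
// node-maintenance object. None of these names are real Kubernetes API.
type NodeMaintenancePhase string

const (
	PhasePending   NodeMaintenancePhase = "Pending"
	PhaseDraining  NodeMaintenancePhase = "Draining"
	PhaseDrained   NodeMaintenancePhase = "Drained"
	PhaseCompleted NodeMaintenancePhase = "Completed"
)

// validTransitions encodes a simple linear lifecycle:
// Pending -> Draining -> Drained -> Completed.
var validTransitions = map[NodeMaintenancePhase]NodeMaintenancePhase{
	PhasePending:  PhaseDraining,
	PhaseDraining: PhaseDrained,
	PhaseDrained:  PhaseCompleted,
}

// advance returns the next phase, or an error when the lifecycle is
// finished (a real controller would reconcile toward this).
func advance(p NodeMaintenancePhase) (NodeMaintenancePhase, error) {
	next, ok := validTransitions[p]
	if !ok {
		return p, errors.New("no transition from " + string(p))
	}
	return next, nil
}

func main() {
	p := PhasePending
	for {
		next, err := advance(p)
		if err != nil {
			break
		}
		fmt.Printf("%s -> %s\n", p, next)
		p = next
	}
}
```

The point of an explicit phase like `Drained` is that other components (autoscalers, remediation tooling, cloud providers) could observe a single shared signal instead of each re-implementing their own drain detection.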
> Area we expect to explore:
>
> - An API to express node drain/maintenance.
Why is the graceful termination mentioned above not listed here? Mostly curious.
Ah, sorry, I have fixed that. I am also open to other KEPs/documents that people would like to include here.
@atiratree I'm Fabrizio Pandini from SIG Cluster Lifecycle and I just saw this proposal.
(force-pushed from 27d96ba to 185b98b)
@fabriziopandini I will not be present at KubeCon, but feel free to connect with @rthallisey. We also plan to attend the SIG Cluster Lifecycle meeting after KubeCon.
@fabriziopandini I will be there. For anyone else who will be at KubeCon EU next week and wants to have a high-level discussion in person, please reach out to me on Slack (rhallisey) or email ([email protected]). I'll do my best to connect with anyone interested.
@rthallisey by any chance will you also be at the Maintainer Summit?
@hakman, yes, I'll be at the Maintainer Summit.
> users can use the same APIs or configurations across the board.
> - Migrate users of the eviction based kubectl-like drain (kubectl, cluster autoscaler, karpenter,
>   ...) and other scenarios to use the new approach.
> - Explore possible scenarios behind the reason why the node was terminated/drained/killed and how to
It seems like everyone is solving the problem of node maintenance independently and building private in-house solutions. Improving the drain behavior is one aspect of maintenance (generally the first step after detection). There are additional steps, once a node is ready to be acted on, that everyone seems to have an in-house solution for (especially people serving accelerated infra).
An example might be a system that drains the node and then reboots it when a GPU fault is detected. That's just one example; the system should be able to take arbitrary actions based on various signals after waiting for a signal that the node is all good to work on. Maybe some controller like "when you see state X, create arbitrary CR Y", so users can extend the controller for Y to take whatever remediation action they want, such as reboot / reset GPU drivers / reset NICs / etc.
It seems like it would be good to come up with a community solution for how to take these actions after a node is drained and ready to be worked on. Thoughts on including this in the WG?
Yeah, we want to include these considerations in the WG. We imply these in our goals, but I have added your suggestion as an additional user story to make it clearer.
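The "when you see state X, create CR Y" idea above can be sketched as a plain signal-to-action mapping that a remediation controller might evaluate after a node is drained. The condition and action names below are hypothetical, chosen only to mirror the GPU-fault example:

```go
package main

import "fmt"

// remediation maps a detected node condition to a remediation action.
// In a real controller, "actions" would likely be custom resources that
// users can extend; here they are just illustrative strings.
func remediation(condition string) string {
	switch condition {
	case "GPUFault":
		return "RebootNode" // e.g. drain completed, now reboot
	case "NICFlapping":
		return "ResetNIC"
	default:
		return "None" // unknown or healthy condition: do nothing
	}
}

func main() {
	for _, c := range []string{"GPUFault", "NICFlapping", "Ready"} {
		fmt.Println(c, "->", remediation(c))
	}
}
```

The interesting design question the WG would face is not this mapping itself but sequencing: the action must only fire after the drain signal confirms the node is safe to act on.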
@rthallisey please open an org membership request so we can add you here!
I really like the direction this is going. I have a question and a suggestion.
> support advanced use cases across the ecosystem.
>
> To properly solve the node drain, we must first understand the node lifecycle. This includes
> provisioning/sunsetting of the nodes, PodDisruptionBudgets, API-initiated eviction and node
Should we specifically include topology spread constraints in this list as well?
Scheduling is certainly an important part as well. I have added a mention of scheduling constraints to our goals.
> ### Out of scope
>
> - Implementing cloud provider specific logic, the goal is to have high-level API that the providers
>   can use, hook into, or extend.
+1
> - As a user, I want my application to finish all network and storage operations before terminating a
>   pod. This includes closing pod connections, removing pods from endpoints, writing cached writes
>   to the underlying storage and completing storage cleanup routines.
I think another user story is around the use of ephemeral low-cost instances on cloud providers, e.g.:
As a cluster admin, I would like to use a mixture of on-demand and temporary spot instances in my clusters to reduce cloud expenditure. Having more reliable lifecycle and drain mechanisms for nodes will improve cluster stability in scenarios where instances may be terminated by the cloud provider due to cost-related thresholds.
Agree, this story is also important to have. Thanks!
> ### In scope
>
> - Explore a unified way of draining the nodes and managing node maintenance by introducing new APIs
Is it worth including something about DRA device taints/drains?
Seems relevant to me as this affects the pod and device/node lifecycle. @pohly what do you think about including and discussing kubernetes/enhancements#5055 in the WG?
(force-pushed from 8bd9373 to f1fe43f)
Co-authored-by: Ryan Hallisey <[email protected]>
(force-pushed from f1fe43f to 0d4e43a)