Node should be deleted after infraMachine is gone #2565
Comments
👍 this has been in my head for a while but I never got around to filing it. Thank you! This will also speed up deletion, because right now when we remove the …
Should the machine controller be deleting the Node at all? I thought this was the responsibility of the K8s cloud controller manager? From the K8s docs on cloud-controller-manager
My expectation of how this worked would have been:
I think the important part of this is that the Node is removed after the kubelet has stopped, as @ncdc suggested. Is there a reason we can't rely on the cloud controller manager for this and have to do it ourselves? (I'm guessing it's something to do with there being many providers, not all of which implement the node deletion part of the CCM?) EDIT: I see this was added in #809 to ensure that nodes are cleaned up when there is no CCM.
I see the cloud controller manager as something orthogonal here. We want to provide the same provider-agnostic CAPI semantics/behaviour/UX, and not every environment is guaranteed to have a cloud controller manager, nor to delete the node, e.g. libvirt, bare metal...
Just to echo (and expand a bit on) some of the points above about why we added the Node deletion. It was indeed to remain consistent in cases where a provider may not have an associated CCM, but also to provide consistency in cases where CCMs for different providers exhibited different behaviors. I believe maintaining consistency here is important. We can definitely talk about trying to get to a point in the future where a CCM is required for a given infrastructure provider, but we would also have to make sure the behavior is consistent between them as well.
/assign @enxebre
/milestone v0.3.x
/lifecycle active
What steps did you take and what happened:
Delete a machine
What did you expect to happen:
Currently, when we signal a Machine for deletion, we respect pod safety by honouring PodDisruptionBudgets (draining), then we delete the Node, and then we signal the InfraMachine for deletion.
See cluster-api/controllers/machine_controller.go, lines 275 to 303 at e9038a5.
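For illustration, here is a minimal, self-contained Go sketch of that current ordering. It is a paraphrase, not the actual code from machine_controller.go; the helper names (`drainNode`, `deleteNode`, `deleteInfraMachine`) are hypothetical stand-ins:

```go
package main

import "fmt"

// Hypothetical stand-ins for the controller helpers; the real logic lives in
// controllers/machine_controller.go (lines 275-303 at e9038a5).
func drainNode(node string) error          { fmt.Println("cordon and drain", node); return nil }
func deleteNode(node string) error         { fmt.Println("delete Node object", node); return nil }
func deleteInfraMachine(name string) error { fmt.Println("delete InfraMachine", name); return nil }

// Current ordering: drain the node, delete the Node object, and only then
// signal the InfraMachine for deletion.
func reconcileDeleteCurrent(node, infraMachine string) error {
	if err := drainNode(node); err != nil { // respects PDBs
		return err
	}
	if err := deleteNode(node); err != nil { // Node (and its pod records) gone here...
		return err
	}
	return deleteInfraMachine(infraMachine) // ...while the host may still be running
}

func main() {
	if err := reconcileDeleteCurrent("node-1", "inframachine-1"); err != nil {
		fmt.Println("error:", err)
	}
}
```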
Node deletion is one of the possible mechanisms to force a stateful pod to be deleted:
https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/#delete-pods
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md#pod-safety-consistency-guarantees-and-storage-implications
If there were an edge case where we deleted an unreachable Node, thereby removing any pods from etcd, while a stateful process was still running on the underlying host, we would be immediately freeing up the pod name in the apiserver. "This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee."
To mitigate this, it would be safer to delete the Node only after the owned InfraMachine is gone. This would give stronger guarantees that the underlying host has been destroyed, and so there is no chance of a "leaked" stateful process still running. A sketch of the proposed ordering follows.
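Continuing the hypothetical helpers from the sketch above (same package; `reconcileDeleteProposed` and `infraMachineGone` are likewise illustrative names, not proposed API), the reordered flow would look roughly like this:

```go
// Proposed ordering: drain, delete the InfraMachine, wait until it is gone
// (i.e. the underlying host is destroyed), and only then delete the Node.
func reconcileDeleteProposed(node, infraMachine string) error {
	if err := drainNode(node); err != nil { // respects PDBs as before
		return err
	}
	if err := deleteInfraMachine(infraMachine); err != nil {
		return err
	}
	if !infraMachineGone(infraMachine) {
		// A real controller would requeue and retry rather than error out.
		return fmt.Errorf("waiting for InfraMachine %s to be deleted", infraMachine)
	}
	// The host is gone, so no leaked stateful process can still hold the pod's
	// identity; freeing pod names from the apiserver is now safe.
	return deleteNode(node)
}

// infraMachineGone is another hypothetical helper: in practice this would
// check whether the owned InfraMachine object still exists.
func infraMachineGone(name string) bool { return true }
```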
Anything else you would like to add:
Environment:
- Kubernetes version: (use kubectl version)
- OS (e.g. from /etc/os-release)

/kind bug