
CAPI doesn't wait for CSI volume unbinding #4707

Closed
MaxRink opened this issue May 31, 2021 · 8 comments · Fixed by #4945
Labels
kind/bug Categorizes issue or PR as related to a bug.
Milestone

Comments

@MaxRink (Contributor) commented May 31, 2021

What steps did you take and what happened:
I've rolled my workers, which had volumes provisioned by the vSphere CSI attached.
Some of those volumes did not detach properly, as CAPI was too fast and removed the nodes before the CSI controller could fully detach the volumes.

What did you expect to happen:
CAPI waits until volumes are detached.

Anything else you would like to add:
We had a small discussion in the CAPV slack ( https://kubernetes.slack.com/archives/CKFGK3SSD/p1622198292045000 )
@jzhoucliqr already has a patch ( spectrocloud@c340e68 )

Environment:

  • Cluster-api version: alpha3

/kind bug

@k8s-ci-robot added the kind/bug label May 31, 2021
@enxebre (Member) commented May 31, 2021

@MaxRink what signals the CSI controller to detach the volumes? Should the vSphereMachine controller ensure the detach happens gracefully, orthogonally to the CSI controller?

@MaxRink (Contributor, Author) commented May 31, 2021

The pod eviction. After the pods are stopped, the CSI controller will detach the volumes so they can be rebound on another node.
Right now, as far as CAPI is concerned, the node can be safely deleted once all pods are terminated, which gets in the way of the CSI node daemonset and the controller coordinating the volume detachment.
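For context, one way to observe that coordination from the API side is to look at the storage.k8s.io/v1 VolumeAttachment objects still bound to the node. A minimal sketch, assuming a controller-runtime client against the workload cluster; the package and function names are illustrative and not from any existing CAPI code:

```go
package volumes

import (
	"context"

	storagev1 "k8s.io/api/storage/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// volumeAttachmentsForNode returns the VolumeAttachments whose spec still
// references the given node, i.e. volumes the CSI controller has not yet
// detached. Sketch only.
func volumeAttachmentsForNode(ctx context.Context, c client.Client, nodeName string) ([]storagev1.VolumeAttachment, error) {
	list := &storagev1.VolumeAttachmentList{}
	if err := c.List(ctx, list); err != nil {
		return nil, err
	}
	var attached []storagev1.VolumeAttachment
	for _, va := range list.Items {
		if va.Spec.NodeName == nodeName && va.Status.Attached {
			attached = append(attached, va)
		}
	}
	return attached, nil
}
```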

@fabriziopandini (Member) commented Jun 1, 2021

I'm not sure this is really a CAPI bug, given that CAPI is not aware of the type of storage provider in use in each cluster.
Nevertheless, if I got the problem right, machine lifecycle hooks could be a valid solution here; they provide an extension point each provider/user can exploit to check additional conditions before allowing node deletion, such as, in this case, CSI volume cleanup.
cc @yastij @gab-satchi
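For reference, the lifecycle hooks mentioned above are annotations on the Machine object: a controller sets a pre-drain (or pre-terminate) hook and removes it once its own condition, here CSI volume cleanup, is satisfied. A minimal sketch, assuming the pre-drain annotation prefix from the machine deletion phase hooks proposal and a controller-runtime client; the "/csi-detach" suffix and function name are illustrative:

```go
package hooks

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// preDrainHookAnnotation assumes the pre-drain prefix from the machine
// deletion phase hooks proposal; the suffix just names the hook owner
// and is illustrative.
const preDrainHookAnnotation = "pre-drain.delete.hook.machine.cluster.x-k8s.io/csi-detach"

// blockDrain annotates a Machine so the Machine controller pauses before
// draining/deleting it; the owning controller removes the annotation once
// the CSI volumes are confirmed detached.
func blockDrain(ctx context.Context, c client.Client, machine client.Object) error {
	patch := client.MergeFrom(machine.DeepCopyObject().(client.Object))
	annotations := machine.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[preDrainHookAnnotation] = ""
	machine.SetAnnotations(annotations)
	return c.Patch(ctx, machine, patch)
}
```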

@yastij (Member) commented Jun 1, 2021

Generally speaking, I think CAPI should ensure that volumes are properly detached before deleting the machines. That can be done regardless of the storage provider, e.g. as in @jzhoucliqr's patch.

@randomvariable - IIRC CAPA does not have this problem, as non-root volumes are detached and preserved on instance termination, right?
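One provider-agnostic signal the Machine controller could wait on is the kubelet-reported Node.Status.VolumesAttached list, requeuing deletion until it drains to empty. A minimal sketch, assuming a controller-runtime client for the workload cluster; this is only an illustration, not necessarily what eventually merged in #4945:

```go
package machine

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// nodeHasAttachedVolumes reports whether the kubelet still lists attached
// volumes on the Node. A caller in the deletion flow could requeue until
// this returns false before removing the node and machine. Sketch only.
func nodeHasAttachedVolumes(ctx context.Context, workloadClient client.Client, nodeName string) (bool, error) {
	node := &corev1.Node{}
	if err := workloadClient.Get(ctx, client.ObjectKey{Name: nodeName}, node); err != nil {
		return false, err
	}
	return len(node.Status.VolumesAttached) > 0, nil
}
```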

@MaxRink (Contributor, Author) commented Jun 1, 2021

It's not only the vSphere CSI that might have this issue, btw.
Other providers like NetApp Trident, Pure PSO, and other iSCSI-based provisioners also rely on volumes being properly detached before node deletion.

@vincepri (Member) commented Jun 1, 2021

@yastij Are you thinking about inspecting a node and the status of its attached volumes?

@vincepri (Member) commented Jul 6, 2021

/lifecycle awaiting-more-evidence
/milestone Next

@vincepri (Member) commented
/milestone v0.4
