enhancements/local-storage/kubesan-csi-driver-integration-into-lvms.md
Lines changed: 27 additions & 2 deletions
@@ -111,6 +111,7 @@ As a Product Manager:
111
111
- we will scrap this effort for integration and look for alternative solutions if the integration is not possible with reasonable effort.
112
112
- There is a risk that KubeSAN will break easily, as it is a very young project
113
113
- we will not GA the solution until we have a clear understanding of the stability of the KubeSAN project. The solution will stay in TechPreview until then.
114
+
- we will use community health as a gate for GA. Solid upstream support for the project is required, and that is only possible with an active community.
114
115
115
116
## Proposal
116
117
@@ -303,6 +304,28 @@ The status reporting will include:
303
304
- Ensure that any significant events (such as failovers, recoveries, and maintenance actions) are logged and reported.
304
305
305
306
307
+
#### Design Details for Rollback Routines
308
+
309
+
In case of a failure during the provisioning of shared storage, the operator should be able to roll back the changes and clean up the shared VGs without data loss. At the same time, existing PVCs and PVs should be unaffected by the rollback as much as possible.
310
+
311
+
Because of the CSI design, the driver is not involved in the normal operation of PVCs and PVs: once the mount procedure has been completed,
312
+
no further action is required from the driver to keep the PVCs and PVs operational. The rollback can therefore be done without affecting them.
313
+
314
+
However, if for any reason the shared VGs need to be cleaned up, the operator should be able to do so without affecting the existing PVCs and PVs. This can be achieved by the following steps:
315
+
316
+
1. **Rollback Procedure**:
317
+
- The operator can remove the shared VGs from the LVMCluster CR to ensure that they are not recreated. This is fundamentally the same as removing a deviceClass
318
+
from the CR, as is done today via TopoLVM. However, the daemonset running on the node may bail out because a forced removal could incur data loss.
319
+
In this case, the manual rollback procedure can be performed.
320
+
- The operator can then delete the KubeSAN CSI driver deployment to prevent any further provisioning of shared storage.
321
+
2. **Manual Rollback Procedure with a broken shared volume group**:
322
+
- The node administrator can manually delete the shared VGs using the `vgremove` command on each node. In case of lock contention due to the failure state,
323
+
it is possible to forcefully circumvent the lock by using the `--force` and `--ignorelockingfailure` flags on the `vgremove` command.
324
+
This allows nodes that can no longer achieve quorum through sanlock to recover. This can be done on every node, after which the procedure can be restarted from scratch.
325
+
- The operator can then delete the shared VGs from the LVMCluster CR to ensure that they are not recreated.
326
+
- The operator can then delete the KubeSAN CSI driver deployment to prevent any further provisioning of shared storage.
327
+
328
+
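The manual rollback steps can be sketched as a dry-run plan. This is a minimal illustration, not part of LVMS or KubeSAN: the helper name `manual_vgremove_commands`, the node names, and the VG name are hypothetical, and the flags are taken from the manual rollback procedure as written. Nothing is executed; the function only builds the command strings a node administrator would review and run per node.

```python
import shlex


def manual_vgremove_commands(nodes, shared_vgs, force_locks=False):
    """Return (node, command) pairs for manual shared-VG removal.

    Hypothetical helper: it only builds command strings and never runs them.
    """
    plan = []
    for node in nodes:
        for vg in shared_vgs:
            argv = ["vgremove"]
            if force_locks:
                # Flags named in the manual rollback procedure, intended for
                # nodes that no longer achieve quorum through sanlock.
                argv += ["--force", "--ignorelockingfailure"]
            argv.append(vg)
            plan.append((node, shlex.join(argv)))
    return plan


if __name__ == "__main__":
    # Node and VG names below are placeholders.
    for node, cmd in manual_vgremove_commands(
        ["worker-0", "worker-1"], ["shared-vg"], force_locks=True
    ):
        print(f"{node}: {cmd}")
```

Keeping the plan as data makes it possible to review the forced removals before they touch any node, which matters because the forced path bypasses the locking that normally protects the shared VG.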
306
329
### Drawbacks
307
330
308
331
- Increased complexity in managing both node-local and shared storage.
@@ -371,6 +394,8 @@ LVMS can be installed on standalone clusters, but the shared storage provisionin
371
394
- Positive feedback from initial users.
372
395
- Full documentation, including troubleshooting guides and best practices.
373
396
- Full LVMS Support Lifecycle
397
+
- A healthy community exists around the KubeSAN project, and upstream contributions are as seamless as they are for TopoLVM.
398
+
- We have a contribution model in place for the KubeSAN project, and the project is in a healthy state with regular releases and active maintainers.
374
399
375
400
### Removing a deprecated feature
376
401
@@ -385,9 +410,9 @@ N/A
385
410
- New deviceClasses with the shared policy should be able to be added to existing LVMClusters without affecting existing deviceClasses.
386
411
387
412
- **Downgrade**:
388
-
- Allow safe downgrades by maintaining backward compatibility. Downgrading from a kubesan enabled version to a purely topolvm enabled version should be a no-break operation for the topolvm part. For the kubesan part, the operator should ensure that the shared VGs can be cleaned up manually
413
+
- Allow safe downgrades by maintaining backward compatibility. Downgrading from a KubeSAN-enabled version to a purely TopoLVM-enabled version should be a no-break operation for the TopoLVM part. For the KubeSAN part, the operator should ensure that the shared VGs can be cleaned up manually in case of failure.
389
414
- Provide rollback mechanisms and detailed instructions to revert to previous versions. Ensure that downgrades do not result in data loss or service interruptions.
390
-
The operator should ensure that the shared VGs can be cleaned up manually.
415
+
The operator should ensure that the shared VGs can be cleaned up manually (see "Design Details for Rollback Routines" for more details).
391
416
- Ensure that downgrades do not result in data loss or service interruptions. The operator should ensure that the shared VGs can be cleaned up without data loss on other device classes.
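The requirement that shared VGs can be removed without touching other device classes can be sketched as a pure transformation of the LVMCluster manifest. This is an illustrative assumption, not the operator's implementation: the helper name `drop_device_classes` and the device class names are hypothetical, and the manifest is assumed to list device classes by `name` under `spec.storage.deviceClasses`.

```python
import copy


def drop_device_classes(lvmcluster, names):
    """Return a copy of the manifest without the named device classes.

    Hypothetical helper: the input manifest is never modified, and all
    device classes not listed in `names` are preserved as-is.
    """
    updated = copy.deepcopy(lvmcluster)
    storage = updated.get("spec", {}).get("storage")
    if storage is None:
        # Nothing to remove if the manifest has no storage section.
        return updated
    storage["deviceClasses"] = [
        dc for dc in storage.get("deviceClasses", []) if dc.get("name") not in names
    ]
    return updated


if __name__ == "__main__":
    # Placeholder manifest; "vg1" and "shared-vg" are made-up names.
    cluster = {
        "kind": "LVMCluster",
        "spec": {"storage": {"deviceClasses": [{"name": "vg1"}, {"name": "shared-vg"}]}},
    }
    print(drop_device_classes(cluster, {"shared-vg"}))
```

Working on a copy keeps the original manifest available for comparison, so a reviewer can confirm that only the shared device classes were dropped before the change is applied.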