@@ -29,7 +29,7 @@ status: implementable
   - [`MachineDrainRule` CRD](#machinedrainrule-crd)
     - [Example: Exclude Pods from drain](#example-exclude-pods-from-drain)
     - [Example: Drain order](#example-drain-order)
-  - [Drain annotations](#drain-annotations)
+  - [Drain labels](#drain-labels)
   - [Node drain behavior](#node-drain-behavior)
   - [Changes to wait for volume detach](#changes-to-wait-for-volume-detach)
 - [Security Model](#security-model)
@@ -53,7 +53,7 @@ hard-coded rules to decide which Pods should be evicted. This implementation is
 for more details).

 With recent changes in Cluster API, we can now have finer control on the drain process, and thus we propose a new
-`MachineDrainRule` CRD to make the drain rules configurable per Pod. Additionally, we're proposing annotations that
+`MachineDrainRule` CRD to make the drain rules configurable per Pod. Additionally, we're proposing labels that
 workload cluster admins can add to individual Pods to control their drain behavior.

 This would be a huge improvement over the “standard” `kubectl drain` aligned implementation we have today and help to
@@ -109,7 +109,8 @@ to be able to change the drain configuration without having to modify all Machin

 ### Non-Goals

-* Change the drain behavior for DaemonSet and static Pods (we’ll continue to skip draining for both)
+* Change the drain behavior for DaemonSet and static Pods (we’ll continue to skip draining for both).
+  While the drain behavior itself won't be changed, we will stop waiting for detachment of volumes of DaemonSet Pods.

 ### Future work
@@ -222,14 +223,18 @@ spec:
       app: portworx
 ```

-#### Drain annotations
+#### Drain labels

-We propose to introduce new annotations to allow workload cluster admins/users to define drain behavior
-for Pods. These annotations would be either added directly to Pods or indirectly via Deployments, StatefulSets, etc.
-The annotations will take precedence over `MachineDrainRules` specified in the management cluster.
+We propose to introduce new labels to allow workload cluster admins/users to define drain behavior
+for Pods. These labels would be either added directly to Pods or indirectly via Deployments, StatefulSets, etc.
+The labels will take precedence over `MachineDrainRules` specified in the management cluster.

 * `cluster.x-k8s.io/drain: skip`
-* `cluster.x-k8s.io/drain-order: <order>`
+
+Initially we also considered adding a `cluster.x-k8s.io/drain-order` label, but we're not entirely sure about it
+yet: someone with access to the workload cluster (or maybe only to specific Pods) would be able to influence the
+drain order of the entire cluster, which might lead to problems. The skip label is safe in comparison because it
+only influences the drain behavior of the Pod that has the label.
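For illustration, the skip label could be set on a Deployment's Pod template so that every Pod the Deployment creates is excluded from drain. This is a minimal sketch: only the `cluster.x-k8s.io/drain: skip` label comes from this proposal; the Deployment name, the `app` label, and the image are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-agent                   # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: monitoring-agent
  template:
    metadata:
      labels:
        app: monitoring-agent
        # The Machine controller skips eviction of Pods carrying this label.
        cluster.x-k8s.io/drain: skip
    spec:
      containers:
      - name: agent
        image: registry.example.com/agent:v1   # hypothetical image
```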

 #### Node drain behavior
@@ -241,10 +246,8 @@ The following describes the new algorithm that decides which Pods should be drai
   * \=\> use `behavior: Skip`
 * If the Pod has `cluster.x-k8s.io/drain: skip`
   * \=\> use `behavior: Skip`
-* if the Pod has `cluster.x-k8s.io/drain-order: <order>`
-  * \=\> use `behavior: Drain` and `order: <order>`
 * If there is a matching `MachineDrainRule`
-  * \=\> use `behavior` and `order` from the first matching `MachineDrainRule`
+  * \=\> use `behavior` and `order` from the first matching `MachineDrainRule` (based on alphabetical order)
 * Otherwise:
   * \=\> use `behavior: Drain` and `order: 0`
 * If there are no more Pods to be drained
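The "first matching `MachineDrainRule` (based on alphabetical order)" tie-break can be illustrated with two rules that match the same Pods. This is a hedged sketch that assumes the `spec.drain` and selector fields from the `MachineDrainRule` examples earlier in this proposal; the rule names and `order` values are hypothetical.

```yaml
# Both rules match Pods with app: portworx on all Machines. Because
# "a-drain-rule" sorts alphabetically before "b-drain-rule", its
# behavior/order (Drain, 100) is used for those Pods.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDrainRule
metadata:
  name: a-drain-rule            # hypothetical name
  namespace: default
spec:
  drain:
    behavior: Drain
    order: 100
  machines:
  - selector: {}                # all Machines
  pods:
  - selector:
      matchLabels:
        app: portworx
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDrainRule
metadata:
  name: b-drain-rule            # hypothetical name
  namespace: default
spec:
  drain:
    behavior: Drain
    order: 500                  # ignored for these Pods: a-drain-rule wins
  machines:
  - selector: {}
  pods:
  - selector:
      matchLabels:
        app: portworx
```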
@@ -271,6 +274,11 @@ Notes:
 Today, after Node drain we are waiting for **all** volumes to be detached. We are going to change that behavior to ignore
 all attached volumes that belong to Pods for which we skipped the drain.

+Please note, today the only Pods for which we skip drain that can have volumes are DaemonSet Pods. If a DaemonSet Pod
+has a volume, the wait for volume detach would currently block indefinitely. The only way around this today is to set either
+the `Machine.spec.nodeVolumeDetachTimeout` field or the `machine.cluster.x-k8s.io/exclude-wait-for-node-volume-detach`
+annotation. With this change we will stop waiting for volumes of DaemonSet Pods to be detached.
+
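For reference, the two existing workarounds mentioned above could look as follows on a Machine. This is a sketch: the Machine and cluster names are hypothetical, the annotation is treated as presence-based (empty value assumed), and both options are shown together even though either one alone suffices.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: worker-machine-0        # hypothetical name
  namespace: default
  annotations:
    # Option 1: skip waiting for volume detach entirely for this Machine.
    machine.cluster.x-k8s.io/exclude-wait-for-node-volume-detach: ""
spec:
  clusterName: my-cluster       # hypothetical cluster name
  # Option 2: bound how long the controller waits for volumes to detach.
  nodeVolumeDetachTimeout: 5m
```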

 ### Security Model

 This proposal will add a new `MachineDrainRule` CRD. The Cluster API core controller needs permissions to read this CRD.
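For example, that read permission could be granted with an RBAC rule along these lines; the ClusterRole name is hypothetical, while the resource and apiGroup follow from the CRD described above.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: capi-machinedrainrule-reader   # hypothetical name
rules:
# Read-only access to MachineDrainRules for the Machine controller.
- apiGroups: ["cluster.x-k8s.io"]
  resources: ["machinedrainrules"]
  verbs: ["get", "list", "watch"]
```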
@@ -287,32 +295,35 @@ Cluster CRs), it is also possible to further restrict access.

 **Add Node drain configuration to the Machine object**

-We considered adding the drain rules directly to the Machine objects instead. We discarded this option because it
-would have made it necessary to configure Node drain on every single Machine. By having a separate CRD it is now
-possible to configure the Node drain for all Clusters / Machines or a specific subset of Machines at once. This
-also means that the Node drain configuration can be immediately changed without having to propagate configuration
-changes to all Machines.
+We considered adding the drain rules directly to the Machine objects (and thus also to the MachineDeployment & MachineSet objects)
+instead. We discarded this option because it would have made it necessary to configure Node drain on every single Machine.
+By having a separate CRD it is now possible to configure the Node drain for all Clusters / Machines or a specific subset
+of Machines at once. This also means that the Node drain configuration can be immediately changed without having to propagate
+configuration changes to all Machines.

 ## Upgrade Strategy

-No upgrade considerations apply. `MachineDrainRules` are orthogonal to the state of the Cluster / Machines as they
-only configure how the Machine controller should drain Nodes. Accordingly, they are not part of the Machine specs.
-Thus, as soon as the new Cluster API version that supports this feature is deployed, `MachineDrainRules` can be immediately
-used without rolling out / re-configuring any Clusters / Machines.
+`MachineDrainRules` are orthogonal to the state of the Cluster / Machines as they only configure how the Machine controller
+should drain Nodes. Accordingly, they are not part of the Machine specs. Thus, as soon as the new Cluster API version that
+supports this feature is deployed, `MachineDrainRules` can be immediately used without rolling out / re-configuring any
+Clusters / Machines.
+
+Please note that while the drain behavior of DaemonSet Pods won't change (we’ll continue to skip draining),
+we will stop waiting for detachment of volumes of DaemonSet Pods.

 ## Additional Details

 ### Test Plan

 * Extensive unit test coverage for the Node drain code for all supported cases
-* Extend the Node drain e2e test to cover draining Pods using various `MachineDrainRules` and the annotations
+* Extend the Node drain e2e test to cover draining Pods using various `MachineDrainRules` and the labels
   (including validation of condition messages).

 ### Graduation Criteria

 The `MachineDrainRules` CRD will be added as `v1beta1` CRD to the `cluster.x-k8s.io` apiGroup.
 An additional feature gate is not required as the behavior of the Machine controller will stay the same if neither
-`MachineDrainRule` CRDs nor the annotations are used.
+`MachineDrainRule` CRDs nor the labels are used.

 ### Version Skew Strategy