
Commit 511ddd2

Address findings
1 parent 1fb94e2 commit 511ddd2


1 file changed: +33 -22 lines changed


docs/proposals/20240930-machine-drain-rules.md

Lines changed: 33 additions & 22 deletions
@@ -29,7 +29,7 @@ status: implementable
 - [`MachineDrainRule` CRD](#machinedrainrule-crd)
 - [Example: Exclude Pods from drain](#example-exclude-pods-from-drain)
 - [Example: Drain order](#example-drain-order)
-- [Drain annotations](#drain-annotations)
+- [Drain labels](#drain-labels)
 - [Node drain behavior](#node-drain-behavior)
 - [Changes to wait for volume detach](#changes-to-wait-for-volume-detach)
 - [Security Model](#security-model)
@@ -53,7 +53,7 @@ hard-coded rules to decide which Pods should be evicted. This implementation is
 for more details).
 
 With recent changes in Cluster API, we can now have finer control on the drain process, and thus we propose a new
-`MachineDrainRule` CRD to make the drain rules configurable per Pod. Additionally, we're proposing annotations that
+`MachineDrainRule` CRD to make the drain rules configurable per Pod. Additionally, we're proposing labels that
 workload cluster admins can add to individual Pods to control their drain behavior.
 
 This would be a huge improvement over the “standard” `kubectl drain` aligned implementation we have today and help to
@@ -109,7 +109,8 @@ to be able to change the drain configuration without having to modify all Machin
 
 ### Non-Goals
 
-* Change the drain behavior for DaemonSet and static Pods (we’ll continue to skip draining for both)
+* Change the drain behavior for DaemonSet and static Pods (we’ll continue to skip draining for both).
+  While the drain behavior itself won't be changed, we will stop waiting for detachment of volumes of DaemonSet Pods.
 
 ### Future work
 
@@ -222,14 +223,18 @@ spec:
         app: portworx
 ```
 
-#### Drain annotations
+#### Drain labels
 
-We propose to introduce new annotations to allow workload cluster admins/users to define drain behavior
-for Pods. These annotations would be either added directly to Pods or indirectly via Deployments, StatefulSets, etc.
-The annotations will take precedence over `MachineDrainRules` specified in the management cluster.
+We propose to introduce new labels to allow workload cluster admins/users to define drain behavior
+for Pods. These labels would be either added directly to Pods or indirectly via Deployments, StatefulSets, etc.
+The labels will take precedence over `MachineDrainRules` specified in the management cluster.
 
 * `cluster.x-k8s.io/drain: skip`
-* `cluster.x-k8s.io/drain-order: <order>`
+
+Initially we also considered adding a `cluster.x-k8s.io/drain-order` label. But we're not entirely sure about it
+yet as someone who has access to the workload cluster (or maybe only specific Pods) would be able to influence the
+drain order of the entire cluster, which might lead to problems. The skip label is safe in comparison because it
+only influences the drain behavior of the Pod that has the label.
 
 #### Node drain behavior
 
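
The new label can be set directly on Pods or, more commonly, on the Pod template of a higher-level object. As a minimal
sketch (the Deployment name, app label, and image are hypothetical; only the `cluster.x-k8s.io/drain: skip` label comes
from this proposal):

```yaml
# Hypothetical Deployment in a workload cluster. Only the Pod template label
# cluster.x-k8s.io/drain: skip is taken from the proposal; everything else is illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
        # Tells the Machine controller to skip draining Pods of this Deployment.
        cluster.x-k8s.io/drain: skip
    spec:
      containers:
      - name: cache
        image: registry.k8s.io/pause:3.9
```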

@@ -241,10 +246,8 @@ The following describes the new algorithm that decides which Pods should be drai
   * => use `behavior: Skip`
 * If the Pod has `cluster.x-k8s.io/drain: skip`
   * => use `behavior: Skip`
-* if the Pod has `cluster.x-k8s.io/drain-order: <order>`
-  * => use `behavior: Drain` and `order: <order>`
 * If there is a matching `MachineDrainRule`
-  * => use `behavior` and `order` from the first matching `MachineDrainRule`
+  * => use `behavior` and `order` from the first matching `MachineDrainRule` (based on alphabetical order)
 * Otherwise:
   * => use `behavior: Drain` and `order: 0`
 * If there are no more Pods to be drained
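
To make the "first matching `MachineDrainRule` (based on alphabetical order)" step above concrete, a sketch of two rules
follows. The exact spec layout (`spec.drain.behavior`, `spec.drain.order`, `spec.pods[].selector`) is assumed from the
CRD examples earlier in the proposal, and the rule and app names are hypothetical; treat this as illustrative only:

```yaml
# If both rules match the same Pod, "aaa-skip-db" wins because its name sorts
# first alphabetically, so the Pod would be skipped rather than drained late.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDrainRule
metadata:
  name: aaa-skip-db
  namespace: default
spec:
  drain:
    behavior: Skip
  pods:
  - selector:
      matchLabels:
        app: database
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDrainRule
metadata:
  name: zzz-drain-db-late
  namespace: default
spec:
  drain:
    behavior: Drain
    order: 100   # higher order means drained after Pods with a lower order
  pods:
  - selector:
      matchLabels:
        app: database
```
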
@@ -271,6 +274,11 @@ Notes:
 Today, after Node drain we are waiting for **all** volumes to be detached. We are going to change that behavior to ignore
 all attached volumes that belong to Pods for which we skipped the drain.
 
+Please note, today the only Pods for which we skip drain that can have volumes are DaemonSet Pods. If a DaemonSet Pod
+has a volume, the wait for volume detach would currently block indefinitely. The only way around this today is to set either
+the `Machine.spec.nodeVolumeDetachTimeout` field or the `machine.cluster.x-k8s.io/exclude-wait-for-node-volume-detach`
+annotation. With this change we will stop waiting for volumes of DaemonSet Pods to be detached.
+
 ### Security Model
 
 This proposal will add a new `MachineDrainRule` CRD. The Cluster API core controller needs permissions to read this CRD.
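
For reference, the two existing workarounds mentioned in the hunk above would look roughly as follows on a Machine.
Only the `spec.nodeVolumeDetachTimeout` field and the annotation key come from this proposal; the Machine name,
cluster name, and timeout value are illustrative:

```yaml
# Illustrative Machine snippet showing both workarounds (normally only one would be used).
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: worker-0
  namespace: default
  annotations:
    # Workaround 2: skip waiting for volume detach entirely.
    machine.cluster.x-k8s.io/exclude-wait-for-node-volume-detach: ""
spec:
  clusterName: my-cluster
  # Workaround 1: stop waiting for volume detach after a timeout.
  nodeVolumeDetachTimeout: 5m
```
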
@@ -287,32 +295,35 @@ Cluster CRs), it is also possible to further restrict access.
 
 **Add Node drain configuration to the Machine object**
 
-We considered adding the drain rules directly to the Machine objects instead. We discarded this option because it
-would have made it necessary to configure Node drain on every single Machine. By having a separate CRD it is now
-possible to configure the Node drain for all Clusters / Machines or a specific subset of Machines at once. This
-also means that the Node drain configuration can be immediately changed without having to propagate configuration
-changes to all Machines.
+We considered adding the drain rules directly to the Machine objects (and thus also to the MachineDeployment & MachineSet objects)
+instead. We discarded this option because it would have made it necessary to configure Node drain on every single Machine.
+By having a separate CRD it is now possible to configure the Node drain for all Clusters / Machines or a specific subset
+of Machines at once. This also means that the Node drain configuration can be immediately changed without having to propagate
+configuration changes to all Machines.
 
 ## Upgrade Strategy
 
-No upgrade considerations apply. `MachineDrainRules` are orthogonal to the state of the Cluster / Machines as they
-only configure how the Machine controller should drain Nodes. Accordingly, they are not part of the Machine specs.
-Thus, as soon as the new Cluster API version that supports this feature is deployed, `MachineDrainRules` can be immediately
-used without rolling out / re-configuring any Clusters / Machines.
+`MachineDrainRules` are orthogonal to the state of the Cluster / Machines as they only configure how the Machine controller
+should drain Nodes. Accordingly, they are not part of the Machine specs. Thus, as soon as the new Cluster API version that
+supports this feature is deployed, `MachineDrainRules` can be immediately used without rolling out / re-configuring any
+Clusters / Machines.
+
+Please note that while the drain behavior of DaemonSet Pods won't change (we’ll continue to skip draining),
+we will stop waiting for detachment of volumes of DaemonSet Pods.
 
 ## Additional Details
 
 ### Test Plan
 
 * Extensive unit test coverage for the Node drain code for all supported cases
-* Extend the Node drain e2e test to cover draining Pods using various `MachineDrainRules` and the annotations
+* Extend the Node drain e2e test to cover draining Pods using various `MachineDrainRules` and the labels
   (including validation of condition messages).
 
 ### Graduation Criteria
 
 The `MachineDrainRules` CRD will be added as `v1beta1` CRD to the `cluster.x-k8s.io` apiGroup.
 An additional feature gate is not required as the behavior of the Machine controller will stay the same if neither
-`MachineDrainRule` CRDs nor the annotations are used.
+`MachineDrainRule` CRDs nor the labels are used.
 
 ### Version Skew Strategy
 