Commit 9a9ca2c

Merge pull request #91241 from mburke5678/nodes-infra-workload-taint-fix-412
[enterprise-4.12] OCPBUGS34817: Infrastructure node workloads 'taints' are ambiguous in example and description
2 parents c63a0b2 + 34e6817 commit 9a9ca2c

6 files changed: 83 additions, 97 deletions

machine_management/creating-infrastructure-machinesets.adoc

Lines changed: 5 additions & 0 deletions
@@ -107,6 +107,11 @@ Some of the infrastructure resources are deployed in your cluster by default. Yo
 
 [source,yaml]
 ----
+apiVersion: imageregistry.operator.openshift.io/v1
+kind: Config
+metadata:
+  name: cluster
+# ...
 spec:
   nodePlacement: <1>
     nodeSelector:
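The `nodePlacement.nodeSelector` in this hunk is a plain label map. The following minimal Python sketch models how such a map is usually evaluated in Kubernetes: every key/value pair must be present on the node's labels, and an empty map matches any node. The function name and the sample node labels are hypothetical and only illustrate the rule.

```python
# Hypothetical helper modeling map-style nodeSelector matching (not an OpenShift API).
def node_selector_matches(selector: dict[str, str], node_labels: dict[str, str]) -> bool:
    """Return True if every key/value pair in the selector is present on the node.

    A missing key never matches, even when the selector value is the empty
    string; an empty selector matches every node.
    """
    return all(node_labels.get(key) == value for key, value in selector.items())


# Hypothetical nodes: one keeps the dual infra,worker labels, one is a plain worker.
infra_node = {"node-role.kubernetes.io/infra": "", "node-role.kubernetes.io/worker": ""}
worker_node = {"node-role.kubernetes.io/worker": ""}

# The selector from the image registry example above, as Python data.
registry_selector = {"node-role.kubernetes.io/infra": ""}

if __name__ == "__main__":
    print(node_selector_matches(registry_selector, infra_node))   # True
    print(node_selector_matches(registry_selector, worker_node))  # False
```

Because `node-role.kubernetes.io/infra: ""` uses an empty value, the match depends only on the presence of the label key with an empty value, which is how `oc label node <node-name> node-role.kubernetes.io/infra=""` labels the node.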

modules/binding-infra-node-workloads-using-taints-tolerations.adoc

Lines changed: 35 additions & 26 deletions
@@ -7,11 +7,11 @@
 [id="binding-infra-node-workloads-using-taints-tolerations_{context}"]
 = Binding infrastructure node workloads using taints and tolerations
 
-If you have an infra node that has the `infra` and `worker` roles assigned, you must configure the node so that user workloads are not assigned to it.
+If you have an infrastructure node that has the `infra` and `worker` roles assigned, you must configure the node so that user workloads are not assigned to it.
 
 [IMPORTANT]
 ====
-It is recommended that you preserve the dual `infra,worker` label that is created for infra nodes and use taints and tolerations to manage nodes that user workloads are scheduled on. If you remove the `worker` label from the node, you must create a custom pool to manage it. A node with a label other than `master` or `worker` is not recognized by the MCO without a custom pool. Maintaining the `worker` label allows the node to be managed by the default worker machine config pool, if no custom pools that select the custom label exists. The `infra` label communicates to the cluster that it does not count toward the total number of subscriptions.
+It is recommended that you preserve the dual `infra,worker` label that is created for infrastructure nodes and use taints and tolerations to manage nodes that user workloads are scheduled on. If you remove the `worker` label from the node, you must create a custom pool to manage it. A node with a label other than `master` or `worker` is not recognized by the MCO without a custom pool. Maintaining the `worker` label allows the node to be managed by the default worker machine config pool, if no custom pools that select the custom label exists. The `infra` label communicates to the cluster that it does not count toward the total number of subscriptions.
 ====
 
 .Prerequisites
@@ -20,7 +20,7 @@ It is recommended that you preserve the dual `infra,worker` label that is create
 
 .Procedure
 
-. Add a taint to the infra node to prevent scheduling user workloads on it:
+. Add a taint to the infrastructure node to prevent scheduling user workloads on it:
 
 .. Determine if the node has the taint:
 +
@@ -36,7 +36,7 @@ oc describe node ci-ln-iyhx092-f76d1-nvdfm-worker-b-wln2l
 Name: ci-ln-iyhx092-f76d1-nvdfm-worker-b-wln2l
 Roles: worker
 ...
-Taints: node-role.kubernetes.io/infra:NoSchedule
+Taints: node-role.kubernetes.io/infra=reserved:NoSchedule
 ...
 ----
 +
@@ -53,57 +53,66 @@ For example:
 +
 [source,terminal]
 ----
-$ oc adm taint nodes node1 node-role.kubernetes.io/infra=reserved:NoExecute
+$ oc adm taint nodes node1 node-role.kubernetes.io/infra=reserved:NoSchedule
 ----
 +
 [TIP]
 ====
-You can alternatively apply the following YAML to add the taint:
+You can alternatively edit the pod specification to add the taint:
 
 [source,yaml]
 ----
-kind: Node
 apiVersion: v1
+kind: Node
 metadata:
-  name: <node_name>
-  labels:
-    ...
+  name: node1
+# ...
 spec:
   taints:
   - key: node-role.kubernetes.io/infra
-    effect: NoExecute
     value: reserved
-  ...
+    effect: NoSchedule
+# ...
 ----
 ====
 +
-This example places a taint on `node1` that has key `node-role.kubernetes.io/infra` and taint effect `NoSchedule`. Nodes with the `NoSchedule` effect schedule only pods that tolerate the taint, but allow existing pods to remain scheduled on the node.
+These examples place a taint on `node1` that has the `node-role.kubernetes.io/infra` key and the `NoSchedule` taint effect. Nodes with the `NoSchedule` effect schedule only pods that tolerate the taint, but allow existing pods to remain scheduled on the node.
 +
 [NOTE]
 ====
 If a descheduler is used, pods violating node taints could be evicted from the cluster.
 ====
 
-. Add tolerations for the pod configurations you want to schedule on the infra node, like router, registry, and monitoring workloads. Add the following code to the `Pod` object specification:
+. Add tolerations to the pods that you want to schedule on the infrastructure node, such as the router, registry, and monitoring workloads. Referencing the previous examples, add the following tolerations to the `Pod` object specification:
 +
 [source,yaml]
 ----
-tolerations:
-- effect: NoExecute <1>
-  key: node-role.kubernetes.io/infra <2>
-  operator: Equal <3>
-  value: reserved <4>
+apiVersion: v1
+kind: Pod
+metadata:
+  annotations:
+
+# ...
+spec:
+# ...
+  tolerations:
+  - key: node-role.kubernetes.io/infra <1>
+    value: reserved <2>
+    effect: NoSchedule <3>
+    operator: Equal <4>
 ----
-<1> Specify the effect that you added to the node.
-<2> Specify the key that you added to the node.
-<3> Specify the `Equal` Operator to require a taint with the key `node-role.kubernetes.io/infra` to be present on the node.
-<4> Specify the value of the key-value pair taint that you added to the node.
+<1> Specify the key that you added to the node.
+<2> Specify the value of the key-value pair taint that you added to the node.
+<3> Specify the effect that you added to the node.
+<4> Specify the `Equal` Operator to require a taint with the key `node-role.kubernetes.io/infra` to be present on the node.
 +
-This toleration matches the taint created by the `oc adm taint` command. A pod with this toleration can be scheduled onto the infra node.
+This toleration matches the taint created by the `oc adm taint` command. A pod with this toleration can be scheduled onto the infrastructure node.
 +
 [NOTE]
 ====
-Moving pods for an Operator installed via OLM to an infra node is not always possible. The capability to move Operator pods depends on the configuration of each Operator.
+Moving pods for an Operator installed via OLM to an infrastructure node is not always possible. The capability to move Operator pods depends on the configuration of each Operator.
 ====
 
-. Schedule the pod to the infra node using a scheduler. See the documentation for _Controlling pod placement onto nodes_ for details.
+. Schedule the pod to the infrastructure node by using a scheduler. See the documentation for "Controlling pod placement using the scheduler" for details.
+
+. Remove any workloads that you do not want, or that do not belong, on the new infrastructure node. See the list of workloads supported for use on infrastructure nodes in "{product-title} infrastructure components".
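This module switches the taint effect from `NoExecute` to `NoSchedule` and reorders the toleration fields. To make the matching rules concrete, here is a small Python sketch of how a toleration is compared with a taint, modeled on upstream Kubernetes semantics as generally documented: an empty `effect` matches every effect, `Exists` ignores the value, and `Equal` requires an exact value match. The `Taint`, `Toleration`, `tolerates`, and `placement` names are hypothetical, and the sketch ignores `tolerationSeconds` and scheduler scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    value: str = ""
    effect: str = ""          # empty string matches every effect
    operator: str = "Equal"   # "Equal" or "Exists"


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified model of the Kubernetes toleration-matching rules."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False


def placement(taints: list[Taint], tolerations: list[Toleration]) -> tuple[bool, bool]:
    """Return (can_schedule, evict_if_running) for a pod on a node with these taints.

    Untolerated NoSchedule or NoExecute taints block new scheduling; only an
    untolerated NoExecute taint also evicts a pod that is already running.
    PreferNoSchedule is a soft preference and is ignored here.
    """
    untolerated = [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]
    can_schedule = not any(t.effect in ("NoSchedule", "NoExecute") for t in untolerated)
    evict_if_running = any(t.effect == "NoExecute" for t in untolerated)
    return can_schedule, evict_if_running


# The taint and toleration from the examples above, as Python data.
INFRA_TAINT = Taint("node-role.kubernetes.io/infra", "reserved", "NoSchedule")
ROUTER_TOLERATION = Toleration(
    key="node-role.kubernetes.io/infra", value="reserved", effect="NoSchedule"
)

if __name__ == "__main__":
    print(placement([INFRA_TAINT], [ROUTER_TOLERATION]))  # (True, False)
    print(placement([INFRA_TAINT], []))                   # (False, False)
```

Under this model, a toleration with `effect: NoSchedule` does not cover a `NoExecute` taint with the same key and value. That is why the earlier examples paired two tolerations per workload, and why a single `NoSchedule` toleration is enough once the taint itself uses `NoSchedule`.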

modules/creating-an-infra-node.adoc

Lines changed: 15 additions & 20 deletions
@@ -13,16 +13,20 @@
 See Creating infrastructure machine sets for installer-provisioned infrastructure environments or for any cluster where the control plane nodes are managed by the machine API.
 ====
 
-Requirements of the cluster dictate that infrastructure, also called `infra` nodes, be provisioned. The installer only provides provisions for control plane and worker nodes. Worker nodes can be designated as infrastructure nodes or application, also called `app`, nodes through labeling.
+Requirements of the cluster dictate that infrastructure (infra) nodes, be provisioned. The installation program provisions only control plane and worker nodes. Worker nodes can be designated as infrastructure nodes through labeling. You can then use taints and tolerations to move appropriate workloads to the infrastructure nodes. For more information, see "Moving resources to infrastructure machine sets".
 
-.Procedure
+You can optionally create a default cluster-wide node selector. The default node selector is applied to pods created in all namespaces and creates an intersection with any existing node selectors on a pod, which additionally constrains the pod's selector.
 
-. Add a label to the worker node that you want to act as application node:
-+
-[source,terminal]
------
-$ oc label node <node-name> node-role.kubernetes.io/app=""
------
+[IMPORTANT]
+====
+If the default node selector key conflicts with the key of a pod's label, then the default node selector is not applied.
+
+However, do not set a default node selector that might cause a pod to become unschedulable. For example, setting the default node selector to a specific node role, such as `node-role.kubernetes.io/infra=""`, when a pod's label is set to a different node role, such as `node-role.kubernetes.io/master=""`, can cause the pod to become unschedulable. For this reason, use caution when setting the default node selector to specific node roles.
+
+You can alternatively use a project node selector to avoid cluster-wide node selector key conflicts.
+====
+
+.Procedure
 
 . Add a label to the worker nodes that you want to act as infrastructure nodes:
 +
@@ -31,23 +35,14 @@ $ oc label node <node-name> node-role.kubernetes.io/app=""
 $ oc label node <node-name> node-role.kubernetes.io/infra=""
 ----
 
-. Check to see if applicable nodes now have the `infra` role and `app` roles:
+. Check to see if applicable nodes now have the `infra` role:
 +
 [source,terminal]
 ----
 $ oc get nodes
 ----
 
-. Create a default cluster-wide node selector. The default node selector is applied to pods created in all namespaces. This creates an intersection with any existing node selectors on a pod, which additionally constrains the pod's selector.
-+
-[IMPORTANT]
-====
-If the default node selector key conflicts with the key of a pod's label, then the default node selector is not applied.
-
-However, do not set a default node selector that might cause a pod to become unschedulable. For example, setting the default node selector to a specific node role, such as `node-role.kubernetes.io/infra=""`, when a pod's label is set to a different node role, such as `node-role.kubernetes.io/master=""`, can cause the pod to become unschedulable. For this reason, use caution when setting the default node selector to specific node roles.
-
-You can alternatively use a project node selector to avoid cluster-wide node selector key conflicts.
-====
+. Optional: Create a default cluster-wide node selector:
 
 .. Edit the `Scheduler` object:
 +
@@ -72,4 +67,4 @@ spec:
 
 .. Save the file to apply the changes.
 
-You can now move infrastructure resources to the newly labeled `infra` nodes.
+You can now move infrastructure resources to the new infrastructure nodes. Also, remove any workloads that you do not want, or that do not belong, on the new infrastructure node. See the list of workloads supported for use on infrastructure nodes in "{product-title} infrastructure components".
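The default cluster-wide node selector is described as an intersection with a pod's own node selector. The Python sketch below illustrates that intersection and the unschedulable case from the IMPORTANT note. It is an assumption-laden model, not how the scheduler or the admission logic is implemented: it reads "the key of a pod's label" as a key already set in the pod's own node selector, and in that case it skips the whole default selector. The function names and label sets are hypothetical.

```python
INFRA = "node-role.kubernetes.io/infra"
MASTER = "node-role.kubernetes.io/master"


def effective_node_selector(pod_selector: dict[str, str],
                            default_selector: dict[str, str]) -> dict[str, str]:
    """Combine a pod's nodeSelector with a cluster-wide default (simplified model).

    Assumption: if the pod already sets any key that the default also sets,
    the default is not applied at all. Otherwise both maps are merged, so a
    node must satisfy every pair from both (an intersection of constraints).
    """
    if any(key in pod_selector for key in default_selector):
        return dict(pod_selector)
    return {**pod_selector, **default_selector}


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every selector pair appears in the node's labels."""
    return all(labels.get(k) == v for k, v in selector.items())


if __name__ == "__main__":
    merged = effective_node_selector({MASTER: ""}, {INFRA: ""})
    control_plane = {MASTER: ""}                                    # hypothetical labels
    infra_node = {INFRA: "", "node-role.kubernetes.io/worker": ""}  # hypothetical labels
    # Neither node carries both role labels, so the pod cannot be scheduled.
    print(matches(merged, control_plane), matches(merged, infra_node))  # False False
```

The merged selector requires both role labels at once, which no node in this example has; that is the situation the note warns about when the default selector names a specific node role.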

modules/infrastructure-moving-monitoring.adoc

Lines changed: 1 addition & 25 deletions
@@ -40,81 +40,57 @@ data:
       - key: node-role.kubernetes.io/infra
         value: reserved
         effect: NoSchedule
-      - key: node-role.kubernetes.io/infra
-        value: reserved
-        effect: NoExecute
     prometheusK8s:
       nodeSelector:
         node-role.kubernetes.io/infra: ""
       tolerations:
       - key: node-role.kubernetes.io/infra
         value: reserved
         effect: NoSchedule
-      - key: node-role.kubernetes.io/infra
-        value: reserved
-        effect: NoExecute
     prometheusOperator:
       nodeSelector:
         node-role.kubernetes.io/infra: ""
       tolerations:
       - key: node-role.kubernetes.io/infra
         value: reserved
         effect: NoSchedule
-      - key: node-role.kubernetes.io/infra
-        value: reserved
-        effect: NoExecute
     k8sPrometheusAdapter:
       nodeSelector:
         node-role.kubernetes.io/infra: ""
       tolerations:
       - key: node-role.kubernetes.io/infra
         value: reserved
         effect: NoSchedule
-      - key: node-role.kubernetes.io/infra
-        value: reserved
-        effect: NoExecute
     kubeStateMetrics:
       nodeSelector:
         node-role.kubernetes.io/infra: ""
       tolerations:
       - key: node-role.kubernetes.io/infra
         value: reserved
         effect: NoSchedule
-      - key: node-role.kubernetes.io/infra
-        value: reserved
-        effect: NoExecute
     telemeterClient:
       nodeSelector:
         node-role.kubernetes.io/infra: ""
       tolerations:
       - key: node-role.kubernetes.io/infra
         value: reserved
         effect: NoSchedule
-      - key: node-role.kubernetes.io/infra
-        value: reserved
-        effect: NoExecute
     openshiftStateMetrics:
       nodeSelector:
         node-role.kubernetes.io/infra: ""
       tolerations:
       - key: node-role.kubernetes.io/infra
         value: reserved
         effect: NoSchedule
-      - key: node-role.kubernetes.io/infra
-        value: reserved
-        effect: NoExecute
     thanosQuerier:
       nodeSelector:
         node-role.kubernetes.io/infra: ""
       tolerations:
       - key: node-role.kubernetes.io/infra
         value: reserved
         effect: NoSchedule
-      - key: node-role.kubernetes.io/infra
-        value: reserved
-        effect: NoExecute
 ----
-<1> Add a `nodeSelector` parameter with the appropriate value to the component you want to move. You can use a `nodeSelector` in the format shown or use `<key>: <value>` pairs, based on the value specified for the node. If you added a taint to the infrasructure node, also add a matching toleration.
+<1> Add a `nodeSelector` parameter with the appropriate value to the component you want to move. You can use a `nodeSelector` parameter in the format shown or use `<key>: <value>` pairs, based on the value specified for the node. If you added a taint to the infrastructure node, also add a matching toleration.
 
 . Watch the monitoring pods move to the new machines:
 +
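After this change, every monitoring component in the example carries the same `nodeSelector` and a single `NoSchedule` toleration. Because the block is so repetitive, one option is to generate it. The following Python sketch builds that per-component placement for the component keys visible in this hunk. The helper names are hypothetical, the list may not cover every component in the full file, and the output uses `json` because JSON is, for practical purposes, valid YAML, which avoids a third-party YAML dependency.

```python
import json

INFRA_LABEL = "node-role.kubernetes.io/infra"

# Component keys visible in this hunk; the full example may list more.
COMPONENTS = [
    "prometheusK8s",
    "prometheusOperator",
    "k8sPrometheusAdapter",
    "kubeStateMetrics",
    "telemeterClient",
    "openshiftStateMetrics",
    "thanosQuerier",
]


def infra_placement() -> dict:
    """One nodeSelector plus one NoSchedule toleration, matching the updated example.

    A fresh dict is returned on every call so that editing one component's
    placement never changes another component's placement.
    """
    return {
        "nodeSelector": {INFRA_LABEL: ""},
        "tolerations": [
            {"key": INFRA_LABEL, "value": "reserved", "effect": "NoSchedule"}
        ],
    }


def build_monitoring_config(components: list[str]) -> dict:
    """Map each component key to the shared infrastructure placement."""
    return {name: infra_placement() for name in components}


if __name__ == "__main__":
    print(json.dumps(build_monitoring_config(COMPONENTS), indent=2))
```

The generated mapping corresponds only to the contents of `config.yaml`; it does not produce the surrounding `ConfigMap` object.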

modules/infrastructure-moving-registry.adoc

Lines changed: 6 additions & 12 deletions
@@ -61,15 +61,12 @@ $ oc edit configs.imageregistry.operator.openshift.io/cluster
 +
 [source,yaml]
 ----
+apiVersion: imageregistry.operator.openshift.io/v1
+kind: Config
+metadata:
+  name: cluster
+# ...
 spec:
-  affinity:
-    podAntiAffinity:
-      preferredDuringSchedulingIgnoredDuringExecution:
-      - podAffinityTerm:
-          namespaces:
-          - openshift-image-registry
-          topologyKey: kubernetes.io/hostname
-        weight: 100
   logLevel: Normal
   managementState: Managed
   nodeSelector: <1>
@@ -78,11 +75,8 @@ spec:
   - effect: NoSchedule
     key: node-role.kubernetes.io/infra
     value: reserved
-  - effect: NoExecute
-    key: node-role.kubernetes.io/infra
-    value: reserved
 ----
-<1> Add a `nodeSelector` parameter with the appropriate value to the component you want to move. You can use a `nodeSelector` in the format shown or use `<key>: <value>` pairs, based on the value specified for the node. If you added a taint to the infrasructure node, also add a matching toleration.
+<1> Add a `nodeSelector` parameter with the appropriate value to the component you want to move. You can use a `nodeSelector` parameter in the format shown or use `<key>: <value>` pairs, based on the value specified for the node. If you added a taint to the infrasructure node, also add a matching toleration.
 
 . Verify the registry pod has been moved to the infrastructure node.
 +
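With the `NoExecute` toleration removed from the registry configuration, the registry pod fits an infrastructure node only if that node carries just the `NoSchedule` taint. The following Python sketch checks a placement spec against a node's labels and taints. The `registry_spec` dictionary mirrors the hunk's `nodeSelector` and `tolerations` fields; the node data and the helper names are hypothetical, and the matching rules are a simplified model of Kubernetes behavior (`PreferNoSchedule` is treated as non-blocking and `tolerationSeconds` is ignored).

```python
INFRA = "node-role.kubernetes.io/infra"

# Placement fields from the registry Config spec in this hunk, as Python data.
registry_spec = {
    "nodeSelector": {INFRA: ""},
    "tolerations": [{"effect": "NoSchedule", "key": INFRA, "value": "reserved"}],
}

# Hypothetical infrastructure node tainted as in the taints and tolerations module.
infra_node = {
    "labels": {INFRA: "", "node-role.kubernetes.io/worker": ""},
    "taints": [{"key": INFRA, "value": "reserved", "effect": "NoSchedule"}],
}


def _tolerates(tol: dict, taint: dict) -> bool:
    """Simplified rule: effect and key must match when set; Exists ignores the value."""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("key") and tol["key"] != taint["key"]:
        return False
    if tol.get("operator") == "Exists":
        return True
    return tol.get("value", "") == taint.get("value", "")  # "Equal", the default


def fits(spec: dict, node: dict) -> bool:
    """True if the selector matches the node labels and every hard taint is tolerated."""
    labels = node["labels"]
    if not all(labels.get(k) == v for k, v in spec.get("nodeSelector", {}).items()):
        return False
    for taint in node["taints"]:
        if taint["effect"] == "PreferNoSchedule":
            continue  # soft taint: does not block scheduling in this model
        if not any(_tolerates(tol, taint) for tol in spec.get("tolerations", [])):
            return False
    return True


if __name__ == "__main__":
    print(fits(registry_spec, infra_node))  # True
```

A node that still carries an additional `NoExecute` taint with the same key would not fit this trimmed spec, so the taint on the node and the tolerations in the registry configuration need to change together.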

modules/infrastructure-moving-router.adoc

Lines changed: 21 additions & 14 deletions
@@ -59,20 +59,27 @@ $ oc edit ingresscontroller default -n openshift-ingress-operator
 +
 [source,yaml]
 ----
-spec:
-  nodePlacement:
-    nodeSelector: <1>
-      matchLabels:
-        node-role.kubernetes.io/infra: ""
-    tolerations:
-    - effect: NoSchedule
-      key: node-role.kubernetes.io/infra
-      value: reserved
-    - effect: NoExecute
-      key: node-role.kubernetes.io/infra
-      value: reserved
------
-<1> Add a `nodeSelector` parameter with the appropriate value to the component you want to move. You can use a `nodeSelector` in the format shown or use `<key>: <value>` pairs, based on the value specified for the node. If you added a taint to the infrastructure node, also add a matching toleration.
+apiVersion: operator.openshift.io/v1
+kind: IngressController
+metadata:
+  creationTimestamp: "2025-03-26T21:15:43Z"
+  finalizers:
+  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
+  generation: 1
+  name: default
+# ...
+spec:
+  nodePlacement:
+    nodeSelector: <1>
+      matchLabels:
+        node-role.kubernetes.io/infra: ""
+    tolerations:
+    - effect: NoSchedule
+      key: node-role.kubernetes.io/infra
+      value: reserved
+# ...
+----
+<1> Add a `nodeSelector` parameter with the appropriate value to the component you want to move. You can use a `nodeSelector` parameter in the format shown or use `<key>: <value>` pairs, based on the value specified for the node. If you added a taint to the infrastructure node, also add a matching toleration.
 
 . Confirm that the router pod is running on the `infra` node.
 .. View the list of router pods and note the node name of the running pod:
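Unlike the plain label map used for the registry and monitoring components, the IngressController placement nests `matchLabels` under `nodeSelector`, which follows the Kubernetes `LabelSelector` shape. The sketch below evaluates such a selector in Python, including the `matchExpressions` operators that the Kubernetes `LabelSelector` type supports. This commit does not show `matchExpressions` on an IngressController, so treat that part as an assumption; the function and constant names are hypothetical.

```python
INFRA = "node-role.kubernetes.io/infra"


def label_selector_matches(selector: dict, labels: dict[str, str]) -> bool:
    """Evaluate a LabelSelector-shaped dict against a node's labels.

    Every matchLabels pair and every matchExpressions entry must hold
    (logical AND); an empty selector matches every node.
    """
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# The router placement from this hunk, expressed as Python data.
ROUTER_SELECTOR = {"matchLabels": {INFRA: ""}}

if __name__ == "__main__":
    infra_labels = {INFRA: "", "node-role.kubernetes.io/worker": ""}  # hypothetical node
    print(label_selector_matches(ROUTER_SELECTOR, infra_labels))  # True
```

An `Exists` expression on the same key would match the node regardless of the label's value, whereas `matchLabels` with `""` requires the value to be empty.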
