GH34445: Doc on using machine config pool during update 2 #35420
Conversation
✔️ Deploy Preview for osdocs ready! Explore the source changes: fb62b82. Inspect the deploy log: https://app.netlify.com/sites/osdocs/deploys/6138f35ee3aa5e0007d14b06. Browse the preview: https://deploy-preview-35420--osdocs.netlify.app/openshift-enterprise/latest/updating/update-using-custom-machine-config-pools
@LalatenduMohanty How does this look? Don't worry about the wording at this time. I would need to make some editorial changes to comply with the OpenShift docs.
I liked the way you have added scary-sounding wording to dissuade casual users in other topics. With respect to calling it
Maybe we can call it
Left several suggestions, mostly geared at avoiding the phrase "rolling update", because all updates are rolling updates even before what we're proposing here.
Added a decent example which leverages both canary and segmenting to fit short maintenance windows with relatively long drain and reboot cycles.
Random high-level thought: before we merge this we should normalize on either
----
<1> Specify a name for the machine config pool.
<2> Specify a selector to filter keys. This selects all nodes with the `machineconfiguration.openshift.io/role` value equal to `worker` and `mcp-noupdate`.
<3> Specify the custom label you added to the node(s) that you want in this machine config pool.
Any reason why `<3> Specify the custom label you added to the node(s) that you want in this machine config pool.`
is mentioned separately from `<1> Specify a name for the machine config pool`?
You need the `matchLabels` to get the node to move to that MCP, right?
Yes, but in the docs we have asked the user to make the label name the same as the MCP name, so I do not think we need to change that narrative. It also keeps the docs simple to follow.
@jiajliu @jianlinliu Can you help here?
@LalatenduMohanty
I labelled the node, created the MCP without the `matchLabels`. I see the role added to the node, the new MCP (rendered-workerpool-canary). The node doesn't appear to have moved to the new MCP:
```
$ oc get nodes
NAME                                       STATUS   ROLES                      AGE   VERSION
ci-ln-l6w1vgb-f76d1-l6qwn-master-0         Ready    master                     32m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-master-1         Ready    master                     32m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-master-2         Ready    master                     32m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-worker-a-c5kg2   Ready    worker,workerpool-canary   23m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-worker-b-kld5s   Ready    worker                     23m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-worker-c-w7q8t   Ready    worker                     23m   v1.22.0-rc.0+249ab87

$ oc get machineconfigpool
NAME                CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master              rendered-master-b9f06deaab4e39c6a025531d8ada384a              True      False      False      3              3                   3                     0                      31m
worker              rendered-worker-0420fb639ebd3e75849c539298f788d7              True      False      False      3              3                   3                     0                      31m
workerpool-canary   rendered-workerpool-canary-0420fb639ebd3e75849c539298f788d7   True      False      False      0              0                   0                     0                      69s
```
When I create the MCP with the `matchLabels` as in the docs, the node moves:
```
$ oc get machineconfigpool
NAME                CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master              rendered-master-b9f06deaab4e39c6a025531d8ada384a              True      False      False      3              3                   3                     0                      36m
worker              rendered-worker-0420fb639ebd3e75849c539298f788d7              True      False      False      2              2                   2                     0                      36m
workerpool-canary   rendered-workerpool-canary-0420fb639ebd3e75849c539298f788d7   True      False      False      1              1                   1                     0                      23s
```
Here is the YAML that didn't seem to work:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: workerpool-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,workerpool-canary]}
```
The command to label the node:

```
$ oc label node ci-ln-l6w1vgb-f76d1-l6qwn-worker-a-c5kg2 node-role.kubernetes.io/workerpool-canary=
```
I then added the nodeSelector stanza and it seemed to work:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: workerpool-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,workerpool-canary]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/workerpool-canary: ""
```
Yes, the node selector is required. For example, `mcp1.yaml`:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp1
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,mcp1]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp1: ""
```

```
$ oc create -f mcp1.yaml
$ oc label node ci-ln-lnxrcz2-f76d1-8vztb-worker-c-p8zf5 node-role.kubernetes.io/mcp1=
```

works for me every time.
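To make the roles of the two selectors in this thread concrete, here is a small Python sketch of the standard Kubernetes `matchLabels` / `matchExpressions` semantics. This is an illustration, not the actual MCO code; the label keys and values are taken from the `mcp1.yaml` example above. `machineConfigSelector` matches MachineConfig objects by their role label, while `nodeSelector` is what actually pulls nodes into the pool, which is why a node without the custom label stays in the `worker` pool.

```python
# Sketch of Kubernetes label-selector matching (illustration only, not MCO code).

def matches_expressions(labels, expressions):
    """True if a label set satisfies every matchExpressions clause (In only)."""
    for expr in expressions:
        if expr["operator"] == "In":
            if labels.get(expr["key"]) not in expr["values"]:
                return False
        # NotIn, Exists, DoesNotExist are omitted for brevity.
    return True

def matches_labels(labels, match_labels):
    """True if every matchLabels key/value pair is present on the object."""
    return all(labels.get(k) == v for k, v in match_labels.items())

# machineConfigSelector: selects which MachineConfigs are rendered for the pool.
mc_selector = [{"key": "machineconfiguration.openshift.io/role",
                "operator": "In", "values": ["worker", "mcp1"]}]
worker_machineconfig = {"machineconfiguration.openshift.io/role": "worker"}

# nodeSelector: selects which nodes belong to the pool.
node_selector = {"node-role.kubernetes.io/mcp1": ""}
labeled_node = {"node-role.kubernetes.io/worker": "",
                "node-role.kubernetes.io/mcp1": ""}   # after `oc label node ... node-role.kubernetes.io/mcp1=`
unlabeled_node = {"node-role.kubernetes.io/worker": ""}

print(matches_expressions(worker_machineconfig, mc_selector))  # True
print(matches_labels(labeled_node, node_selector))             # True: node joins mcp1
print(matches_labels(unlabeled_node, node_selector))           # False: node stays in the worker pool
```

Without the `nodeSelector`, nothing ever evaluates node labels against the pool, which matches the observed behavior: the rendered config appears but MACHINECOUNT stays 0.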
Following the example file with `matchLabels`, which should be the same as the label added to the nodes, it works for me.
<1> is defined for the MCP name, <3> is to filter matched nodes to add them into an MCP, so I think the description looks good to me.
@LalatenduMohanty Made changes. I am on PTO Friday 8/27. Will review any further comments and do a thorough editorial review on 8/29.
@jiajliu @jianlinliu Can you take a look? It is a rewrite of #34445.
@mburke5678 With my fresh testing I found out that the MCP behavior is different in OCP 4.6 (i.e., nodes get rebooted when added to an MCP). So we should only backport this as far back as 4.7.
@jiajliu @jianlinliu We have some customers who are waiting for the docs, so if we can merge this before the end of this week it would be great. Thanks in advance.
OK, I will work with jianlin to speed up the review work.
LGTM.
lgtm
There were some IBM style guide issues, but otherwise looks good.
This topic describes the general workflow of this canary rollout update process. The steps to perform each task in the workflow are described in the following sections.
. Create MCPs based on the worker pool. The number of nodes in each MCP depends on a few factors, such as your maintenance window duration for each MCP, and the amount of reserve capacity (extra worker nodes) available in your cluster.
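As a rough illustration of the sizing factors this step mentions, the arithmetic can be sketched as follows. The window length and per-node drain/reboot times below are made-up assumptions for illustration, not measured values; plug in numbers observed in your own cluster.

```python
import math

def nodes_per_window(window_minutes, drain_minutes, reboot_minutes):
    """Nodes that can be updated serially within one maintenance window."""
    return window_minutes // (drain_minutes + reboot_minutes)

def pools_needed(worker_count, nodes_per_pool):
    """Custom MCPs needed to cover the whole worker fleet."""
    return math.ceil(worker_count / nodes_per_pool)

# Hypothetical numbers: a 2-hour window, ~15 min drain + ~10 min reboot per node.
per_pool = nodes_per_window(window_minutes=120, drain_minutes=15, reboot_minutes=10)
print(per_pool)                    # 4 nodes fit in one window
print(pools_needed(10, per_pool))  # 3 pools for a 10-node worker fleet
```

Reserve capacity matters too: each pool must be small enough that the remaining nodes can absorb the drained workloads while the pool updates.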
Per the IBM style guide, I would avoid using parentheses in general text.
Use parentheses to identify items such as abbreviations, symbols, and measurements, but avoid using parentheses in general text.
Pausing the MCP also pauses the kube-apiserver-to-kubelet-signer automatic CA certificates rotation. New CA certificates are generated at 292 days from the installation date and old certificates are removed 365 days from the installation date. See link:https://access.redhat.com/articles/5651701[Understand CA cert auto renewal in Red Hat OpenShift 4] to find out how much time you have before the next automatic CA certificate rotation. Make sure the pools are unpaused when the CA cert rotation happens. If the MCPs are paused, the cert rotation does not happen, which causes the cluster to become degraded and causes failure in multiple `oc` commands, including but not limited to `oc debug`, `oc logs`, `oc exec`, and `oc attach`.
====
. Perform the cluster update. The update process updates the MCPs that are not paused, including the control plane nodes (also known as the master nodes).
Per the IBM style guide, I would avoid using parentheses in general text.
Use parentheses to identify items such as abbreviations, symbols, and measurements, but avoid using parentheses in general text.
This one is our style for transitioning from master to control plane.
@mburke5678 oh that's right! :) Thanks for the reminder.
node/ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz labeled
----
+
The MCO moves the node(s) back to the original MCP and reconciles the node to the MCP configuration.
```diff
- The MCO moves the node(s) back to the original MCP and reconciles the node to the MCP configuration.
+ The MCO moves the nodes back to the original MCP and reconciles the node to the MCP configuration.
```
Per the IBM style guide, don't use parentheses (s).
Do not use the letter s in parentheses (s) to indicate that a noun can be singular or plural. Some languages form plural nouns differently than English, and the construction (s) can cause translation problems. Instead, use the plural form or, if it is important to indicate both singular and plural options, use one or more.
----
<1> Specify a name for the MCP.
<2> Specify the `worker` and custom MCP name.
<3> Specify the custom label you added to the node(s) that you want in this pool.
```diff
- <3> Specify the custom label you added to the node(s) that you want in this pool.
+ <3> Specify the custom label you added to the nodes that you want in this pool.
```
Per the IBM style guide, don't use parentheses (s).
Do not use the letter s in parentheses (s) to indicate that a noun can be singular or plural. Some languages form plural nouns differently than English, and the construction (s) can cause translation problems. Instead, use the plural form or, if it is important to indicate both singular and plural options, use one or more.
+
[source,terminal]
----
$ oc label node <node name> node-role.kubernetes.io/<custom-label>=
Should there be an equal sign at the end of this?
Apparently so:

```
$ oc label node ci-ln-wl1d49k-f76d1-hqh8r-worker-a-nnjf6 node-role.kubernetes.io/workerpool-canary
error: at least one label update is required
$ oc label node ci-ln-wl1d49k-f76d1-hqh8r-worker-a-nnjf6 node-role.kubernetes.io/workerpool-canary=
node/ci-ln-wl1d49k-f76d1-hqh8r-worker-a-nnjf6 labeled
```
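The distinction can be mimicked in a few lines of Python. This is only an illustration of the `key=` form (which sets the label to the empty string) versus a bare `key` (which is not a label assignment at all), not the actual oc/kubectl argument-parsing code:

```python
def parse_label_arg(arg):
    """Return (key, value) for a label assignment, or None if no '=' is present."""
    if "=" not in arg:
        return None  # mirrors "error: at least one label update is required"
    key, _, value = arg.partition("=")
    return key, value

print(parse_label_arg("node-role.kubernetes.io/workerpool-canary"))
# None
print(parse_label_arg("node-role.kubernetes.io/workerpool-canary="))
# ('node-role.kubernetes.io/workerpool-canary', '')
```

So the trailing `=` is load-bearing: it creates the label with an empty value, which is exactly what the MCP's `nodeSelector` matches on (`node-role.kubernetes.io/workerpool-canary: ""`).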
+
[source,terminal]
----
$ oc label node ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz node-role.kubernetes.io/workerpool-canary=
Should there be an equal sign at the end of this?
See above
/cherrypick enterprise-4.7
@mburke5678: new pull request created: #36159
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherrypick enterprise-4.8
/cherrypick enterprise-4.9
@mburke5678: new pull request created: #36160
@mburke5678: new pull request created: #36161
#34445
Another attempt at this issue. @LalatenduMohanty suggested that the single module I had added for the update-using-MCPs procedure was insufficient. Among the concerns, if I understood correctly, was that having an assembly would give the procedure more importance, encouraging the user to consider whether to use it and to carefully plan how to execute it to avoid issues.
However, @codyhoag, the erstwhile update documentation person, and I would like to see the MCP procedure at least mentioned in each update assembly to give it more exposure (despite Lalatendu's suggestion that not many users would use the procedure). And we were concerned that a separate MCP assembly would require the user to leave that assembly to perform the actual update, which is not a good user experience. (Plus, it might violate our flexible content guidelines.)
As a compromise, I added a module to each of the update assemblies (minor, within-minor console, within-minor CLI), right before the update procedure, that describes the update-using-MCPs procedure (which I have tentatively titled in-service software update), including intentionally scary-sounding wording to dissuade casual users. If the user wants to use MCPs, there is a link to the ISSU assembly. I have only made these additions to the Updating a cluster between minor versions topic so far.
That assembly is based on Lalatendu's PR, walking you through the required steps. At the point where the user is to perform the actual update, there are links back to each of the update procedures. After the update procedure, a Next Step leads you back to the ISSU assembly to finish that process.
Not perfect. But there is some precedent in the update docs to jump around a bit, namely the note in the minor versions doc for CLI users about using the console to change the update channel.
The request to add an update-using-MCPs procedure came from customers. As such, I want to give this matter its due attention.
Preview of update using MCP assembly: https://deploy-preview-35420--osdocs.netlify.app/openshift-enterprise/latest/updating/update-using-custom-machine-config-pools?utm_source=github&utm_campaign=bot_dp
Preview of ISSU module added to one update assembly: https://deploy-preview-35420--osdocs.netlify.app/openshift-enterprise/latest/updating/updating-cluster-between-minor.html#update-using-custom-machine-config-pools-issu_updating-cluster-between-minor
FYI @vikram-redhat