Skip to content

GH34445: Doc on using machine config pool during update 2 #35420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 8, 2021

Conversation

mburke5678
Copy link
Contributor

@mburke5678 mburke5678 commented Aug 10, 2021

#34445

Another attempt at this issue. @LalatenduMohanty suggested that the single module I had added for the update using MCPs procedure was insufficient. Among concern, if I understood correctly, was that having an assembly gave more importance to the procedure, encouraging the user to consider whether to use this procedure and carefully consider how to go about executing the procedure to avoid issues.

However, @codyhoag, the erstwhile update documentation person, and I, would like to see the MCP procedure at least mentioned in each update assembly to give more exposure (despite Lalatendu's suggestion that not many users would use the procedure). And, we were concerned that a separate MCP assembly would require the user to leave that assembly at the time to perform the actual update, which is not a good user experience. (Plus, it might violate our flexible content guidelines.)

As a compromise, I added a module to each of the update assemblies (minor, within minor console, within minor CLI), right before the update procedure, that describes the update using MCPs procedure (which I have tentatively titled in-service software update) that describes the process, including intentionally scaring-sounding wording to dissuade casual users. If the user wants to use MCPs, there is link to the ISSU assembly. I have only made these additions to the Updating a cluster between minor versions topic so far.

That assembly is based on Lalatendu's PR, walking you through the required steps. At the point where the user is to perform the actual, there are links back to each of the update procedures. After the update procedure, a Next Step leads you back to the ISSU assembly to finish that process.

Not perfect. But there is some precedent in the update docs to jump around a bit, namely the note in the minor versions doc for CLI users about using the console to change the update channel.

The request to add an update using MCPs came from customers. A such I want to give this matter its due attention.

Preview of update using MCP assembly: https://deploy-preview-35420--osdocs.netlify.app/openshift-enterprise/latest/updating/update-using-custom-machine-config-pools?utm_source=github&utm_campaign=bot_dp
Preview of ISSU module added to one update assembly: https://deploy-preview-35420--osdocs.netlify.app/openshift-enterprise/latest/updating/updating-cluster-between-minor.html#update-using-custom-machine-config-pools-issu_updating-cluster-between-minor

FYI @vikram-redhat

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 10, 2021
@netlify
Copy link

netlify bot commented Aug 10, 2021

✔️ Deploy Preview for osdocs ready!

🔨 Explore the source changes: fb62b82

🔍 Inspect the deploy log: https://app.netlify.com/sites/osdocs/deploys/6138f35ee3aa5e0007d14b06

😎 Browse the preview: https://deploy-preview-35420--osdocs.netlify.app/openshift-enterprise/latest/updating/update-using-custom-machine-config-pools

@mburke5678
Copy link
Contributor Author

@LalatenduMohanty How does this look? Don't worry about the wording at this time. I would need to make some editorial changes to comply with the OpenShift docs.
Is calling this procedure an in-service software update appropriate? We need something more descriptive than Use of customized machine config pools during update. Your docs mention canary updates. That is a possibility, as we do mention canary update in one topic and canary deployments in the docs.

@mburke5678 mburke5678 changed the title Doc on using machine config pool during update 2 GH34445: Doc on using machine config pool during update 2 Aug 11, 2021
@LalatenduMohanty
Copy link
Member

LalatenduMohanty commented Aug 12, 2021

@LalatenduMohanty How does this look? Don't worry about the wording at this time. I would need to make some editorial changes to comply with the OpenShift docs.
Is calling this procedure an in-service software update appropriate? We need something more descriptive than Use of customized machine config pools during update. Your docs mention canary updates. That is a possibility, as we do mention canary update in one topic and canary deployments in the docs.

I liked the way you have added scaring-sounding wording to dissuade casual users in other topics. With respect to calling it Performing an in-service software update is not technically correct as the default OpenShift update should be always an in-service update.

@LalatenduMohanty
Copy link
Member

May be we can call it OTA canary rollout OTA is Over the air or Performing a canary update rollout to OpenShift cluster or something similar. I will get back to you on this.

Copy link
Member

@sdodson sdodson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left several suggestions, mostly geared at trying to avoid saying rolling update because all updates are rolling updates even before what we're proposing here.

Added a decent example which leverages both canary and segmenting to fit short windows with relatively look drain and reboot cycles.

@sdodson
Copy link
Member

sdodson commented Aug 18, 2021

Random high level thought before we merge this we should normalize on either update or upgrade so that we're consistent. The former has far more usage throughout this doc but I haven't looked at how much we us either in other pre-existing docs.

----
<1> Specify a name for the machine config pool
<2> Specify a selector to filter keys. This selects all nodes with the `machineconfiguration.openshift.io/role` value equal to `worker` and `mcp-noupdate`.
<3> Specify the custom label you added to the node(s) that you want in this machine config pool.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason <3> Specify the custom label you added to the node(s) that you want in this machine config pool. is separately mentioned from <1> Specify a name for the machine config pool

Copy link
Contributor Author

@mburke5678 mburke5678 Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the matchLabel to get the node to move to that MCP, right?

Copy link
Member

@LalatenduMohanty LalatenduMohanty Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in the docs we have asked the user to add the label name same as MCP name. So I do not think we need to change from that narrative. Also it also keeps it simple to follow the docs.

Copy link
Contributor Author

@mburke5678 mburke5678 Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiajliu @jianlinliu Can you help here?

@LalatenduMohanty
I labelled the node, created the MCP without the matchLabels. I see the role added to the node, the new MCP (rendered-workerpool-canary). The node doesnt appear to have moved to the new MCP:


$ oc get nodes
NAME                                       STATUS   ROLES                      AGE   VERSION
ci-ln-l6w1vgb-f76d1-l6qwn-master-0         Ready    master                     32m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-master-1         Ready    master                     32m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-master-2         Ready    master                     32m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-worker-a-c5kg2   Ready    worker,workerpool-canary   23m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-worker-b-kld5s   Ready    worker                     23m   v1.22.0-rc.0+249ab87
ci-ln-l6w1vgb-f76d1-l6qwn-worker-c-w7q8t   Ready    worker                     23m   v1.22.0-rc.0+249ab87
mburke@mburke ~ $ oc get machineconfigpool
NAME                CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master              rendered-master-b9f06deaab4e39c6a025531d8ada384a              True      False      False      3              3                   3                     0                      31m
worker              rendered-worker-0420fb639ebd3e75849c539298f788d7              True      False      False      3              3                   3                     0                      31m
workerpool-canary   rendered-workerpool-canary-0420fb639ebd3e75849c539298f788d7   True      False      False      0              0                   0                     0                      69s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I create the MCP with the matchLabels as in the docs, the node moves:

oc get machineconfigpool
NAME                CONFIG                                                        UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master              rendered-master-b9f06deaab4e39c6a025531d8ada384a              True      False      False      3              3                   3                     0                      36m
worker              rendered-worker-0420fb639ebd3e75849c539298f788d7              True      False      False      2              2                   2                     0                      36m
workerpool-canary   rendered-workerpool-canary-0420fb639ebd3e75849c539298f788d7   True      False      False      1              1                   1                     0                      23s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the YAML that didn't seem to work:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: workerpool-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,workerpool-canary]
        }

The command to label the node:

oc label node ci-ln-l6w1vgb-f76d1-l6qwn-worker-a-c5kg2 node-role.kubernetes.io/workerpool-canary=

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I then added the nodeSelector stanza and it seemed to work:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: workerpool-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - {
         key: machineconfiguration.openshift.io/role,
         operator: In,
         values: [worker,workerpool-canary]
        }
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/workerpool-canary: ""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Node selector is required. For example

$ cat mcp1.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: mcp1
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,mcp1]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/mcp1: ""

$ oc create -f mcp1.yaml 

$ oc label node ci-ln-lnxrcz2-f76d1-8vztb-worker-c-p8zf5 node-role.kubernetes.io/mcp1=

works for me everytime.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the example file with matchLabels, which should be the same with the label added to nodes. It works for me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<1> is defined for mcp name, <3> is to filter matched nodes to add them into a MCP, so I think the description looks good to me.

@mburke5678
Copy link
Contributor Author

mburke5678 commented Aug 27, 2021

@LalatenduMohanty Made changes. I am on PTO Friday 8/27. Will review any further comments and do a thorough editorial review on 8/29.

@LalatenduMohanty
Copy link
Member

@jiajliu @jianlinliu Can you take a look? It is a rewrite of #34445

@LalatenduMohanty
Copy link
Member

LalatenduMohanty commented Aug 31, 2021

@mburke5678 With my fresh testing I found out that the MCP behavior is different in OCP 4.6 (i.e. nodes get reboot when added to MCP). So we should only backport this till 4.7.

@jiajliu
Copy link

jiajliu commented Sep 1, 2021

@jiajliu @jianlinliu Can you take a look? It is a rewrite of #34445
Yes, jianlin will work on it later.

@LalatenduMohanty
Copy link
Member

@jiajliu @jianlinliu We have some customers who are waiting for the docs. So if we can merge this before end of this week it will be great. Thanks in advance.

@jiajliu
Copy link

jiajliu commented Sep 2, 2021

OK, I will work with jianlin together to speed up the review work.

----
<1> Specify a name for the machine config pool
<2> Specify a selector to filter keys. This selects all nodes with the `machineconfiguration.openshift.io/role` value equal to `worker` and `mcp-noupdate`.
<3> Specify the custom label you added to the node(s) that you want in this machine config pool.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the example file with matchLabels, which should be the same with the label added to nodes. It works for me.

@jiajliu
Copy link

jiajliu commented Sep 6, 2021

LGTM.

@jianlinliu
Copy link

lgtm

Copy link
Contributor

@sagidlow sagidlow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some IBM style guide issues, but otherwise looks good.


This topic describes the general workflow of this canary rollout update process. The steps to perform each task in the workflow are described in the following sections.

. Create MCPs based on the worker pool. The number of nodes in each MCP depends on a few factors, such as your maintenance window duration for each MCP, and the amount of reserve capacity (extra worker nodes) available in your cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the IBM style guide, I would avoid using parentheses in general text.

Use parentheses to identify items such as abbreviations, symbols, and measurements, but avoid using parentheses in general text.

Pausing the MCP also pauses the kube-apiserver-to-kubelet-signer automatic CA certificates rotation. New CA certificates are generated at 292 days from the installation date and old certificates are removed 365 days from the installation date. See the link:https://access.redhat.com/articles/5651701[Understand CA cert auto renewal in Red Hat OpenShift 4] to find out how much time you have before the next automatic CA certificate rotation. Make sure the pools are unpaused when the CA cert rotation happens. If the MCPs are paused, the cert rotation does not happen, which causes the cluster to become degraded and causes failure in multiple `oc` commands, including but not limited to `oc debug`, `oc logs`, `oc exec`, and `oc attach`.
====

. Perform the cluster update. The update process updates the MCPs that are not paused, including the control plane nodes (also known as the master nodes).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the IBM style guide, I would avoid using parentheses in general text.

Use parentheses to identify items such as abbreviations, symbols, and measurements, but avoid using parentheses in general text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is our style for transitioning from master to control plane.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mburke5678 oh that's right! :) Thanks for the reminder.

node/ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz labeled
----
+
The MCO moves the node(s) back to the original MCP and reconciles the node to the MCP configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The MCO moves the node(s) back to the original MCP and reconciles the node to the MCP configuration.
The MCO moves the nodes back to the original MCP and reconciles the node to the MCP configuration.

Per the IBM style guide, don't use parentheses (s).

Do not use the letter s in parentheses (s) to indicate that a noun can be singular or plural. Some languages form plural nouns differently than English, and the construction (s) can cause translation problems. Instead, use the plural form or, if it is important to indicate both singular and plural options, use one or more.

----
<1> Specify a name for the MCP.
<2> Specify the `worker` and custom MCP name.
<3> Specify the custom label you added to the node(s) that you want in this pool.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<3> Specify the custom label you added to the node(s) that you want in this pool.
<3> Specify the custom label you added to the nodes that you want in this pool.

Per the IBM style guide, don't use parentheses (s).

Do not use the letter s in parentheses (s) to indicate that a noun can be singular or plural. Some languages form plural nouns differently than English, and the construction (s) can cause translation problems. Instead, use the plural form or, if it is important to indicate both singular and plural options, use one or more.

+
[source,terminal]
----
$ oc label node <node name> node-role.kubernetes.io/<custom-label>=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an equal sign at the end of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently so:

 $ oc label node ci-ln-wl1d49k-f76d1-hqh8r-worker-a-nnjf6 node-role.kubernetes.io/workerpool-canary
error: at least one label update is required
mburke@mburke ~ $ oc label node ci-ln-wl1d49k-f76d1-hqh8r-worker-a-nnjf6 node-role.kubernetes.io/workerpool-canary=
node/ci-ln-wl1d49k-f76d1-hqh8r-worker-a-nnjf6 labeled

+
[source,terminal]
----
$ oc label node ci-ln-0qv1yp2-f76d1-kl2tq-worker-a-j2ssz node-role.kubernetes.io/workerpool-canary=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an equal sign at the end of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

@sagidlow sagidlow added the peer-review-done Signifies that the peer review team has reviewed this PR label Sep 8, 2021
@mburke5678 mburke5678 merged commit fdd3f12 into openshift:main Sep 8, 2021
@mburke5678 mburke5678 deleted the OTA-410-2 branch September 8, 2021 17:42
@mburke5678
Copy link
Contributor Author

/cherrypick enterprise-4.7

@openshift-cherrypick-robot

@mburke5678: new pull request created: #36159

In response to this:

/cherrypick enterprise-4.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mburke5678
Copy link
Contributor Author

/cherrypick enterprise-4.8

@mburke5678
Copy link
Contributor Author

/cherrypick enterprise-4.9

@openshift-cherrypick-robot

@mburke5678: new pull request created: #36160

In response to this:

/cherrypick enterprise-4.8

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@mburke5678: new pull request created: #36161

In response to this:

/cherrypick enterprise-4.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch/enterprise-4.7 branch/enterprise-4.8 branch/enterprise-4.9 peer-review-done Signifies that the peer review team has reviewed this PR size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants