Updates for OCP 3.4 cluster upgrades (automated + manual) #3352
$ ansible-playbook -i <path/to/inventory/file> \
</path/to/upgrade/playbook> \
-e openshift_upgrade_nodes_serial="2" \
-e openshift_upgrade_nodes_label="region=group1"
All looks spot on. 👍
*Option A) Upgrade masters and nodes in a single phase.*
+
Run the *_upgrade.yml_* playbook to upgrade the cluster in one phase: master
components first, then nodes in-place:
May want to consider referring to this as the control plane, rather than master components. Not sure if now is a good time or sometime in future. Control plane matches the terminology used upstream.
@dgoodwin OK, "control plane" was defined previously in the earlier section, but now I've made sure all sections prefer the "control plane" term.
When upgrading in multiple phases, the control plane upgrade phase includes:

- master components
- Docker only on any stand-alone etcd hosts
etcd is upgraded during the control plane upgrade as well, right @sdodson?
Not currently. I'd be ok with including it there too.
- node services running on masters
- Docker running on masters
- node services running on stand-alone nodes
I have a pending TODO to bring this doc to your attention once it's published: https://docs.google.com/a/redhat.com/document/d/1Jv5ROqosiG2-WhdWEkwpXPmvKbRaUC5yiLEAnZnzkeg/edit?usp=sharing
The point was raised that we don't clearly state if it's possible to do a zero downtime upgrade or not, and if so what would be required in terms of infrastructure. I don't know if there's time now to look into this but hopefully we can get something into official docs (at some point) stating that you can do zero downtime upgrades if your app is capable, and you have sufficient replication for your app, extra nodes, and at least 3 infra nodes.
Also the description of the steps of upgrade might be useful.
FYI, @mwoodson docs changes for 3.4 upgrade changes.
thanks
a59a344 to 0a693d9
performs all pre-upgrade checks without actually upgrading any hosts, and
reports any problems found.
====
+
@dgoodwin I moved the bit here about `<customized_node_upgrade_variables>` inline with the commands, and replaced it with a note about `--tag pre_upgrade`.
Setting this to ON_QA in https://bugzilla.redhat.com/show_bug.cgi?id=1383278. @ahardin-rh PTAL for peer review?
== Technology Preview Features

Some features in this release are currently in Technology Preview. These
experimental features are not intended for production use. Please note the
s/Please note/Note
[[ocp-34-known-issues]]
== Known Issues

* Setting the `*forks*` parameter in the *_/etc/ansible/ansible.cfg_* file to 11
`forks`
@adellape LGTM! Made some comments in the release notes, but realized that's a WIP. Everything else looks great.
b7fe3f9 to 7101d49
Went ahead and updated manual cluster upgrade steps too here, per notes from https://bugzilla.redhat.com/show_bug.cgi?id=1383278#c5. http://file.rdu.redhat.com/~adellape/120216/upgrade34/install_config/upgrading/manual_upgrades.html
----
$ oc new-app -f metrics-deployer.yaml \
$ oc new-app --as=system:serviceaccount:openshift-infra:metrics-deployer \
-f metrics-deployer.yaml \
-p HAWKULAR_METRICS_HOSTNAME=hm.example.com \
-p MODE=refresh <1>
----
<1> In the original deployment command, there was no `MODE=refresh`.
@mwringe Does `--as=system:serviceaccount:openshift-infra:metrics-deployer` belong here in the metrics upgrade command, like it was added to the metrics install doc per #3018? I've gone ahead and added it for now.
Please see also the first step added above (about the `view` role for the hawkular SA), per QE feedback in https://bugzilla.redhat.com/show_bug.cgi?id=1383278#c5.
Preview build:
Yes, the as serviceaccount option should be in there. Thanks for the catch.
*Updated Elasticsearch in EFK Stack*

The latest EFK stack now uses Elasticsearch 2.4 with a common data model. This
means Fluentd sends logs to Elasticsearch with a new indexing pattern for
Confirmed with @ewolinetz that this should now be Elasticsearch 2.4 instead of 2.3 (which we originally had in this note per #3211, but has changed for OCP 3.4 since then).
Also FYI @danmacpherson @vikram-redhat I moved this note to here in the 3.4 release notes, instead of at the tail end of the logging upgrade section because it felt buried and wasn't specific to the upgrade task. Also fixed a rendering issue in this note due to some superfluous `+`s.
Thanks @adellape .
57ff90b to 4576c69
@dgoodwin I've added manual docker upgrade steps since you last looked. Can you review? For masters (near the end): For nodes (step 5):
Instructions for Docker upgrade seem right, however it's worth noting this would technically be doing it twice as all masters are implicitly nodes as well. In automated upgrade I only hit Docker during node upgrade phase, should we do the same here?
@dgoodwin There's a warning right before the Docker upgrade in master sorta to that effect:
So basically if it's a master that is unschedulable, the "Upgrading Masters" section handles everything (step 3 also has you upgrade node/openvswitch packages) and you don't have to also go through the "Upgrading Nodes" steps for that host. But if it is a master that is schedulable for some reason, then don't do the docker upgrade there and go through all of "Upgrading Nodes" as well for that host, where you'll do the Docker upgrade. If that's too convoluted and you think all master hosts should just go through the "Upgrading Masters" and "Upgrading Nodes" sections, then the latter section will need to get re-written a bit cuz it currently assumes a schedulable node and talks about some stuff that would be superfluous for unschedulable masters (like setting
Ah I understand, no that sounds fine to me. Also a sign I might need to re-think my structure here with this yet again.
3d0354a to 015f600
@sdodson Now that
Also @ahardin-rh for peer review of ^ and http://file.rdu.redhat.com/~adellape/120216/upgrade34/install_config/upgrading/manual_upgrades.html in general (I don't think I had modified that at all when you last looked in this PR).
- etcd has been updated to 3.1.0-rc.0.
+
While etcd has been updated from etcd 2 to 3, {product-title} 3.4 continues to
use the etcd v2 API, which is backwards compatible with the etcd 3, for both new
I'd get rid of 'which is backwards compatible with the etcd 3'
lol @ "the"
- Docker running on masters
- node services running on stand-alone nodes

When upgrading only the nodes, it is required that the control plane has already
This reads a little awkward. Maybe something like:
When upgrading only the nodes, the control plane must already be upgraded.
The
xref:../../install_config/upgrading/blue_green_deployments.adoc#upgrading-blue-green-deployments[blue-green deployment] upgrade method follows a similar flow to the in-place method:
masters and etcd servers are still upgraded first, however a parallel
environment is created for new nodes instead of upgrading them in-place.
This last instance of in-place should be "in place" (not "in-place") since it's not modifying a noun, like "in-place method" earlier in the paragraph
@adellape just 2 minor nits from me ⭐
@adellape excluder bits look good to me
Also logging + cluster upgrade tweaks per QE feedback
Add etcd v2 API note to release notes
Add manual etcd upgrade steps (RPM+containerized)
Add upgrade steps for excluder pkgs
Thanks @ahardin-rh @sdodson! Squashed and merging.
Preview build:
http://file.rdu.redhat.com/~adellape/120216/upgrade34/install_config/upgrading/automated_upgrades.html
Includes:
--tag pre_upgrade
option in a Note box.<version>
Asynchronous Releases" section and consolidates it into the re-titled "Upgrading to the Latest OpenShift Container Platform 3.4 Release", as the former was going to become really redundant w/ all the control plane stuff getting added.
@dgoodwin @sdodson PTAL?