Updates for OCP 3.4 cluster upgrades (automated + manual) #3352

Merged: 1 commit merged into openshift:master from the upgrade34 branch on Dec 15, 2016

Conversation

@adellape adellape commented Dec 8, 2016

Preview build:

http://file.rdu.redhat.com/~adellape/120216/upgrade34/install_config/upgrading/automated_upgrades.html

Includes:

@dgoodwin @sdodson PTAL?

$ ansible-playbook -i <path/to/inventory/file> \
</path/to/upgrade/playbook> \
-e openshift_upgrade_nodes_serial="2" \
-e openshift_upgrade_nodes_label="region=group1"
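(For context: openshift_upgrade_nodes_serial controls how many node hosts are upgraded at a time, and openshift_upgrade_nodes_label restricts the node upgrade to hosts matching that label, so this example upgrades the region=group1 nodes two at a time.)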

All looks spot on. 👍

*Option A) Upgrade masters and nodes in a single phase.*
+
Run the *_upgrade.yml_* playbook to upgrade the cluster in one phase: master
components first, then nodes in-place:
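
A minimal sketch of that single-phase invocation (the playbook path here is an assumption based on the openshift-ansible 3.4-era layout):

$ ansible-playbook -i <path/to/inventory/file> \
playbooks/byo/openshift-cluster/upgrades/v3_4/upgrade.yml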

May want to consider referring to this as the control plane, rather than master components. Not sure if now is a good time or sometime in future. Control plane matches the terminology used upstream.

Contributor Author

@dgoodwin OK, "control plane" was already defined in an earlier section, but now I've made sure all sections prefer the "control plane" term.

When upgrading in multiple phases, the control plane upgrade phase includes:

- master components
- Docker only on any stand-alone etcd hosts

Etcd is upgraded during the control plane upgrade as well, right @sdodson?

Member

Not currently. I'd be ok with including it there too.


- node services running on masters
- Docker running on masters
- node services running on stand-alone nodes
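
For reference, a rough sketch of how the separate phases map to playbooks (paths assumed from the openshift-ansible 3.4-era layout):

# phase 1: control plane
$ ansible-playbook -i <path/to/inventory/file> \
playbooks/byo/openshift-cluster/upgrades/v3_4/upgrade_control_plane.yml

# phase 2: remaining node hosts, optionally limited by label as shown earlier
$ ansible-playbook -i <path/to/inventory/file> \
playbooks/byo/openshift-cluster/upgrades/v3_4/upgrade_nodes.yml \
-e openshift_upgrade_nodes_label="region=group1"
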
@dgoodwin dgoodwin Dec 9, 2016

I have a pending TODO to bring this doc to your attention once it's published: https://docs.google.com/a/redhat.com/document/d/1Jv5ROqosiG2-WhdWEkwpXPmvKbRaUC5yiLEAnZnzkeg/edit?usp=sharing

The point was raised that we don't clearly state if it's possible to do a zero downtime upgrade or not, and if so what would be required in terms of infrastructure. I don't know if there's time now to look into this but hopefully we can get something into official docs (at some point) stating that you can do zero downtime upgrades if your app is capable, and you have sufficient replication for your app, extra nodes, and at least 3 infra nodes.

Also the description of the steps of upgrade might be useful.

sdodson commented Dec 9, 2016

FYI, @mwoodson docs changes for 3.4 upgrade changes.

mwoodson commented Dec 9, 2016

thanks

@adellape adellape changed the title [WIP] Updates for OCP 3.4 upgrade process [WIP] Updates for OCP 3.4 automated upgrades Dec 9, 2016
@adellape adellape force-pushed the upgrade34 branch 2 times, most recently from a59a344 to 0a693d9 Compare December 9, 2016 19:57
@adellape adellape changed the title [WIP] Updates for OCP 3.4 automated upgrades Updates for OCP 3.4 automated upgrades Dec 9, 2016
performs all pre-upgrade checks without actually upgrading any hosts, and
reports any problems found.
====
+
Contributor Author

@dgoodwin I moved the bit here about <customized_node_upgrade_variables> inline with the commands, and replaced it with a note about --tags pre_upgrade.
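
For reference, a minimal sketch of that dry-run usage (same placeholders as the earlier example):

$ ansible-playbook -i <path/to/inventory/file> \
</path/to/upgrade/playbook> \
--tags pre_upgrade   # runs only the pre-upgrade checks; no hosts are changed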

adellape commented Dec 9, 2016

Setting this to ON_QA in https://bugzilla.redhat.com/show_bug.cgi?id=1383278.

@ahardin-rh PTAL for peer review?

== Technology Preview Features

Some features in this release are currently in Technology Preview. These
experimental features are not intended for production use. Please note the
Contributor

s/Please note/Note

[[ocp-34-known-issues]]
== Known Issues

* Setting the `*forks*` parameter in the *_/etc/ansible/ansible.cfg_* file to 11
Contributor

`forks`

@ahardin-rh

@adellape LGTM! Made some comments in the release notes, but realized that's a WIP. Everything else looks great.

@adellape adellape force-pushed the upgrade34 branch 2 times, most recently from b7fe3f9 to 7101d49 Compare December 12, 2016 23:17
adellape commented Dec 12, 2016

----
$ oc new-app --as=system:serviceaccount:openshift-infra:metrics-deployer \
-f metrics-deployer.yaml \
-p HAWKULAR_METRICS_HOSTNAME=hm.example.com \
-p MODE=refresh <1>
----
<1> In the original deployment command, there was no `MODE=refresh`.
Contributor Author

@mwringe Does --as=system:serviceaccount:openshift-infra:metrics-deployer belong here in the metrics upgrade command, like it was added to the metrics install doc per #3018? I've gone ahead and added it for now.

Please see also the first step added above (about the view role for the hawkular SA), per QE feedback in https://bugzilla.redhat.com/show_bug.cgi?id=1383278#c5.
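
For context, that added step looks roughly like the following; the hawkular service account and openshift-infra project names are assumptions here and should be checked against the metrics docs:

$ oadm policy add-role-to-user view \
system:serviceaccount:openshift-infra:hawkular \
-n openshift-infra   # grant the hawkular service account the view role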

Preview build:

http://file.rdu.redhat.com/~adellape/120216/upgrade34/install_config/upgrading/manual_upgrades.html#manual-upgrading-cluster-metrics


Yes, the as serviceaccount option should be in there. Thanks for the catch.

*Updated Elasticsearch in EFK Stack*

The latest EFK stack now uses Elasticsearch 2.4 with a common data model. This
means Fluentd sends logs to Elasticsearch with a new indexing pattern for
Contributor Author

Confirmed with @ewolinetz that this should now be Elasticsearch 2.4 instead of 2.3 (which we originally had in this note per #3211, but has changed for OCP 3.4 since then).

Also FYI @danmacpherson @vikram-redhat: I moved this note here in the 3.4 release notes, instead of leaving it at the tail end of the logging upgrade section, because it felt buried there and wasn't specific to the upgrade task. Also fixed a rendering issue in this note due to some superfluous +s.

Contributor

Thanks @adellape .

@adellape adellape changed the title Updates for OCP 3.4 automated upgrades Updates for OCP 3.4 cluster upgrades (automated + manual) Dec 13, 2016
@adellape adellape force-pushed the upgrade34 branch 3 times, most recently from 57ff90b to 4576c69 Compare December 15, 2016 15:16
@dgoodwin

Instructions for the Docker upgrade seem right; however, it's worth noting this would technically be doing it twice, since all masters are implicitly nodes as well. In the automated upgrade I only hit Docker during the node upgrade phase; should we do the same here?

@adellape

@dgoodwin There's a warning right before the Docker upgrade in the master upgrade section, sort of to that effect:

The node component on masters is set by default to unschedulable status during initial installation, so that pods are not deployed to them. However, it is possible to set them schedulable during the initial installation or manually thereafter. If any of your masters are also configured as a schedulable node, skip the following Docker upgrade steps for those masters and instead run all steps described in Upgrading Nodes when you get to that section for those hosts as well.

So basically if it's a master that is unschedulable, the "Upgrading Masters" section handles everything (step 3 also has you upgrade node/openvswitch packages) and you don't have to also go through the "Upgrading Nodes" steps for that host. But if it is a master that is schedulable for some reason, then don't do the docker upgrade there and go through all of "Upgrading Nodes" as well for that host, where you'll do the Docker upgrade.

If that's too convoluted and you think all master hosts should just go through the "Upgrading Masters" and "Upgrading Nodes" sections, then the latter section will need to be rewritten a bit, because it currently assumes a schedulable node and talks about some stuff that would be superfluous for unschedulable masters (like setting --schedulable=false and evacuating pods).
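
For context, the node steps being referenced boil down to commands along these lines (node name hypothetical):

$ oadm manage-node node1.example.com --schedulable=false   # stop scheduling new pods
$ oadm manage-node node1.example.com --evacuate            # move pods to other nodes
# ...upgrade the node packages/services, then:
$ oadm manage-node node1.example.com --schedulable=true    # make the node schedulable again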

@dgoodwin

Ah I understand, no that sounds fine to me.

Also a sign I might need to re-think my structure here with this yet again.

@adellape adellape force-pushed the upgrade34 branch 2 times, most recently from 3d0354a to 015f600 Compare December 15, 2016 20:03
adellape commented Dec 15, 2016

@sdodson Now that upgrade_etcd.yml is called with the normal upgrade playbooks, I've made some changes (currently a separate commit 015f600). PTAL:

Also @ahardin-rh for peer review of ^ and http://file.rdu.redhat.com/~adellape/120216/upgrade34/install_config/upgrading/manual_upgrades.html in general (I don't think I had modified that at all when you last looked in this PR).

@adellape

@tdawson @sdodson I've also now added steps in the upgrade docs about the *-excluder packages. Currently in the following separate commit: 65b4e1a.

I'll add excluder steps to the normal install docs in a separate PR about 3.4 installs.
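
For context, a rough sketch of the excluder workflow those steps describe (run on each affected host; script name as shipped with OCP 3.x):

$ atomic-openshift-excluder unexclude   # lift the yum excludes so the upgrade can proceed
# ...run the upgrade...
$ atomic-openshift-excluder exclude     # re-apply the excludes afterward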

- etcd has been updated to 3.1.0-rc.0.
+
While etcd has been updated from etcd 2 to 3, {product-title} 3.4 continues to
use the etcd v2 API, which is backwards compatible with the etcd 3, for both new
Member

I'd get rid 'which is backwards compatible with the etcd 3'

Contributor Author

lol @ "the"

- Docker running on masters
- node services running on stand-alone nodes

When upgrading only the nodes, it is required that the control plane has already
Contributor

This reads a little awkward. Maybe something like:
When upgrading only the nodes, the control plane must already be upgraded.

The
xref:../../install_config/upgrading/blue_green_deployments.adoc#upgrading-blue-green-deployments[blue-green deployment] upgrade method follows a similar flow to the in-place method:
masters and etcd servers are still upgraded first, however a parallel
environment is created for new nodes instead of upgrading them in-place.
Contributor

This last instance of in-place should be "in place" (not "in-place") since it's not modifying a noun, like "in-place method" earlier in the paragraph

@ahardin-rh

@adellape just 2 minor nits from me ⭐

sdodson commented Dec 15, 2016

@adellape excluder bits look good to me

Also logging + cluster upgrade tweaks per QE feedback
Add etcd v2 API note to release notes
Add manual etcd upgrade steps (RPM+containerized)
Add upgrade steps for excluder pkgs
@adellape

Thanks @ahardin-rh @sdodson!

Squashed and merging.

@adellape adellape merged commit 75058c7 into openshift:master Dec 15, 2016
@vikram-redhat vikram-redhat modified the milestones: Future Release, Staging, OCP 3.4 GA Jan 16, 2017
@adellape adellape deleted the upgrade34 branch November 9, 2017 19:13