[Zen2] Support rolling upgrades from Zen1 #35737

DaveCTurner · 2018-11-20T13:37:16Z

We support rolling upgrades from Zen1 by keeping the master as a Zen1 node
until there are no more Zen1 nodes in the cluster, using the following
principles:

- Zen1 nodes will never vote for Zen2 nodes.
- Zen2 nodes will, while not bootstrapped, vote for Zen1 nodes.
- Zen2 nodes that were previously part of a mixed cluster will automatically
  (and unsafely) bootstrap themselves when the last Zen1 node leaves.
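
As a rough illustration only, the three principles above can be written as predicates. This is a minimal sketch, not code from this PR: `RollingUpgradeRules`, `Protocol`, and all method names are hypothetical, and the `alreadyBootstrapped` guard is an assumption that a node which is already bootstrapped has nothing left to do.

```java
import java.util.List;

// Hypothetical model of the three rolling-upgrade principles.
// None of these names exist in Elasticsearch; they are for illustration only.
final class RollingUpgradeRules {
    enum Protocol { ZEN1, ZEN2 }

    // Principle 1: a Zen1 node never votes for a Zen2 candidate.
    static boolean zen1NodeVotesFor(Protocol candidate) {
        return candidate == Protocol.ZEN1;
    }

    // Principle 2: a Zen2 node votes for a Zen1 candidate only while it is not bootstrapped.
    static boolean zen2NodeVotesForZen1(boolean voterBootstrapped) {
        return !voterBootstrapped;
    }

    // Principle 3: a Zen2 node that was part of a mixed cluster bootstraps itself
    // once no Zen1 nodes remain. The alreadyBootstrapped guard is an assumption.
    static boolean shouldAutoBootstrap(boolean wasInMixedCluster,
                                       boolean alreadyBootstrapped,
                                       List<Protocol> otherNodes) {
        return wasInMixedCluster
            && !alreadyBootstrapped
            && !otherNodes.contains(Protocol.ZEN1);
    }
}
```

Under this model, the master stays on a Zen1 node for as long as any Zen1 node is present, because only Zen1 candidates can collect votes from both kinds of node.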


elasticmachine · 2018-11-20T13:37:22Z

Pinging @elastic/es-distributed

DaveCTurner · 2018-11-20T13:40:07Z

This is WIP, lacking in any tests apart from the obvious one (which works 🎉) but I would like an initial opinion on the approach before I go too far down the wrong path.

server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java


ywelsch

Looking good. I've added some questions and ideas.

ywelsch · 2018-12-06T14:32:46Z

server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationState.java

-                getLastAcceptedVersion(), clusterState.version());
-            throw new CoordinationStateRejectedException("incoming version " + clusterState.version() +
-                " lower or equal to current version " + getLastAcceptedVersion());
+            if (clusterState.term() == ZEN1_BWC_TERM) {


I wonder if we can somehow avoid putting this extra condition in this class, perhaps by creating fresh persistedstate / coordinationstate instances. Also I wonder if we should enforce the version semantics as long as the states are coming from the same master (i.e. same ephemeral id).

On reflection, I think that creating a fresh persistedstate / coordinationstate is more cumbersome than the alternative. It would still be good to enforce the version semantics here if the cluster states are from the same master (i.e. same ephemeral id).
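
To make the suggestion concrete, here is a minimal sketch of a version check that is lenient only for Zen1 states arriving from a different master. It is not the code in `CoordinationState`: the class `Zen1VersionCheck`, the method `shouldReject`, its parameters, and the value given to `ZEN1_BWC_TERM` are all assumptions for illustration.

```java
import java.util.Objects;

// Hypothetical sketch of the reviewer's suggestion; not the actual CoordinationState logic.
final class Zen1VersionCheck {
    // Assumed placeholder value for the term that marks states published by a Zen1 master.
    static final long ZEN1_BWC_TERM = 0L;

    // Returns true if the incoming cluster state should be rejected.
    static boolean shouldReject(long incomingTerm,
                                long incomingVersion,
                                String incomingMasterEphemeralId,
                                long lastAcceptedVersion,
                                String lastAcceptedMasterEphemeralId) {
        if (incomingVersion > lastAcceptedVersion) {
            return false; // strictly newer versions are always acceptable here
        }
        boolean fromZen1Master = incomingTerm == ZEN1_BWC_TERM;
        boolean masterChanged =
            !Objects.equals(incomingMasterEphemeralId, lastAcceptedMasterEphemeralId);
        // Lenient only when a Zen1 state comes from a different master; states from
        // the same master (same ephemeral id) must still have increasing versions.
        return !(fromZen1Master && masterChanged);
    }
}
```

For example, `shouldReject(ZEN1_BWC_TERM, 5, "a", 7, "a")` is true because the same master must not go backwards, while the same call with a different incoming ephemeral id is false.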

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

server/src/main/java/org/elasticsearch/cluster/coordination/DiscoveryUpgradeService.java

server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java

ywelsch · 2018-12-06T21:48:40Z

test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java

@@ -1939,7 +1940,8 @@ public synchronized String startNode(Settings settings) {
        }
        final List<NodeAndClient> nodes = new ArrayList<>();
        final int prevMasterCount = getMasterNodesCount();
-        int bootstrapMasterNodeIndex = prevMasterCount == 0 && autoManageMinMasterNodes && newMasterCount > 0
+        int bootstrapMasterNodeIndex = prevMasterCount == 0 && autoManageMinMasterNodes && newMasterCount > 0 && Arrays.stream(settings)
+            .allMatch(s -> Node.NODE_MASTER_SETTING.get(s) == false || TestZenDiscovery.USE_ZEN2.get(s) == true)


Should we run Zen1IT with autoManageMinMasterNodes disabled? It's doing odd stuff, e.g. when restarting a node in a 2 node cluster, it changes min_master_nodes to 1 and triggers the Zen2 node to bootstrap all by itself, although it probably shouldn't?

I think it does the right thing with min_master_nodes. I didn't think it bootstrapped the Zen2 node since there's already a master node present.

Ok, I think 053d746 is sufficient.
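
For readers following the diff above, the added condition can be read as "only pick a bootstrap node if every master-eligible node being started uses Zen2". The sketch below restates that predicate with simplified stand-in types; `BootstrapCondition` and `NodeSettings` are hypothetical and replace the real `Settings`, `Node.NODE_MASTER_SETTING`, and `TestZenDiscovery.USE_ZEN2` lookups.

```java
import java.util.List;

// Hypothetical restatement of the condition in the diff, using simplified types.
final class BootstrapCondition {
    // Stand-in for the two settings the diff reads from each node's Settings.
    static final class NodeSettings {
        final boolean masterEligible;
        final boolean useZen2;

        NodeSettings(boolean masterEligible, boolean useZen2) {
            this.masterEligible = masterEligible;
            this.useZen2 = useZen2;
        }
    }

    // True if the test cluster should choose one of the new nodes to bootstrap.
    static boolean shouldPickBootstrapNode(int prevMasterCount,
                                           boolean autoManageMinMasterNodes,
                                           int newMasterCount,
                                           List<NodeSettings> settings) {
        return prevMasterCount == 0
            && autoManageMinMasterNodes
            && newMasterCount > 0
            // Skip bootstrapping if any master-eligible node still runs Zen1.
            && settings.stream().allMatch(s -> !s.masterEligible || s.useZen2);
    }
}
```

Non-master nodes are ignored by the last clause, so a Zen1 data-only node would not prevent bootstrapping in this model.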

server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java

DaveCTurner

Some responses

server/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java

server/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

server/src/main/java/org/elasticsearch/cluster/coordination/DiscoveryUpgradeService.java

server/src/main/java/org/elasticsearch/discovery/PeerFinder.java

ywelsch · 2018-12-07T17:55:44Z

server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationState.java

-                getLastAcceptedVersion(), clusterState.version());
-            throw new CoordinationStateRejectedException("incoming version " + clusterState.version() +
-                " lower or equal to current version " + getLastAcceptedVersion());
+            if (clusterState.term() == ZEN1_BWC_TERM) {


On reflection, I think that creating a fresh persistedstate / coordinationstate is more cumbersome than the alternative. It would still be good to enforce the version semantics here if the cluster states are from the same master (i.e. same ephemeral id).

server/src/main/java/org/elasticsearch/cluster/coordination/DiscoveryUpgradeService.java

server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java

DaveCTurner added the >enhancement, WIP, v7.0.0, and :Distributed Coordination/Cluster Coordination labels Nov 20, 2018

DaveCTurner requested a review from ywelsch November 20, 2018 13:37

CheckStyle

1ec1f8b

ywelsch mentioned this pull request Nov 20, 2018

A new cluster coordination layer #32006

Closed

61 tasks

elastic deleted a comment Nov 20, 2018

DaveCTurner added 7 commits November 21, 2018 12:53

Moar test

32a3d82

Merge branch 'zen2' into 2018-11-19-rolling-upgrade-to-zen2

bbb700f

Post-merge fixup

0678ebe

Bogus assertion

48888e0

Avoid problematic case

bb7a04c

Add restart tests as well as migration ones

7eb0137

Imports

5595d2b

ywelsch reviewed Nov 22, 2018

server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java (outdated)

DaveCTurner added 11 commits November 22, 2018 12:01

Better restart tests

7ea3769

Must claim unknown version

6c3bb69

Merge branch 'zen2' into 2018-11-19-rolling-upgrade-to-zen2

abe570e

Suppress exceptions during upgrade bootstrap

c83bb34

Fake ping ID 0 for remote pings

04208ab

Use FAKE_PING_ID to describe the situation

491349a

Imports

b218215

Add TODOs

98e0882

Whitespace

a2d609d

Revert bootstrapping condition

ae2b7fb

Revert "Revert bootstrapping condition"

5f5aeff

This reverts commit ae2b7fb.

ywelsch suggested changes Dec 6, 2018


DaveCTurner added 5 commits December 7, 2018 15:17

Merge branch 'master' into 2018-11-19-rolling-upgrade-to-zen2

4ec0cec

_rolling_ upgrade

67ac92e

Revert

c01216b

Revert

5d11689

Don't send PeersRequest to local node

0958cdb

DaveCTurner commented Dec 7, 2018


DaveCTurner added 9 commits December 7, 2018 15:37

Merge branch 'master' into 2018-11-19-rolling-upgrade-to-zen2

b0ab023

Horrible hack to force Zen1 nodes not to elect Zen2 masters

3435c73

Don't send incomprehensible messages from the future

d5ba1c7

Maybe more shards

1cfe428

Compile error

4b7b899

Simplify testMixedClusterFormation and avoid the 1+1 case

a96aa91

Fewer devices

d0a23d4

Randomise shards not nodes, d'oh

ed0a5ca

Merge branch 'master' into 2018-11-19-rolling-upgrade-to-zen2

497c5cf

ywelsch approved these changes Dec 7, 2018


DaveCTurner added 6 commits December 7, 2018 21:55

static

5d2ba53

Won't do this TODO

d1702b4

promote logging

003b218

Add test that impossibly high id is impossibly high

212049c

No TestLogging

3da03f7

Don't run all the tests

49a41d1

DaveCTurner removed the WIP label Dec 7, 2018

DaveCTurner added 2 commits December 7, 2018 22:34

Inline

af11d61

Only be lenient if the master changes

053d746

DaveCTurner merged commit 9f86e99 into elastic:master Dec 8, 2018

DaveCTurner deleted the 2018-11-19-rolling-upgrade-to-zen2 branch December 8, 2018 07:33

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
