-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[Zen2] Support rolling upgrades from Zen1 #35737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Zen2] Support rolling upgrades from Zen1 #35737
Conversation
We support rolling upgrades from Zen1 by keeping the master as a Zen1 node until there are no more Zen1 nodes in the cluster, using the following principles: - Zen1 nodes will never vote for Zen2 nodes - Zen2 nodes will, while not bootstrapped, vote for Zen1 nodes - Zen2 nodes that were previously part of a mixed cluster will automatically (and unsafely) bootstrap themselves when the last Zen1 node leaves.
Pinging @elastic/es-distributed |
This is WIP, lacking in any tests apart from the obvious one (which works 🎉) but I would like an initial opinion on the approach before I go too far down the wrong path. |
server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java
Outdated
Show resolved
Hide resolved
This reverts commit ae2b7fb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I've added some questions and ideas.
getLastAcceptedVersion(), clusterState.version()); | ||
throw new CoordinationStateRejectedException("incoming version " + clusterState.version() + | ||
" lower or equal to current version " + getLastAcceptedVersion()); | ||
if (clusterState.term() == ZEN1_BWC_TERM) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can somehow avoid putting this extra condition in this class, perhaps by creating fresh persistedstate / coordinationstate instances. Also I wonder if we should enforce the version semantics as long as the states are coming from the same master (i.e. same ephemeral id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection, I think that creating a fresh persistedstate / coordinationstate is more cumbersome as the alternative. Still would be good to enforce the version semantics here if the cluster states are from the same master (i.e. same ephemeral id).
server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/DiscoveryUpgradeService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/DiscoveryUpgradeService.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/coordination/Zen1IT.java
Outdated
Show resolved
Hide resolved
@@ -1939,7 +1940,8 @@ public synchronized String startNode(Settings settings) { | |||
} | |||
final List<NodeAndClient> nodes = new ArrayList<>(); | |||
final int prevMasterCount = getMasterNodesCount(); | |||
int bootstrapMasterNodeIndex = prevMasterCount == 0 && autoManageMinMasterNodes && newMasterCount > 0 | |||
int bootstrapMasterNodeIndex = prevMasterCount == 0 && autoManageMinMasterNodes && newMasterCount > 0 && Arrays.stream(settings) | |||
.allMatch(s -> Node.NODE_MASTER_SETTING.get(s) == false || TestZenDiscovery.USE_ZEN2.get(s) == true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we run Zen1IT with autoManageMinMasterNodes
disabled? It's doing odd stuff, e.g. when restarting a node in a 2 node cluster, it changes min_master_nodes to 1 and triggers the Zen2 node to bootstrap all by itself, although it probably shouldn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does the right thing with min_master_nodes
. I didn't think it bootstrapped the Zen2 node since there's already a master node present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think 053d746 is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some responses
server/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/DiscoveryUpgradeService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/discovery/PeerFinder.java
Outdated
Show resolved
Hide resolved
getLastAcceptedVersion(), clusterState.version()); | ||
throw new CoordinationStateRejectedException("incoming version " + clusterState.version() + | ||
" lower or equal to current version " + getLastAcceptedVersion()); | ||
if (clusterState.term() == ZEN1_BWC_TERM) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection, I think that creating a fresh persistedstate / coordinationstate is more cumbersome as the alternative. Still would be good to enforce the version semantics here if the cluster states are from the same master (i.e. same ephemeral id).
We support rolling upgrades from Zen1 by keeping the master as a Zen1 node
until there are no more Zen1 nodes in the cluster, using the following
principles:
(and unsafely) bootstrap themselves when the last Zen1 node leaves.