[Zen2] Support rolling upgrades from Zen1 #35737

Merged

DaveCTurner
Contributor

We support rolling upgrades from Zen1 by keeping the master as a Zen1 node
until there are no more Zen1 nodes in the cluster, using the following
principles:

  • Zen1 nodes will never vote for Zen2 nodes
  • Zen2 nodes will, while not bootstrapped, vote for Zen1 nodes
  • Zen2 nodes that were previously part of a mixed cluster will automatically
    (and unsafely) bootstrap themselves when the last Zen1 node leaves.
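
To make the three principles above concrete, here is a minimal sketch of them as pure predicates. It is only an illustration: the class, enum, and method names are hypothetical and are not taken from this PR, and the "not yet bootstrapped" guard in the third rule is an assumption read into the description.

```java
// Hypothetical model of the three principles; NOT the code in this PR.
final class Zen1UpgradeRulesSketch {

    enum DiscoveryType { ZEN1, ZEN2 }

    // Principle 1: a Zen1 node never votes for a Zen2 candidate.
    static boolean zen1NodeVotesFor(DiscoveryType candidate) {
        return candidate == DiscoveryType.ZEN1;
    }

    // Principle 2: a Zen2 node votes for a Zen1 candidate only while it is not bootstrapped.
    static boolean zen2NodeVotesForZen1(boolean bootstrapped) {
        return bootstrapped == false;
    }

    // Principle 3: a Zen2 node that was part of a mixed cluster bootstraps itself
    // once no Zen1 nodes remain. Assumption: it only does so if not already bootstrapped.
    static boolean zen2NodeAutoBootstraps(boolean wasInMixedCluster, boolean bootstrapped, int remainingZen1Nodes) {
        return wasInMixedCluster && bootstrapped == false && remainingZen1Nodes == 0;
    }
}
```

Together, the first two predicates are what keep the master on a Zen1 node during the upgrade: Zen1 nodes cannot elect a Zen2 master, and unbootstrapped Zen2 nodes are willing to follow a Zen1 one.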

@DaveCTurner DaveCTurner added the >enhancement, WIP, v7.0.0, and :Distributed Coordination/Cluster Coordination (cluster formation and cluster state publication, including cluster membership and fault detection) labels Nov 20, 2018
@DaveCTurner DaveCTurner requested a review from ywelsch November 20, 2018 13:37
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner
Contributor Author

This is WIP, lacking in any tests apart from the obvious one (which works 🎉) but I would like an initial opinion on the approach before I go too far down the wrong path.

@ywelsch ywelsch mentioned this pull request Nov 20, 2018
61 tasks
@elastic elastic deleted a comment Nov 20, 2018
Contributor

@ywelsch ywelsch left a comment

Looking good. I've added some questions and ideas.

getLastAcceptedVersion(), clusterState.version());
throw new CoordinationStateRejectedException("incoming version " + clusterState.version() +
" lower or equal to current version " + getLastAcceptedVersion());
if (clusterState.term() == ZEN1_BWC_TERM) {
Contributor

I wonder if we can somehow avoid putting this extra condition in this class, perhaps by creating fresh persistedstate / coordinationstate instances. I also wonder if we should enforce the version semantics as long as the states are coming from the same master (i.e. the same ephemeral id).

Contributor

On reflection, I think that creating a fresh persistedstate / coordinationstate is more cumbersome than the alternative. It would still be good to enforce the version semantics here if the cluster states are from the same master (i.e. the same ephemeral id).
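
One possible reading of that suggestion, as a minimal sketch: for states published by a Zen1 master, keep rejecting an incoming version that is lower than or equal to the last accepted one, but only when both states come from the same master. The class, method, and parameter names are hypothetical, and this is not necessarily what the PR ended up doing.

```java
// Hypothetical sketch of "enforce version semantics for states from the same master".
// Not the actual CoordinationState logic.
final class Zen1VersionCheckSketch {

    /**
     * Returns true if an incoming Zen1 cluster state should be rejected.
     * Assumption: the publishing master is identified by its ephemeral id, and a
     * null last-master id means no state has been accepted from any master yet.
     */
    static boolean rejectIncomingState(String lastMasterEphemeralId, long lastAcceptedVersion,
                                       String incomingMasterEphemeralId, long incomingVersion) {
        boolean sameMaster = lastMasterEphemeralId != null
            && lastMasterEphemeralId.equals(incomingMasterEphemeralId);
        // Versions must strictly increase, but only relative to states from the same master.
        return sameMaster && incomingVersion <= lastAcceptedVersion;
    }
}
```

Under this reading, a state from a newly elected Zen1 master is not rejected on version grounds alone, while a stale or repeated state from the current master still is.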

@@ -1939,7 +1940,8 @@ public synchronized String startNode(Settings settings) {
}
final List<NodeAndClient> nodes = new ArrayList<>();
final int prevMasterCount = getMasterNodesCount();
-int bootstrapMasterNodeIndex = prevMasterCount == 0 && autoManageMinMasterNodes && newMasterCount > 0
+int bootstrapMasterNodeIndex = prevMasterCount == 0 && autoManageMinMasterNodes && newMasterCount > 0 && Arrays.stream(settings)
+    .allMatch(s -> Node.NODE_MASTER_SETTING.get(s) == false || TestZenDiscovery.USE_ZEN2.get(s) == true)
Contributor

Should we run Zen1IT with autoManageMinMasterNodes disabled? It's doing odd stuff, e.g. when restarting a node in a 2 node cluster, it changes min_master_nodes to 1 and triggers the Zen2 node to bootstrap all by itself, although it probably shouldn't?

Contributor Author

I think it does the right thing with min_master_nodes. I didn't think it bootstrapped the Zen2 node since there's already a master node present.

Contributor Author

Ok, I think 053d746 is sufficient.
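
For readers following this thread, the extra condition in the hunk above can be sketched in isolation: the test cluster only considers picking a bootstrap node when every master-eligible node being started uses Zen2. The `NodeSettings` holder and the method name below are hypothetical stand-ins for the real `Settings` lookups, and the sketch only covers the boolean condition, not what the test framework does with it.

```java
import java.util.List;

// Hypothetical stand-in for the condition in the diff; not the test framework code.
final class BootstrapConditionSketch {

    // Assumed minimal view of a node's settings: the two flags the diff inspects.
    static final class NodeSettings {
        final boolean masterEligible; // stands in for Node.NODE_MASTER_SETTING
        final boolean usesZen2;       // stands in for TestZenDiscovery.USE_ZEN2

        NodeSettings(boolean masterEligible, boolean usesZen2) {
            this.masterEligible = masterEligible;
            this.usesZen2 = usesZen2;
        }
    }

    // True only if no masters existed before, min_master_nodes is auto-managed,
    // at least one master is being started, and no master-eligible node uses Zen1.
    static boolean mayPickBootstrapNode(int prevMasterCount, boolean autoManageMinMasterNodes,
                                        int newMasterCount, List<NodeSettings> nodes) {
        return prevMasterCount == 0
            && autoManageMinMasterNodes
            && newMasterCount > 0
            && nodes.stream().allMatch(s -> s.masterEligible == false || s.usesZen2);
    }
}
```

A single master-eligible Zen1 node in the batch is enough to make the predicate false, which matches the intent of never bootstrapping a Zen2 node while a Zen1 master could still exist.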

Contributor Author

@DaveCTurner DaveCTurner left a comment

Some responses


@DaveCTurner DaveCTurner removed the WIP label Dec 7, 2018
@DaveCTurner DaveCTurner merged commit 9f86e99 into elastic:master Dec 8, 2018
@DaveCTurner DaveCTurner deleted the 2018-11-19-rolling-upgrade-to-zen2 branch December 8, 2018 07:33
Labels
:Distributed Coordination/Cluster Coordination (cluster formation and cluster state publication, including cluster membership and fault detection), >enhancement, v7.0.0-beta1

4 participants