Drop node if asymmetrically partitioned from master #39598

Conversation

DaveCTurner (Contributor)

When a node is joining the cluster we ensure that it can send requests to the
master _at that time_. If it joins the cluster and _then_ loses the ability to
send requests to the master then it should be removed from the cluster. Today
this is not the case: the master can still receive responses to its follower
checks, and receives acknowledgements to cluster state publications, so has no
reason to remove the node.

This commit changes the handling of follower checks so that they fail if they
come from a master that the other node was following but which it now believes
to have failed.
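
In other words, once the node's own leader checker decides the master has failed, subsequent follower checks from that master are rejected, so the master's fault detection fails for this node and removes it. A minimal, hypothetical sketch of that follower-side behaviour (the class and method names here are illustrative, not the actual `FollowersChecker` change):

```java
// Hypothetical follower-side handler; names are illustrative, not the real API.
final class FollowerCheckHandler {

    private volatile String lastFailedMasterId; // master this node decided has failed

    // Called when this node's own leader checks to the master time out.
    void onLeaderFailure(String masterId) {
        lastFailedMasterId = masterId;
    }

    // Called when a follower check arrives from a master.
    void handleFollowerCheck(String senderMasterId) {
        if (senderMasterId.equals(lastFailedMasterId)) {
            // Fail the check: the master's fault detection will now see this
            // node as faulty and remove it from the cluster, even though the
            // node could still answer the master's inbound checks.
            throw new IllegalStateException(
                "rejecting follower check from failed master [" + senderMasterId + "]");
        }
        // otherwise handle the check normally ...
    }
}
```
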
@DaveCTurner added the >bug, :Distributed Coordination/Cluster Coordination, v7.0.0, v7.2.0, and v8.0.0 labels on Mar 2, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@DaveCTurner (Contributor, Author)

@elasticmachine run elasticsearch-ci/bwc

```java
boolean isJoinPending() {
    // cannot use pendingOutgoingJoins.isEmpty() because it's not properly synchronized.
    return pendingOutgoingJoins.iterator().hasNext();
}
```
Contributor:

I don't understand how using the iterator gives you any stronger guarantees.

Contributor (Author):

The ConcurrentHashMap javadocs say:

> Bear in mind that the results of aggregate status methods including `size`, `isEmpty`, and `containsValue` are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.

Contributor:

I believe that the merely weakly-consistent guarantee given by `ConcurrentHashMap` iteration means the following could happen:

  1. pendingOutgoingJoins has one entry, e.
  2. A thread T1 calls isJoinPending and gets the iterator (strictly speaking, we halt inside iterator construction before advance is called).
  3. A thread T2 adds another entry f and removes e (in that order). So pendingOutgoingJoins was never empty.
  4. T1 continues and hasNext() can now return false.

Mutex protection is done in `Coordinator`, but it is not applied when receiving requests and responses. It is difficult to see whether the above scenario can lead to issues, but a simpler solution could be to make the `pendingOutgoingJoins` set synchronized instead (or perhaps to synchronize on `Coordinator.mutex` when manipulating it); a sketch of the synchronized-set option follows.
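
A minimal sketch of the synchronized-set suggestion, with the element type simplified to `String` (the real set holds join targets) and the class name `PendingJoins` purely illustrative:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class PendingJoins {
    // Every operation on a synchronized set, including isEmpty(), runs under
    // the set's monitor, so it observes one consistent state rather than the
    // weakly-consistent view offered by ConcurrentHashMap iteration.
    private final Set<String> pendingOutgoingJoins =
        Collections.synchronizedSet(new HashSet<>());

    void addPendingJoin(String joinTarget) {
        pendingOutgoingJoins.add(joinTarget);
    }

    void removePendingJoin(String joinTarget) {
        pendingOutgoingJoins.remove(joinTarget);
    }

    boolean isJoinPending() {
        return pendingOutgoingJoins.isEmpty() == false;
    }
}
```
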

Contributor (Author):

Also from the Javadocs:

> Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.

The _at some point_ in that sentence indicates snapshot-like semantics, which would forbid this situation, and the consequences look benign as far as I can see; still, I'm all for reducing unnecessary mental load. I opened #39900.

```diff
@@ -1560,6 +1596,14 @@ void setEmptySeedHostsList() {
         seedHostsList = emptyList();
     }

+    void dropRequestsFrom(ClusterNode sender, ClusterNode destination) {
```
Contributor:

perhaps call this `blackHoleRequestsFrom`?
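
For context, the utility under discussion simulates a one-directional partition in the test harness: requests from the sender to the destination are discarded while traffic the other way still flows, which is exactly the asymmetric partition this PR addresses. A hypothetical, self-contained sketch of the idea (types and names are illustrative, not the actual test code):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashSet;
import java.util.Map.Entry;
import java.util.Set;

final class PartitionSimulator {
    // Directed (sender, destination) pairs whose requests are discarded.
    private final Set<Entry<String, String>> blackholed = new HashSet<>();

    // Simulate an asymmetric partition: requests from sender to destination
    // are silently dropped ("black-holed"), responses and reverse traffic
    // are unaffected.
    void dropRequestsFrom(String senderId, String destinationId) {
        blackholed.add(new SimpleEntry<>(senderId, destinationId));
    }

    // The simulated transport consults this before delivering a request.
    boolean shouldDeliver(String senderId, String destinationId) {
        return blackholed.contains(new SimpleEntry<>(senderId, destinationId)) == false;
    }
}
```
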

@DaveCTurner merged commit 2d28d7b into elastic:master on Mar 6, 2019
@DaveCTurner deleted the 2019-03-02-remove-node-on-asymmetric-partition branch on March 6, 2019 at 09:23
DaveCTurner added a commit that referenced this pull request on Mar 6, 2019 (same commit message as above).

DaveCTurner added a commit that referenced this pull request on Mar 6, 2019 (same commit message as above).
@andrershov removed their request for review on March 6, 2019 at 11:11
Labels
>bug, :Distributed Coordination/Cluster Coordination, v7.0.0-rc2, v7.2.0, v8.0.0-alpha1