Drop node if asymmetrically partitioned from master #39598
Conversation
When a node is joining the cluster we ensure that it can send requests to the master _at that time_. If it joins the cluster and _then_ loses the ability to send requests to the master then it should be removed from the cluster. Today this is not the case: the master can still receive responses to its follower checks, and receives acknowledgements to cluster state publications, so has no reason to remove the node. This commit changes the handling of follower checks so that they fail if they come from a master that the other node was following but which it now believes to have failed.
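To make the described rule concrete, here is a minimal sketch of the decision it introduces. This is a hypothetical distillation, not the actual Elasticsearch `FollowersChecker` code; all names are illustrative.

```java
// Sketch (hypothetical names) of the rule this commit introduces: a follower
// check fails when it comes from a master the node was following but which it
// now believes to have failed.
final class FollowerCheckRule {
    enum Mode { FOLLOWER, CANDIDATE }

    // lastFailedLeader is the master this node was following before it decided
    // that master had failed and became a CANDIDATE again.
    static boolean shouldRejectCheck(Mode mode, String lastFailedLeader, String checker) {
        return mode == Mode.CANDIDATE && checker.equals(lastFailedLeader);
    }

    public static void main(String[] args) {
        // The node followed "master-1", then lost the ability to send it
        // requests and marked it failed: a later check from "master-1" is
        // rejected, so "master-1" in turn removes this node from the cluster.
        System.out.println(shouldRejectCheck(Mode.CANDIDATE, "master-1", "master-1")); // true
        // A check from some other node is handled by the normal follow logic.
        System.out.println(shouldRejectCheck(Mode.CANDIDATE, "master-1", "master-2")); // false
    }
}
```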
Pinging @elastic/es-distributed
@elasticmachine run elasticsearch-ci/bwc
…ub.com:DaveCTurner/elasticsearch into 2019-03-02-remove-node-on-asymmetric-partition
```java
    });

    boolean isJoinPending() {
        // cannot use pendingOutgoingJoins.isEmpty() because it's not properly synchronized.
        return pendingOutgoingJoins.iterator().hasNext();
    }
```
I don't understand how using the iterator gives you any stronger guarantees.
The `ConcurrentHashMap` javadocs say:
* Bear in mind that the results of aggregate status methods including
* {@code size}, {@code isEmpty}, and {@code containsValue} are typically
* useful only when a map is not undergoing concurrent updates in other threads.
* Otherwise the results of these methods reflect transient states
* that may be adequate for monitoring or estimation purposes, but not
* for program control.
I believe the weakly-consistent-only guarantee given by `ConcurrentHashMap` iteration means that the following could happen:

- `pendingOutgoingJoins` has one entry, `e`.
- A thread `T1` calls `isJoinPending` and gets the iterator (strictly speaking, we halt inside iterator construction before `advance` is called).
- A thread `T2` adds another entry `f` and removes `e` (in that order). So `pendingOutgoingJoins` was never empty.
- `T1` continues and `hasNext()` can now return `false`.

Mutex protection is done in `Coordinator`, but is not applied upon receiving requests and responses. It is difficult to see whether the above scenario can lead to issues, but a simpler solution could be to make the `pendingOutgoingJoins` set synchronized instead (or maybe synchronize on `Coordinator.mutex` when manipulating it)?
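For reference, a minimal sketch of the synchronized-set alternative suggested here (names are illustrative; the real field holds join objects rather than strings):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Sketch of the suggestion above: back pendingOutgoingJoins with a
// synchronized wrapper so that isEmpty() holds the set's lock and can never
// observe a transient mid-update state.
final class PendingJoins {
    private final Set<String> pendingOutgoingJoins =
        Collections.synchronizedSet(new HashSet<>());

    void addJoin(String join) {
        pendingOutgoingJoins.add(join);
    }

    void removeJoin(String join) {
        pendingOutgoingJoins.remove(join);
    }

    boolean isJoinPending() {
        // Safe here: the wrapper synchronizes isEmpty(), unlike the weakly
        // consistent aggregate methods of ConcurrentHashMap.
        return pendingOutgoingJoins.isEmpty() == false;
    }
}
```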
Also from the Javadocs:
* Iterators, Spliterators and Enumerations return elements reflecting the
* state of the hash table at some point at or since the creation of the
* iterator/enumeration.
The _at some point_ in that sentence indicates snapshot-like semantics, which would forbid this situation; also, the consequences are benign as far as I can see, but I'm all for reducing unnecessary mental load. I opened #39900.
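A small single-threaded demonstration of that Javadoc guarantee (my own example, not from the PR): a `ConcurrentHashMap` iterator tolerates concurrent modification and reflects the state of the map at some point at or since its creation.

```java
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IteratorSnapshotDemo {
    public static void main(String[] args) {
        Set<String> joins = ConcurrentHashMap.newKeySet();
        joins.add("e");

        Iterator<String> it = joins.iterator();
        joins.remove("e"); // mutate after the iterator was created

        // No ConcurrentModificationException here. In practice this prints
        // true, because the iterator located "e" during construction; the
        // spec only promises it reflects the state at some point at or since
        // the iterator's creation.
        System.out.println("hasNext = " + it.hasNext());
    }
}
```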
```diff
@@ -1560,6 +1596,14 @@ void setEmptySeedHostsList() {
         seedHostsList = emptyList();
     }

+    void dropRequestsFrom(ClusterNode sender, ClusterNode destination) {
```
perhaps call this `blackHoleRequestsFrom`?
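For context, a rough sketch of what such a rule models in a test harness (hypothetical classes, not Elasticsearch's deterministic test framework): requests from one node to another vanish without any response, while traffic in the opposite direction still flows, i.e. an asymmetric partition.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical test-transport sketch: a black-holed request is neither
// delivered nor failed, which is how an asymmetric partition looks to the
// sender, and why "black hole" is a more accurate name than "drop".
final class FakeTransport {
    // Ordered pairs "sender->destination" whose requests are swallowed.
    private final Set<String> blackHoledLinks = ConcurrentHashMap.newKeySet();

    void blackHoleRequestsFrom(String sender, String destination) {
        blackHoledLinks.add(sender + "->" + destination);
    }

    // Returns true if the request should be delivered. Note the asymmetry:
    // black-holing A->B says nothing about B->A.
    boolean deliver(String sender, String destination) {
        return blackHoledLinks.contains(sender + "->" + destination) == false;
    }
}
```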