Integrate LeaderChecker with Coordinator #34049

Merged

DaveCTurner
Contributor

This change ensures that follower nodes periodically check that their leader is
healthy, and that they elect a new leader if not.
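A minimal sketch of the follower-side behaviour described above, assuming a simple "N consecutive failed checks" rule. The class name `LeaderCheckSketch`, the `BooleanSupplier` used as the check, the retry count and the scheduling interval are all hypothetical. This is not the `LeaderChecker` API introduced by this PR.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a follower-side leader check; not the LeaderChecker API from this PR.
 * Each check asks whether the leader is healthy; after retryCount consecutive failures
 * the follower gives up on the leader and triggers an election exactly once.
 */
final class LeaderCheckSketch implements AutoCloseable {
    private final BooleanSupplier checkLeader; // assumption: true iff the leader answered in time
    private final Runnable onLeaderFailure;    // assumption: e.g. start a new election
    private final int retryCount;
    private int consecutiveFailures = 0;
    private boolean failed = false;
    private ScheduledFuture<?> task;

    LeaderCheckSketch(BooleanSupplier checkLeader, Runnable onLeaderFailure, int retryCount) {
        this.checkLeader = checkLeader;
        this.onLeaderFailure = onLeaderFailure;
        this.retryCount = retryCount;
    }

    /** Runs one check; package-private so a test can drive it without real timers. */
    synchronized void runOneCheck() {
        if (failed) {
            return; // leader already declared failed; nothing more to do
        }
        if (checkLeader.getAsBoolean()) {
            consecutiveFailures = 0; // any successful check resets the count
        } else if (++consecutiveFailures >= retryCount) {
            failed = true;
            onLeaderFailure.run();
        }
    }

    synchronized boolean hasFailed() {
        return failed;
    }

    /** Schedules runOneCheck() every intervalMillis on the given scheduler. */
    synchronized void start(ScheduledExecutorService scheduler, long intervalMillis) {
        task = scheduler.scheduleWithFixedDelay(this::runOneCheck, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public synchronized void close() {
        if (task != null) {
            task.cancel(false); // stop future checks, e.g. when this node stops following the leader
        }
    }
}
```

With a real scheduler this could be started as `checker.start(Executors.newSingleThreadScheduledExecutor(), 1000)` and stopped with `close()` once the node stops following that leader. Counting consecutive failures, rather than reacting to a single missed check, is one way to tolerate an occasional dropped message; the threshold here is illustrative only.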

DaveCTurner added the >enhancement, v7.0.0 and :Distributed Coordination/Cluster Coordination labels Sep 25, 2018
DaveCTurner requested a review from ywelsch September 25, 2018 13:51
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@@ -94,7 +95,12 @@ public String toString() {
public void run() {
switch (getConnectionStatus(getLocalNode(), destination)) {
case BLACK_HOLE:
logger.trace("dropping {}", requestDescription);
if (action.equals(HANDSHAKE_ACTION_NAME)) {
Contributor Author

This makes me unhappy, but I was unable to come up with a better idea. Thoughts?

Contributor

I've pushed a7a76c0 which refactors the class a bit to make it more extensible. Let me know if you like that better.
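One way to make the class more extensible, so the handshake special case no longer has to live inside the `BLACK_HOLE` branch, is to route black-holed requests through an overridable hook. The sketch below is a hypothetical illustration of that idea, not the contents of a7a76c0: `ConnectionStatus`, `DisruptableTransportSketch`, `onBlackholedDuringSend`, the `"handshake"` action string and the fail-fast behaviour for handshakes are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical link states for a simulated network between test nodes. */
enum ConnectionStatus { CONNECTED, DISCONNECTED, BLACK_HOLE }

/**
 * Hypothetical sketch of a disruptable test transport, not the class under review.
 * Black-holed requests go through an overridable hook instead of being dropped inline.
 */
class DisruptableTransportSketch {
    ConnectionStatus status = ConnectionStatus.CONNECTED; // simplification: one status for every link
    final List<String> log = new ArrayList<>();

    void send(String action) {
        switch (status) {
            case BLACK_HOLE:
                onBlackholedDuringSend(action);
                break;
            case DISCONNECTED:
                log.add("failed " + action); // sender sees an immediate connection error
                break;
            case CONNECTED:
                log.add("delivered " + action);
                break;
        }
    }

    /** Default: drop silently, so the sender only notices through its own timeout. */
    protected void onBlackholedDuringSend(String action) {
        log.add("dropped " + action);
    }
}

/** Example extension: treat handshakes specially without touching the base class. */
class HandshakeAwareTransportSketch extends DisruptableTransportSketch {
    @Override
    protected void onBlackholedDuringSend(String action) {
        if (action.equals("handshake")) {
            log.add("failed " + action); // hypothetical special case: fail fast instead of dropping
        } else {
            super.onBlackholedDuringSend(action);
        }
    }
}
```

The design choice is a template-method hook: the base class keeps the default "drop and let the sender time out" behaviour, and only tests that need different semantics for particular actions override it.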

ywelsch mentioned this pull request Sep 25, 2018
if (leaderCheckScheduler != null) {
leaderCheckScheduler.close();
}
leaderCheckScheduler = leaderChecker.startLeaderChecker(leaderNode);
Contributor

I'm confused. Does this mean we restart the leader checker on every incoming publication? We call becomeFollower on every incoming publication.

Contributor Author

It did mean a new leader checker on each publication. I pushed be15266 so that it no longer does this. However I can't think of a good way to reliably assert that we don't do this: both behaviours have pretty much the right liveness properties, and my only other two ideas are:

  • check we don't send another leader check immediately after each publication (not robust)
  • check for reference equality of the leaderCheckScheduler object before/after a second publication (don't fancy exposing this).
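A minimal sketch of one way to avoid restarting the checker, combined with the second idea above (checking reference equality of the scheduler object across a second publication). It is a hypothetical simplification, not the code in be15266: leaders are plain strings, and `CheckerHandle`, `FollowerSketch` and the test-only accessor are invented names. The accessor is exactly the kind of exposure the comment above is reluctant to add.

```java
import java.util.Objects;

/** Hypothetical stand-in for a handle on a running leader checker. */
final class CheckerHandle implements AutoCloseable {
    final String leader;
    boolean closed = false;

    CheckerHandle(String leader) {
        this.leader = leader;
    }

    @Override
    public void close() {
        closed = true; // a real handle would cancel its scheduled checks here
    }
}

/**
 * Hypothetical sketch, not the Coordinator code: becomeFollower() may be called on
 * every publication, but the checker is only restarted when the leader changes.
 */
final class FollowerSketch {
    private String currentLeader;
    private CheckerHandle leaderCheckScheduler;
    int checkersStarted = 0; // counter used only to make the behaviour observable

    void becomeFollower(String leaderNode) {
        if (leaderCheckScheduler != null && Objects.equals(currentLeader, leaderNode)) {
            return; // same leader as before: keep the running checker
        }
        if (leaderCheckScheduler != null) {
            leaderCheckScheduler.close(); // leader changed: stop checking the old one
        }
        currentLeader = leaderNode;
        leaderCheckScheduler = startLeaderChecker(leaderNode);
    }

    private CheckerHandle startLeaderChecker(String leaderNode) {
        checkersStarted++;
        return new CheckerHandle(leaderNode);
    }

    /** Exposed only so a test can check reference equality across publications. */
    CheckerHandle leaderCheckScheduler() {
        return leaderCheckScheduler;
    }
}
```

A test would call `becomeFollower("node-1")` twice and assert that `leaderCheckScheduler()` returns the same object both times, then switch to `"node-2"` and assert that the old handle was closed.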

Contributor

ywelsch, Sep 26, 2018

> However I can't think of a good way to reliably assert that we don't do this

I don't have any good ideas here either. I'll keep thinking about it, but it shouldn't block this PR.

DaveCTurner
Contributor Author

Thanks for a7a76c0 (for some reason I can't reply to this comment inline). I moved the log statement in 98a8236 but otherwise LGTM.

Contributor

ywelsch left a comment

LGTM

DaveCTurner merged commit d995fc8 into elastic:zen2 Sep 26, 2018
DaveCTurner deleted the 2018-09-25-integrate-leader-checker branch September 26, 2018 11:18
Labels
:Distributed Coordination/Cluster Coordination (cluster formation and cluster state publication, including cluster membership and fault detection), >enhancement, v7.0.0-beta1