Integrate LeaderChecker with Coordinator #34049
This change ensures that follower nodes periodically check that their leader is healthy, and that they elect a new leader if not.
Pinging @elastic/es-distributed
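To illustrate the behaviour described above, here is a minimal sketch of a follower that checks its leader and reports a failure after several consecutive failed checks. This is not the `LeaderChecker` from this PR: the class name `FollowerLeaderCheckSketch`, the `retryCount` parameter, and the callback are all hypothetical, and the periodic scheduling is left to the caller.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch only; not the Elasticsearch LeaderChecker. */
final class FollowerLeaderCheckSketch {
    private final BooleanSupplier leaderIsHealthy; // stands in for a network health check
    private final int retryCount;                  // assumed: consecutive failures tolerated
    private final Runnable onLeaderFailure;        // e.g. start electing a new leader
    private int consecutiveFailures = 0;
    private boolean leaderFailed = false;

    FollowerLeaderCheckSketch(BooleanSupplier leaderIsHealthy, int retryCount, Runnable onLeaderFailure) {
        this.leaderIsHealthy = leaderIsHealthy;
        this.retryCount = retryCount;
        this.onLeaderFailure = onLeaderFailure;
    }

    /** Runs one check; a real implementation would invoke this periodically from a timer. */
    void checkOnce() {
        if (leaderFailed) {
            return; // failure already reported, so do not report it again
        }
        if (leaderIsHealthy.getAsBoolean()) {
            consecutiveFailures = 0; // a successful check resets the failure count
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= retryCount) {
            leaderFailed = true;
            onLeaderFailure.run();
        }
    }

    boolean hasLeaderFailed() {
        return leaderFailed;
    }
}
```

In this sketch a single successful check resets the counter, so only an unbroken run of failures triggers the callback, and the callback fires at most once.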
@@ -94,7 +95,12 @@ public String toString() {
    public void run() {
        switch (getConnectionStatus(getLocalNode(), destination)) {
            case BLACK_HOLE:
                logger.trace("dropping {}", requestDescription);
                if (action.equals(HANDSHAKE_ACTION_NAME)) {
This makes me unhappy, but I was unable to come up with a better idea. Thoughts?
I've pushed a7a76c0 which refactors the class a bit to make it more extensible. Let me know if you like that better.
if (leaderCheckScheduler != null) {
    leaderCheckScheduler.close();
}
leaderCheckScheduler = leaderChecker.startLeaderChecker(leaderNode);
I'm confused. Does this mean we restart the leader checker on every incoming publication? We call becomeFollower on every incoming publication.
It did mean a new leader checker on each publication. I pushed be15266 not to do this. However I can't think of a good way to reliably assert that we don't do this: both ways have pretty much the right liveness properties, and my other two ideas are:
- check we don't send another leader check immediately after each publication (not robust)
- check for reference equality of the leaderCheckScheduler object before/after a second publication (don't fancy exposing this).
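For readers following the thread, the second idea can be sketched as below. This is an assumption-laden illustration, not the code in be15266: `LeaderCheckRestartSketch`, its nested `Scheduler` class, the string node IDs, and the `currentScheduler()` accessor are all hypothetical. The sketch keeps the existing scheduler when the leader is unchanged, which is what makes a reference-equality check possible.

```java
import java.io.Closeable;
import java.util.Objects;

/** Hypothetical sketch only; not the Coordinator from this PR. */
final class LeaderCheckRestartSketch {

    /** Stands in for the handle returned when a leader checker is started. */
    static final class Scheduler implements Closeable {
        final String leaderId;
        boolean closed = false;

        Scheduler(String leaderId) {
            this.leaderId = leaderId;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    private Scheduler leaderCheckScheduler;

    /** Called on every incoming publication; restarts the checker only if the leader changed. */
    void becomeFollower(String leaderId) {
        Objects.requireNonNull(leaderId);
        if (leaderCheckScheduler != null && leaderCheckScheduler.leaderId.equals(leaderId)) {
            return; // same leader: keep the running checker
        }
        if (leaderCheckScheduler != null) {
            leaderCheckScheduler.close();
        }
        leaderCheckScheduler = new Scheduler(leaderId);
    }

    /** Exposed only so a test can compare references; this is the exposure the comment dislikes. */
    Scheduler currentScheduler() {
        return leaderCheckScheduler;
    }
}
```

A test would then capture `currentScheduler()` after the first publication and check with `==` that a second publication from the same leader returns the same object. The cost is the extra accessor, which exists purely for testing.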
> However I can't think of a good way to reliably assert that we don't do this
I don't have any good idea here. I'll keep thinking about this. Should not block this PR though.
LGTM