-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[Zen2] Introduce FollowersChecker #33917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Zen2] Introduce FollowersChecker #33917
Conversation
It is important that the leader periodically checks that its followers are still healthy and can remain part of its cluster. If these checks fail repeatedly then the leader should remove the faulty node from the cluster. The FollowerChecker, introduced in this commit, performs these periodic checks and deals with retries.
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a few smaller comments and one item I would like to discuss.
server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java
Outdated
Show resolved
Hide resolved
request, this).getFormattedMessage()); | ||
} | ||
|
||
executeRunnable.accept(new AbstractRunnable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add assert request.term > this.term
before this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't hold - we could be in the correct term but a CANDIDATE
(e.g. LeaderChecker
thinks the leader failed) and then we need to call into the coordinator and become a FOLLOWER
again.
server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/cluster/coordination/FollowersCheckerTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/FollowersChecker.java
Show resolved
Hide resolved
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Allows this class to be cleanly shared between Zen1 and Zen2. Follow-up to #33917
It is important that the leader periodically checks that its followers are
still healthy and can remain part of its cluster. If these checks fail
repeatedly then the leader should remove the faulty node from the cluster. The
FollowerChecker, introduced in this commit, performs these periodic checks and
deals with retries.