-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[Zen2] Introduce gossip-like discovery of master nodes #32246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
77 commits
Select commit
Hold shift + click to select a range
aaa8753
Introduce gossip-like discovery of master nodes
DaveCTurner 321def8
No zen2 in request peers action
DaveCTurner 8e9c881
Fix up discovery.find_peers_interval setting
DaveCTurner c870c94
Remove unused foundQuorumFrom
DaveCTurner 686a434
protected
DaveCTurner 83f7189
Inline one-use method
DaveCTurner 1525607
We do not probe the local address so it does not need to be provided
DaveCTurner 0b4e9d1
Fixup protected
DaveCTurner d524a62
Remove labelRunnable() and use an AbstractRunnable
DaveCTurner 93ce793
Just created this, don't need another copy
DaveCTurner 38ef542
Rename
DaveCTurner 578a57f
Rename
DaveCTurner 6ec315c
Not so private
DaveCTurner a933a7f
TODOs
DaveCTurner dddeecd
Renaming
DaveCTurner 2177d0a
lastAcceptedNodes cannot change while the peerfinder is active, so ju…
DaveCTurner 320ebdd
Do not need the local node to be in foundPeers, so remove it
DaveCTurner 3712208
Rename discoveryNodes -> knownPeers in PeersRequest
DaveCTurner a566ffc
Rename candidateNodes to knownPeers in PeersResponse too
DaveCTurner 5f12c03
Imports
DaveCTurner dcfa1ca
Private class
DaveCTurner 53c1dff
Private
DaveCTurner 0120029
Add assertion of received term
DaveCTurner 3fab139
Start work on having the PeerFinder respond to PeersRequests directly
DaveCTurner 4701d35
Fix assertion
DaveCTurner 4b70b2e
Add test case for values in PeersResponse
DaveCTurner ffe6637
Test that it does delegate to onPeersRequestWhenInactive if inactive
DaveCTurner b219998
Also verify that it receives messages from the transport service
DaveCTurner 52834a1
TODO resolved
DaveCTurner 354c94e
Imports
DaveCTurner 94cf9a3
Remove foundPeers and just track the nodes we've sent requests to
DaveCTurner c7bcae9
Prevent multiple concurrent attempts to connect to the same node
DaveCTurner 7d2a5a6
Merge branch 'zen2' into 2018-07-20-peerfinder
DaveCTurner 9e156df
Line length
DaveCTurner c805add
Remove unused class
DaveCTurner 7aec464
Remove refs to probeLock
DaveCTurner 33b17ce
connectTo is already async, no need to schedule it with the executorS…
DaveCTurner b00795c
Consolidate per-address logic into Peer class
DaveCTurner 0e9850e
No need to track separate state, just need a boolean
DaveCTurner f466b44
Tweaks
DaveCTurner 3c219de
Need moar coffee
DaveCTurner 9356a26
Pass in leader when deactivating
DaveCTurner 7afe75c
Remove comments re. synchronisation
DaveCTurner 27e95cf
Rename callback
DaveCTurner 5374f2b
Remove foundMasterNode boolean
DaveCTurner 1308444
Assert discoveryNode not set
DaveCTurner b7c598b
Get the remote node again rather than passing it in
DaveCTurner 417e6e2
Unwrap provided addresses
DaveCTurner 4ca2e62
Extract ConfiguredHostsResolver class
DaveCTurner f034f18
Combine ActivePeerFinder and PeerFinder
DaveCTurner 380d930
No need to expose isActive - this test is not helpful
DaveCTurner 34b6fd4
Assert lifecycle is started
DaveCTurner 98cd79a
No need for PeerFinder to be responsible for lifecycle of ConfiguredH…
DaveCTurner e3f9638
Add another test with failing address resolution
DaveCTurner 253e2ea
Delete unused class
DaveCTurner 5d97de7
Fix log message
DaveCTurner fb171e7
Assert no known peers as soon as deactivated
DaveCTurner 8126e86
Private
DaveCTurner 01e230f
Inline and rename
DaveCTurner 8e3b175
Use AbstractRunnable and force execution
DaveCTurner 4c52e50
Nullable
DaveCTurner 121aad3
Rename and streamify
DaveCTurner c07c7d4
Reorder method
DaveCTurner 1d4adcc
Rename request peers action
DaveCTurner f0e7155
Safe to wake up peers even if already deactivated
DaveCTurner c19b3df
Imports
DaveCTurner 2051db6
Fix log message
DaveCTurner 221253a
Oneliners
DaveCTurner e5414a1
Add discoveryNode to PeersFinder.Peer.toString() and remove from log …
DaveCTurner 3608505
Remove TODO
DaveCTurner 2013f10
Move PeerFinder machinery to discovery package
DaveCTurner 8b605ad
Move ConfiguredHostsResolver interface into PeerFinder
DaveCTurner 9aa9003
Logger usage
DaveCTurner b517ce9
Whitespace
DaveCTurner 253a994
No need to refer to class name in log messages
DaveCTurner 7948268
Can only deactivate an active PeerFinder
DaveCTurner b4df5f3
Revert "Can only deactivate an active PeerFinder"
DaveCTurner File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
105 changes: 105 additions & 0 deletions
105
server/src/main/java/org/elasticsearch/cluster/coordination/PeersResponse.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.cluster.coordination; | ||
|
||
import org.elasticsearch.cluster.node.DiscoveryNode; | ||
import org.elasticsearch.common.io.stream.StreamInput; | ||
import org.elasticsearch.common.io.stream.StreamOutput; | ||
import org.elasticsearch.transport.TransportResponse; | ||
|
||
import java.io.IOException; | ||
import java.util.List; | ||
import java.util.Objects; | ||
import java.util.Optional; | ||
|
||
public class PeersResponse extends TransportResponse { | ||
private final Optional<DiscoveryNode> masterNode; | ||
private final List<DiscoveryNode> knownPeers; | ||
private final long term; | ||
|
||
public PeersResponse(Optional<DiscoveryNode> masterNode, List<DiscoveryNode> knownPeers, long term) { | ||
assert masterNode.isPresent() == false || knownPeers.isEmpty(); | ||
this.masterNode = masterNode; | ||
this.knownPeers = knownPeers; | ||
this.term = term; | ||
} | ||
|
||
public PeersResponse(StreamInput in) throws IOException { | ||
masterNode = Optional.ofNullable(in.readOptionalWriteable(DiscoveryNode::new)); | ||
knownPeers = in.readList(DiscoveryNode::new); | ||
term = in.readLong(); | ||
assert masterNode.isPresent() == false || knownPeers.isEmpty(); | ||
} | ||
|
||
@Override | ||
public void writeTo(StreamOutput out) throws IOException { | ||
super.writeTo(out); | ||
out.writeOptionalWriteable(masterNode.orElse(null)); | ||
out.writeList(knownPeers); | ||
out.writeLong(term); | ||
} | ||
|
||
/** | ||
* @return the node that is currently leading, according to the responding node. | ||
*/ | ||
public Optional<DiscoveryNode> getMasterNode() { | ||
return masterNode; | ||
} | ||
|
||
/** | ||
* @return the collection of known peers of the responding node, or an empty collection if the responding node believes there | ||
* is currently a leader. | ||
*/ | ||
public List<DiscoveryNode> getKnownPeers() { | ||
return knownPeers; | ||
} | ||
|
||
/** | ||
* @return the current term of the responding node. If the responding node is the leader then this is the term in which it is | ||
* currently leading. | ||
*/ | ||
public long getTerm() { | ||
return term; | ||
} | ||
|
||
@Override | ||
public String toString() { | ||
return "PeersResponse{" + | ||
"masterNode=" + masterNode + | ||
", knownPeers=" + knownPeers + | ||
", term=" + term + | ||
'}'; | ||
} | ||
|
||
@Override | ||
public boolean equals(Object o) { | ||
if (this == o) return true; | ||
if (o == null || getClass() != o.getClass()) return false; | ||
PeersResponse that = (PeersResponse) o; | ||
return term == that.term && | ||
Objects.equals(masterNode, that.masterNode) && | ||
Objects.equals(knownPeers, that.knownPeers); | ||
} | ||
|
||
@Override | ||
public int hashCode() { | ||
return Objects.hash(masterNode, knownPeers, term); | ||
} | ||
} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be nicer to have this as a boolean (
isActiveMaster
) stating whether the current node providing the peer response is an active master node, and have the list of discoveryNodes just calledmasterNodes
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is now, if you contact a follower the next step is just to contact the follower's leader. With a flag you'd go ahead and contact all the follower's peers before discovering that one of them is the leader.
In a properly-configured cluster it doesn't seem to make much difference, but if badly configured (e.g. the leader is not in all the unicast hosts lists, and there are lots of master-eligible nodes) then there's less traffic this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I follow (no pun intended). A follower would return isActiveMaster = false + the active master as singleton in the list of masterEligibleNodes, which would lead to exact same behavior as what's currently in the PR if I understand this correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed on Zoom. There's no strong argument either way. I slightly prefer that as it is now there's a difference in the responses between a follower and a candidate that knows of a single node, although we don't use this difference anywhere. I also see that moving to a single list (a singleton from followers) or a boolean would remove a couple of lines of code that deal with the master node as a special case. We'll leave it as is.