Skip to content

[Zen2] Introduce gossip-like discovery of master nodes #32246

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 77 commits into from
Aug 6, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
77 commits
Select commit Hold shift + click to select a range
aaa8753
Introduce gossip-like discovery of master nodes
DaveCTurner Jul 20, 2018
321def8
No zen2 in request peers action
DaveCTurner Jul 31, 2018
8e9c881
Fix up discovery.find_peers_interval setting
DaveCTurner Jul 31, 2018
c870c94
Remove unused foundQuorumFrom
DaveCTurner Jul 31, 2018
686a434
protected
DaveCTurner Jul 31, 2018
83f7189
Inline one-use method
DaveCTurner Jul 31, 2018
1525607
We do not probe the local address so it does not need to be provided
DaveCTurner Jul 31, 2018
0b4e9d1
Fixup protected
DaveCTurner Jul 31, 2018
d524a62
Remove labelRunnable() and use an AbstractRunnable
DaveCTurner Jul 31, 2018
93ce793
Just created this, don't need another copy
DaveCTurner Jul 31, 2018
38ef542
Rename
DaveCTurner Jul 31, 2018
578a57f
Rename
DaveCTurner Jul 31, 2018
6ec315c
Not so private
DaveCTurner Jul 31, 2018
a933a7f
TODOs
DaveCTurner Jul 31, 2018
dddeecd
Renaming
DaveCTurner Jul 31, 2018
2177d0a
lastAcceptedNodes cannot change while the peerfinder is active, so ju…
DaveCTurner Aug 1, 2018
320ebdd
Do not need the local node to be in foundPeers, so remove it
DaveCTurner Aug 1, 2018
3712208
Rename discoveryNodes -> knownPeers in PeersRequest
DaveCTurner Aug 1, 2018
a566ffc
Rename candidateNodes to knownPeers in PeersResponse too
DaveCTurner Aug 1, 2018
5f12c03
Imports
DaveCTurner Aug 1, 2018
dcfa1ca
Private class
DaveCTurner Aug 1, 2018
53c1dff
Private
DaveCTurner Aug 1, 2018
0120029
Add assertion of received term
DaveCTurner Aug 1, 2018
3fab139
Start work on having the PeerFinder respond to PeersRequests directly
DaveCTurner Aug 1, 2018
4701d35
Fix assertion
DaveCTurner Aug 2, 2018
4b70b2e
Add test case for values in PeersResponse
DaveCTurner Aug 2, 2018
ffe6637
Test that it does delegate to onPeersRequestWhenInactive if inactive
DaveCTurner Aug 2, 2018
b219998
Also verify that it receives messages from the transport service
DaveCTurner Aug 2, 2018
52834a1
TODO resolved
DaveCTurner Aug 2, 2018
354c94e
Imports
DaveCTurner Aug 2, 2018
94cf9a3
Remove foundPeers and just track the nodes we've sent requests to
DaveCTurner Aug 2, 2018
c7bcae9
Prevent multiple concurrent attempts to connect to the same node
DaveCTurner Aug 2, 2018
7d2a5a6
Merge branch 'zen2' into 2018-07-20-peerfinder
DaveCTurner Aug 2, 2018
9e156df
Line length
DaveCTurner Aug 2, 2018
c805add
Remove unused class
DaveCTurner Aug 3, 2018
7aec464
Remove refs to probeLock
DaveCTurner Aug 3, 2018
33b17ce
connectTo is already async, no need to schedule it with the executorS…
DaveCTurner Aug 3, 2018
b00795c
Consolidate per-address logic into Peer class
DaveCTurner Aug 3, 2018
0e9850e
No need to track separate state, just need a boolean
DaveCTurner Aug 3, 2018
f466b44
Tweaks
DaveCTurner Aug 3, 2018
3c219de
Need moar coffee
DaveCTurner Aug 3, 2018
9356a26
Pass in leader when deactivating
DaveCTurner Aug 4, 2018
7afe75c
Remove comments re. synchronisation
DaveCTurner Aug 4, 2018
27e95cf
Rename callback
DaveCTurner Aug 4, 2018
5374f2b
Remove foundMasterNode boolean
DaveCTurner Aug 4, 2018
1308444
Assert discoveryNode not set
DaveCTurner Aug 4, 2018
b7c598b
Get the remote node again rather than passing it in
DaveCTurner Aug 4, 2018
417e6e2
Unwrap provided addresses
DaveCTurner Aug 4, 2018
4ca2e62
Extract ConfiguredHostsResolver class
DaveCTurner Aug 4, 2018
f034f18
Combine ActivePeerFinder and PeerFinder
DaveCTurner Aug 4, 2018
380d930
No need to expose isActive - this test is not helpful
DaveCTurner Aug 4, 2018
34b6fd4
Assert lifecycle is started
DaveCTurner Aug 4, 2018
98cd79a
No need for PeerFinder to be responsible for lifecycle of ConfiguredH…
DaveCTurner Aug 4, 2018
e3f9638
Add another test with failing address resolution
DaveCTurner Aug 4, 2018
253e2ea
Delete unused class
DaveCTurner Aug 4, 2018
5d97de7
Fix log message
DaveCTurner Aug 4, 2018
fb171e7
Assert no known peers as soon as deactivated
DaveCTurner Aug 4, 2018
8126e86
Private
DaveCTurner Aug 5, 2018
01e230f
Inline and rename
DaveCTurner Aug 5, 2018
8e3b175
Use AbstractRunnable and force execution
DaveCTurner Aug 5, 2018
4c52e50
Nullable
DaveCTurner Aug 6, 2018
121aad3
Rename and streamify
DaveCTurner Aug 6, 2018
c07c7d4
Reorder method
DaveCTurner Aug 6, 2018
1d4adcc
Rename request peers action
DaveCTurner Aug 6, 2018
f0e7155
Safe to wake up peers even if already deactivated
DaveCTurner Aug 6, 2018
c19b3df
Imports
DaveCTurner Aug 6, 2018
2051db6
Fix log message
DaveCTurner Aug 6, 2018
221253a
Oneliners
DaveCTurner Aug 6, 2018
e5414a1
Add discoveryNode to PeersFinder.Peer.toString() and remove from log …
DaveCTurner Aug 6, 2018
3608505
Remove TODO
DaveCTurner Aug 6, 2018
2013f10
Move PeerFinder machinery to discovery package
DaveCTurner Aug 6, 2018
8b605ad
Move ConfiguredHostsResolver interface into PeerFinder
DaveCTurner Aug 6, 2018
9aa9003
Logger usage
DaveCTurner Aug 6, 2018
b517ce9
Whitespace
DaveCTurner Aug 6, 2018
253a994
No need to refer to class name in log messages
DaveCTurner Aug 6, 2018
7948268
Can only deactivate an active PeerFinder
DaveCTurner Aug 6, 2018
b4df5f3
Revert "Can only deactivate an active PeerFinder"
DaveCTurner Aug 6, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.cluster.coordination;

import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.transport.TransportResponse;

import java.io.IOException;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

public class PeersResponse extends TransportResponse {
private final Optional<DiscoveryNode> masterNode;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be nicer to have this as a boolean (isActiveMaster) stating whether the current node providing the peer response is an active master node, and have the list of discoveryNodes just called masterNodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is now, if you contact a follower the next step is just to contact the follower's leader. With a flag you'd go ahead and contact all the follower's peers before discovering that one of them is the leader.

In a properly-configured cluster it doesn't seem to make much difference, but if badly configured (e.g. the leader is not in all the unicast hosts lists, and there are lots of master-eligible nodes) then there's less traffic this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I follow (no pun intended). A follower would return isActiveMaster = false + the active master as singleton in the list of masterEligibleNodes, which would lead to exact same behavior as what's currently in the PR if I understand this correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed on Zoom. There's no strong argument either way. I slightly prefer that as it is now there's a difference in the responses between a follower and a candidate that knows of a single node, although we don't use this difference anywhere. I also see that moving to a single list (a singleton from followers) or a boolean would remove a couple of lines of code that deal with the master node as a special case. We'll leave it as is.

private final List<DiscoveryNode> knownPeers;
private final long term;

public PeersResponse(Optional<DiscoveryNode> masterNode, List<DiscoveryNode> knownPeers, long term) {
assert masterNode.isPresent() == false || knownPeers.isEmpty();
this.masterNode = masterNode;
this.knownPeers = knownPeers;
this.term = term;
}

public PeersResponse(StreamInput in) throws IOException {
masterNode = Optional.ofNullable(in.readOptionalWriteable(DiscoveryNode::new));
knownPeers = in.readList(DiscoveryNode::new);
term = in.readLong();
assert masterNode.isPresent() == false || knownPeers.isEmpty();
}

@Override
public void writeTo(StreamOutput out) throws IOException {
super.writeTo(out);
out.writeOptionalWriteable(masterNode.orElse(null));
out.writeList(knownPeers);
out.writeLong(term);
}

/**
* @return the node that is currently leading, according to the responding node.
*/
public Optional<DiscoveryNode> getMasterNode() {
return masterNode;
}

/**
* @return the collection of known peers of the responding node, or an empty collection if the responding node believes there
* is currently a leader.
*/
public List<DiscoveryNode> getKnownPeers() {
return knownPeers;
}

/**
* @return the current term of the responding node. If the responding node is the leader then this is the term in which it is
* currently leading.
*/
public long getTerm() {
return term;
}

@Override
public String toString() {
return "PeersResponse{" +
"masterNode=" + masterNode +
", knownPeers=" + knownPeers +
", term=" + term +
'}';
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
PeersResponse that = (PeersResponse) o;
return term == that.term &&
Objects.equals(masterNode, that.masterNode) &&
Objects.equals(knownPeers, that.knownPeers);
}

@Override
public int hashCode() {
return Objects.hash(masterNode, knownPeers, term);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@
import org.elasticsearch.common.util.concurrent.ThreadContext;
import org.elasticsearch.discovery.DiscoveryModule;
import org.elasticsearch.discovery.DiscoverySettings;
import org.elasticsearch.discovery.PeerFinder;
import org.elasticsearch.discovery.zen.ElectMasterService;
import org.elasticsearch.discovery.zen.FaultDetection;
import org.elasticsearch.discovery.zen.SettingsBasedHostsProvider;
Expand Down Expand Up @@ -423,6 +424,7 @@ public void apply(Settings value, Settings current, Settings previous) {
Node.BREAKER_TYPE_KEY,
OperationRouting.USE_ADAPTIVE_REPLICA_SELECTION_SETTING,
IndexGraveyard.SETTING_MAX_TOMBSTONES,
EnableAssignmentDecider.CLUSTER_TASKS_ALLOCATION_ENABLE_SETTING
EnableAssignmentDecider.CLUSTER_TASKS_ALLOCATION_ENABLE_SETTING,
PeerFinder.DISCOVERY_FIND_PEERS_INTERVAL_SETTING
)));
}
Loading