
Commit 2049f71

ywelsch and DaveCTurner committed
Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act as a master in the cluster. In particular, a voting-only node can help elect another master-eligible node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at least three master-eligible nodes, so that if one of the three nodes is down, the remaining two can still elect a master amongst themselves. This only requires one of the two remaining nodes to be able to act as master, but both need voting powers. This means that one of the three master-eligible nodes can be made voting-only. If this voting-only node is a dedicated master, a less powerful machine or a smaller heap size can be chosen for it. Alternatively, a voting-only non-dedicated master node can play the role of the third master-eligible node, which allows running an HA cluster with only two dedicated master nodes.
Closes #14340
Co-authored-by: David Turner <[email protected]>
1 parent 11f41c4 commit 2049f71


41 files changed (+2533, -1576 lines)



docs/reference/cluster.asciidoc

+12-6
@@ -22,12 +22,14 @@ one of the following:
 * an IP address or hostname, to add all matching nodes to the subset.
 * a pattern, using `*` wildcards, which adds all nodes to the subset
 whose name, address or hostname matches the pattern.
-* `master:true`, `data:true`, `ingest:true` or `coordinating_only:true`, which
-respectively add to the subset all master-eligible nodes, all data nodes,
-all ingest nodes, and all coordinating-only nodes.
-* `master:false`, `data:false`, `ingest:false` or `coordinating_only:false`,
-which respectively remove from the subset all master-eligible nodes, all data
-nodes, all ingest nodes, and all coordinating-only nodes.
+* `master:true`, `data:true`, `ingest:true`, `voting_only:true` or
+`coordinating_only:true`, which respectively add to the subset all
+master-eligible nodes, all data nodes, all ingest nodes, all voting-only
+nodes, and all coordinating-only nodes.
+* `master:false`, `data:false`, `ingest:false`, `voting_only:false`, or
+`coordinating_only:false`, which respectively remove from the subset all
+master-eligible nodes, all data nodes, all ingest nodes, all voting-only
+nodes and all coordinating-only nodes.
 * a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
 which adds to the subset all nodes with a custom node attribute whose name
 and value match the respective patterns. Custom node attributes are
@@ -46,6 +48,9 @@ means that filters such as `master:false` which remove nodes from the chosen
 subset are only useful if they come after some other filters. When used on its
 own, `master:false` selects no nodes.
 
+NOTE: The `voting_only` role requires the {default-dist} of Elasticsearch and
+is not supported in the {oss-dist}.
+
 Here are some examples of the use of node filters with the
 <<cluster-nodes-info,Nodes Info>> APIs.
 
@@ -69,6 +74,7 @@ GET /_nodes/10.0.0.*
 GET /_nodes/_all,master:false
 GET /_nodes/data:true,ingest:true
 GET /_nodes/coordinating_only:true
+GET /_nodes/master:true,voting_only:false
 # Select nodes by custom attribute (e.g. with something like `node.attr.rack: 2` in the configuration file)
 GET /_nodes/rack:2
 GET /_nodes/ra*:2
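Because node filters are evaluated left to right, `master:true,voting_only:false` first adds every master-eligible node and then removes the voting-only ones, leaving the nodes that can actually be elected master. The Java sketch below mimics that evaluation for `_all` and the `role:true|false` filters only. The `Node` record and the `resolve` method are hypothetical; they ignore name, address and attribute patterns, and this is not Elasticsearch's own filter-resolution code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NodeFilterSketch {

    // Hypothetical node model: a name plus role names such as "master", "data", "ingest", "voting_only".
    record Node(String name, Set<String> roles) {
        boolean hasRole(String role) {
            if (role.equals("coordinating_only")) {
                // Coordinating-only nodes have none of the master, data or ingest roles.
                return !roles.contains("master") && !roles.contains("data") && !roles.contains("ingest");
            }
            return roles.contains(role);
        }
    }

    // Applies comma-separated filters left to right; each one adds to or removes from the running subset.
    static Set<String> resolve(List<Node> nodes, String filters) {
        Set<String> selected = new LinkedHashSet<>();
        for (String filter : filters.split(",")) {
            if (filter.equals("_all")) {
                nodes.forEach(node -> selected.add(node.name()));
                continue;
            }
            String[] parts = filter.split(":", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("this sketch only handles _all and role:true|false, got " + filter);
            }
            boolean add = Boolean.parseBoolean(parts[1]);
            for (Node node : nodes) {
                if (node.hasRole(parts[0])) {
                    if (add) {
                        selected.add(node.name());
                    } else {
                        selected.remove(node.name());
                    }
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("m1", Set.of("master")),
                new Node("m2", Set.of("master", "data")),
                new Node("v1", Set.of("master", "voting_only")),
                new Node("d1", Set.of("data", "ingest")),
                new Node("c1", Set.of()));
        System.out.println(resolve(nodes, "master:true,voting_only:false")); // [m1, m2]
        System.out.println(resolve(nodes, "master:false"));                  // []
        System.out.println(resolve(nodes, "_all,master:false"));             // [d1, c1]
    }
}
```

The second call shows why a removal filter on its own selects nothing: it removes from a subset that is still empty.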

docs/reference/cluster/stats.asciidoc

+7-2
@@ -109,7 +109,8 @@ Will return, for example:
 "data": 1,
 "coordinating_only": 0,
 "master": 1,
-"ingest": 1
+"ingest": 1,
+"voting_only": 0
 },
 "versions": [
 "{version}"
@@ -207,6 +208,7 @@ Will return, for example:
 // TESTRESPONSE[s/"plugins": \[[^\]]*\]/"plugins": $body.$_path/]
 // TESTRESPONSE[s/"network_types": \{[^\}]*\}/"network_types": $body.$_path/]
 // TESTRESPONSE[s/"discovery_types": \{[^\}]*\}/"discovery_types": $body.$_path/]
+// TESTRESPONSE[s/"count": \{[^\}]*\}/"count": $body.$_path/]
 // TESTRESPONSE[s/"packaging_types": \[[^\]]*\]/"packaging_types": $body.$_path/]
 // TESTRESPONSE[s/: true|false/: $body.$_path/]
 // TESTRESPONSE[s/: (\-)?[0-9]+/: $body.$_path/]
@@ -217,7 +219,10 @@ Will return, for example:
 // see an exhaustive list anyway.
 // 2. Similarly, ignore the contents of `network_types`, `discovery_types`, and
 // `packaging_types`.
-// 3. All of the numbers and strings on the right hand side of *every* field in
+// 3. Ignore the contents of the (nodes) count object, as what's shown here
+// depends on the license. Voting-only nodes are e.g. only shown when this
+// test runs with a basic license.
+// 4. All of the numbers and strings on the right hand side of *every* field in
 // the response are ignored. So we're really only asserting things about the
 // the shape of this response, not the values in it.
 
docs/reference/modules/node.asciidoc

+44-2
@@ -85,8 +85,9 @@ creating or deleting an index, tracking which nodes are part of the cluster,
 and deciding which shards to allocate to which nodes. It is important for
 cluster health to have a stable master node.
 
-Any master-eligible node (all nodes by default) may be elected to become the
-master node by the <<modules-discovery,master election process>>.
+Any master-eligible node that is not a <<voting-only-node,voting-only node>> may
+be elected to become the master node by the <<modules-discovery,master election
+process>>.
 
 IMPORTANT: Master nodes must have access to the `data/` directory (just like
 `data` nodes) as this is where the cluster state is persisted between node restarts.
@@ -135,6 +136,47 @@ cluster.remote.connect: false <4>
 <3> Disable the `node.ingest` role (enabled by default).
 <4> Disable {ccs} (enabled by default).
 
+[float]
+[[voting-only-node]]
+==== Voting-only master-eligible node
+
+A voting-only master-eligible node is a node that participates in
+<<modules-discovery,master elections>> but which will not act as the cluster's
+elected master node. In particular, a voting-only node can serve as a tiebreaker
+in elections.
+
+It may seem confusing to use the term "master-eligible" to describe a
+voting-only node since such a node is not actually eligible to become the master
+at all. This terminology is an unfortunate consequence of history:
+master-eligible nodes are those nodes that participate in elections and perform
+certain tasks during cluster state publications, and voting-only nodes have the
+same responsibilities even if they can never become the elected master.
+
+To configure a master-eligible node as a voting-only node, set the following
+setting:
+
+[source,yaml]
+-------------------
+node.voting_only: true <1>
+-------------------
+<1> The default for `node.voting_only` is `false`.
+
+IMPORTANT: The `voting_only` role requires the {default-dist} of Elasticsearch
+and is not supported in the {oss-dist}. If you use the {oss-dist} and set
+`node.voting_only` then the node will fail to start. Also note that only
+master-eligible nodes can be marked as voting-only.
+
+High availability (HA) clusters require at least three master-eligible nodes, at
+least two of which are not voting-only nodes. Such a cluster will be able to
+elect a master node even if one of the nodes fails.
+
+Since voting-only nodes never act as the cluster's elected master, they may
+require less heap and a less powerful CPU than the true master nodes.
+However all master-eligible nodes, including voting-only nodes, require
+reasonably fast persistent storage and a reliable and low-latency network
+connection to the rest of the cluster, since they are on the critical path for
+<<cluster-state-publishing,publishing cluster state updates>>.
+
 [float]
 [[data-node]]
 === Data Node
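The HA rule in the new section combines two conditions: after one node fails, the survivors must still form a majority of the voting configuration, and at least one survivor must be a node that is not voting-only. The Java sketch below checks both conditions for every possible single-node failure. It assumes, for simplicity, that the voting configuration is exactly the set of master-eligible nodes; in a real cluster Elasticsearch maintains that configuration itself and it may differ. `MasterEligible`, `canElect` and `toleratesOneFailure` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class VotingOnlyHaSketch {

    // Hypothetical model of a master-eligible node; votingOnly mirrors `node.voting_only: true`.
    record MasterEligible(String name, boolean votingOnly) {}

    // Assumes the voting configuration consists of exactly the nodes in `all`.
    static boolean canElect(List<MasterEligible> all, List<MasterEligible> live) {
        boolean quorum = live.size() * 2 > all.size();                       // strict majority still reachable
        boolean hasCandidate = live.stream().anyMatch(n -> !n.votingOnly()); // someone who may become master
        return quorum && hasCandidate;
    }

    // True if a master can still be elected after any single master-eligible node fails.
    static boolean toleratesOneFailure(List<MasterEligible> all) {
        for (MasterEligible failed : all) {
            List<MasterEligible> live = new ArrayList<>(all);
            live.remove(failed);
            if (!canElect(all, live)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<MasterEligible> twoFullOneVotingOnly = List.of(
                new MasterEligible("m1", false),
                new MasterEligible("m2", false),
                new MasterEligible("v1", true));
        List<MasterEligible> oneFullTwoVotingOnly = List.of(
                new MasterEligible("m1", false),
                new MasterEligible("v1", true),
                new MasterEligible("v2", true));
        System.out.println(toleratesOneFailure(twoFullOneVotingOnly)); // true
        System.out.println(toleratesOneFailure(oneFullTwoVotingOnly)); // false: losing m1 leaves no candidate
    }
}
```

With two full master-eligible nodes and one voting-only node, any single failure still leaves a two-node majority that contains a possible master. With only one full master-eligible node, losing it leaves a majority in which no node may become master.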

docs/reference/rest-api/info.asciidoc

+4
@@ -111,6 +111,10 @@ Example response:
 "available" : true,
 "enabled" : true
 },
+"voting_only" : {
+"available" : true,
+"enabled" : true
+},
 "watcher" : {
 "available" : true,
 "enabled" : true

server/src/main/java/org/elasticsearch/cluster/coordination/ClusterFormationFailureHelper.java

+6-2
@@ -122,14 +122,16 @@ static class ClusterFormationState {
 private final List<TransportAddress> resolvedAddresses;
 private final List<DiscoveryNode> foundPeers;
 private final long currentTerm;
+private final ElectionStrategy electionStrategy;
 
 ClusterFormationState(Settings settings, ClusterState clusterState, List<TransportAddress> resolvedAddresses,
-List<DiscoveryNode> foundPeers, long currentTerm) {
+List<DiscoveryNode> foundPeers, long currentTerm, ElectionStrategy electionStrategy) {
 this.settings = settings;
 this.clusterState = clusterState;
 this.resolvedAddresses = resolvedAddresses;
 this.foundPeers = foundPeers;
 this.currentTerm = currentTerm;
+this.electionStrategy = electionStrategy;
 }
 
 String getDescription() {
@@ -188,7 +190,9 @@ String getDescription() {
 final VoteCollection voteCollection = new VoteCollection();
 foundPeers.forEach(voteCollection::addVote);
 final String isQuorumOrNot
-= CoordinationState.isElectionQuorum(voteCollection, clusterState) ? "is a quorum" : "is not a quorum";
+= electionStrategy.isElectionQuorum(clusterState.nodes().getLocalNode(), currentTerm, clusterState.term(),
+clusterState.version(), clusterState.getLastCommittedConfiguration(), clusterState.getLastAcceptedConfiguration(),
+voteCollection) ? "is a quorum" : "is not a quorum";
 
 return String.format(Locale.ROOT,
 "master not discovered or elected yet, an election requires %s, have discovered %s which %s; %s",

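The quorum check in `getDescription()` is now routed through an `ElectionStrategy` instead of a static `CoordinationState` helper. The removed helper shows the default rule: the votes must form a quorum of both the last committed and the last accepted voting configuration. The simplified Java sketch below models that rule with plain node-id sets, treating a quorum as a strict majority of a configuration's node ids (an assumption of this sketch). It also drops the term and version arguments that the real call passes. The `additionalConstraints` hook and the `RefuseIfVotingOnly` subclass are hypothetical illustrations of why the strategy is pluggable. They are not the rules of the voting-only strategy, which this diff does not show.

```java
import java.util.Set;

public class ElectionStrategySketch {

    // Simplified VotingConfiguration: a set of node ids; a quorum is a strict majority of them.
    record VotingConfig(Set<String> nodeIds) {
        boolean hasQuorum(Set<String> votes) {
            long granted = nodeIds.stream().filter(votes::contains).count();
            return granted * 2 > nodeIds.size();
        }
    }

    // Default rule, as in the removed CoordinationState code: a quorum in BOTH configurations.
    // The additionalConstraints hook is a hypothetical extension point for this sketch.
    static class Strategy {
        final boolean isElectionQuorum(String localNodeId, VotingConfig lastCommitted,
                                       VotingConfig lastAccepted, Set<String> joinVotes) {
            return lastCommitted.hasQuorum(joinVotes)
                    && lastAccepted.hasQuorum(joinVotes)
                    && additionalConstraints(localNodeId, joinVotes);
        }

        boolean additionalConstraints(String localNodeId, Set<String> joinVotes) {
            return true; // the default strategy adds nothing
        }
    }

    // Hypothetical illustration: a node that knows it is voting-only never treats its votes as a win.
    static class RefuseIfVotingOnly extends Strategy {
        private final Set<String> votingOnlyIds;

        RefuseIfVotingOnly(Set<String> votingOnlyIds) {
            this.votingOnlyIds = votingOnlyIds;
        }

        @Override
        boolean additionalConstraints(String localNodeId, Set<String> joinVotes) {
            return !votingOnlyIds.contains(localNodeId);
        }
    }

    public static void main(String[] args) {
        VotingConfig config = new VotingConfig(Set.of("m1", "m2", "v1"));
        Set<String> votes = Set.of("m1", "v1"); // 2 of 3: a strict majority
        Strategy refuse = new RefuseIfVotingOnly(Set.of("v1"));
        System.out.println(new Strategy().isElectionQuorum("v1", config, config, votes)); // true
        System.out.println(refuse.isElectionQuorum("v1", config, config, votes));         // false
        System.out.println(refuse.isElectionQuorum("m1", config, config, votes));         // true
    }
}
```

Requiring a quorum in both configurations matters during reconfiguration, when the committed and accepted configurations can differ.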
server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationState.java

+37-15
@@ -24,13 +24,14 @@
 import org.elasticsearch.cluster.coordination.CoordinationMetaData.VotingConfiguration;
 import org.elasticsearch.cluster.metadata.MetaData;
 import org.elasticsearch.cluster.node.DiscoveryNode;
-import org.elasticsearch.common.settings.Settings;
 
 import java.util.Collection;
 import java.util.Collections;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.Map;
 import java.util.Optional;
+import java.util.Set;
 
 import static org.elasticsearch.cluster.coordination.Coordinator.ZEN1_BWC_TERM;
 
@@ -44,6 +45,8 @@ public class CoordinationState {
 
 private final DiscoveryNode localNode;
 
+private final ElectionStrategy electionStrategy;
+
 // persisted state
 private final PersistedState persistedState;
 
@@ -55,11 +58,12 @@ public class CoordinationState {
 private VotingConfiguration lastPublishedConfiguration;
 private VoteCollection publishVotes;
 
-public CoordinationState(Settings settings, DiscoveryNode localNode, PersistedState persistedState) {
+public CoordinationState(DiscoveryNode localNode, PersistedState persistedState, ElectionStrategy electionStrategy) {
 this.localNode = localNode;
 
 // persisted state
 this.persistedState = persistedState;
+this.electionStrategy = electionStrategy;
 
 // transient state
 this.joinVotes = new VoteCollection();
@@ -106,13 +110,9 @@ public boolean electionWon() {
 return electionWon;
 }
 
-public boolean isElectionQuorum(VoteCollection votes) {
-return isElectionQuorum(votes, getLastAcceptedState());
-}
-
-static boolean isElectionQuorum(VoteCollection votes, ClusterState lastAcceptedState) {
-return votes.isQuorum(lastAcceptedState.getLastCommittedConfiguration())
-&& votes.isQuorum(lastAcceptedState.getLastAcceptedConfiguration());
+public boolean isElectionQuorum(VoteCollection joinVotes) {
+return electionStrategy.isElectionQuorum(localNode, getCurrentTerm(), getLastAcceptedTerm(), getLastAcceptedVersion(),
+getLastCommittedConfiguration(), getLastAcceptedConfiguration(), joinVotes);
 }
 
 public boolean isPublishQuorum(VoteCollection votes) {
@@ -123,6 +123,11 @@ public boolean containsJoinVoteFor(DiscoveryNode node) {
 return joinVotes.containsVoteFor(node);
 }
 
+// used for tests
+boolean containsJoin(Join join) {
+return joinVotes.getJoins().contains(join);
+}
+
 public boolean joinVotesHaveQuorumFor(VotingConfiguration votingConfiguration) {
 return joinVotes.isQuorum(votingConfiguration);
 }
@@ -249,7 +254,7 @@ public boolean handleJoin(Join join) {
 throw new CoordinationStateRejectedException("rejecting join since this node has not received its initial configuration yet");
 }
 
-boolean added = joinVotes.addVote(join.getSourceNode());
+boolean added = joinVotes.addJoinVote(join);
 boolean prevElectionWon = electionWon;
 electionWon = isElectionQuorum(joinVotes);
 assert !prevElectionWon || electionWon; // we cannot go from won to not won
@@ -503,18 +508,28 @@ default void markLastAcceptedStateAsCommitted() {
 }
 
 /**
-* A collection of votes, used to calculate quorums.
+* A collection of votes, used to calculate quorums. Optionally records the Joins as well.
 */
 public static class VoteCollection {
 
 private final Map<String, DiscoveryNode> nodes;
+private final Set<Join> joins;
 
 public boolean addVote(DiscoveryNode sourceNode) {
 return nodes.put(sourceNode.getId(), sourceNode) == null;
 }
 
+public boolean addJoinVote(Join join) {
+final boolean added = addVote(join.getSourceNode());
+if (added) {
+joins.add(join);
+}
+return added;
+}
+
 public VoteCollection() {
 nodes = new HashMap<>();
+joins = new HashSet<>();
 }
 
 public boolean isQuorum(VotingConfiguration configuration) {
@@ -533,24 +548,31 @@ public Collection<DiscoveryNode> nodes() {
 return Collections.unmodifiableCollection(nodes.values());
 }
 
+public Set<Join> getJoins() {
+return Collections.unmodifiableSet(joins);
+}
+
 @Override
 public String toString() {
-return "VoteCollection{" + String.join(",", nodes.keySet()) + "}";
+return "VoteCollection{votes=" + nodes.keySet() + ", joins=" + joins + "}";
 }
 
 @Override
 public boolean equals(Object o) {
 if (this == o) return true;
-if (o == null || getClass() != o.getClass()) return false;
+if (!(o instanceof VoteCollection)) return false;
 
 VoteCollection that = (VoteCollection) o;
 
-return nodes.equals(that.nodes);
+if (!nodes.equals(that.nodes)) return false;
+return joins.equals(that.joins);
 }
 
 @Override
 public int hashCode() {
-return nodes.hashCode();
+int result = nodes.hashCode();
+result = 31 * result + joins.hashCode();
+return result;
 }
 }
 }
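After this change, `VoteCollection` records the `Join` behind a vote only when that join contributes a new vote, and `equals`/`hashCode` take the recorded joins into account. The sketch below reproduces those semantics with node ids as plain strings. `Join` here is a hypothetical two-field record standing in for Elasticsearch's own `Join` class, and `Objects.hash` replaces the hand-written `31 * result` combination (the values differ, but they are equally consistent with `equals`).

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class VoteCollectionSketch {

    // Hypothetical stand-in for Elasticsearch's Join: just the voter's id and a term.
    record Join(String sourceNodeId, long term) {}

    static final class VoteCollection {
        private final Set<String> voterIds = new HashSet<>();
        private final Set<Join> joins = new HashSet<>();

        // Returns true only the first time a given node votes.
        boolean addVote(String nodeId) {
            return voterIds.add(nodeId);
        }

        // Like addJoinVote in the diff: the Join is recorded only if it added a new vote.
        boolean addJoinVote(Join join) {
            boolean added = addVote(join.sourceNodeId());
            if (added) {
                joins.add(join);
            }
            return added;
        }

        Set<Join> getJoins() {
            return Collections.unmodifiableSet(joins);
        }

        // Equal only if both the voters and the recorded joins match.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof VoteCollection)) return false;
            VoteCollection that = (VoteCollection) o;
            return voterIds.equals(that.voterIds) && joins.equals(that.joins);
        }

        @Override
        public int hashCode() {
            return Objects.hash(voterIds, joins);
        }
    }

    public static void main(String[] args) {
        VoteCollection a = new VoteCollection();
        System.out.println(a.addJoinVote(new Join("m1", 5))); // true
        System.out.println(a.addJoinVote(new Join("m1", 6))); // false: m1 already voted, so this Join is not kept
        System.out.println(a.getJoins().size());              // 1

        VoteCollection b = new VoteCollection();
        b.addVote("m1");                                      // same voter, but no Join recorded
        System.out.println(a.equals(b));                      // false
    }
}
```

Because a repeated vote leaves the collection unchanged, `handleJoin` can keep relying on `added` to detect duplicates while the recorded joins stay available to the election strategy.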
