Commit e689b20

ywelsch and DaveCTurner authored
Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act as a master in the cluster. In particular, a voting-only node can help elect another master-eligible node as master, and can serve as a tiebreaker in elections.

High availability (HA) clusters require at least three master-eligible nodes, so that if one of the three nodes is down, the remaining two can still elect a master amongst themselves. This only requires one of the two remaining nodes to be able to act as master, but both need to have voting power. This means that one of the three master-eligible nodes can be made voting-only. If this voting-only node is a dedicated master, a less powerful machine or a smaller heap size can be chosen for it. Alternatively, a voting-only non-dedicated master node can play the role of the third master-eligible node, which allows running an HA cluster with only two dedicated master nodes.

Closes #14340

Co-authored-by: David Turner <[email protected]>
1 parent ba07eb4 commit e689b20
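
Reviewer note: the availability argument in the commit message can be checked with a small model. The sketch below is not Elasticsearch code. `SketchNode` and `VotingOnlyQuorumSketch` are hypothetical names, and the model assumes that every master-eligible node is in the voting configuration and that an election needs a strict majority of voters plus at least one reachable voter that is not voting-only.

```java
import java.util.List;

/** Hypothetical model of a master-eligible node; not an Elasticsearch class. */
final class SketchNode {
    final String name;
    final boolean votingOnly; // corresponds to node.voting_only: true
    final boolean up;         // whether the node is currently reachable

    SketchNode(String name, boolean votingOnly, boolean up) {
        this.name = name;
        this.votingOnly = votingOnly;
        this.up = up;
    }
}

final class VotingOnlyQuorumSketch {

    /** True if strictly more than half of the master-eligible nodes are reachable. */
    static boolean hasVotingMajority(List<SketchNode> nodes) {
        long reachable = nodes.stream().filter(n -> n.up).count();
        return 2 * reachable > nodes.size();
    }

    /** True if a majority can vote and at least one reachable voter may act as master. */
    static boolean canElectMaster(List<SketchNode> nodes) {
        boolean candidateReachable = nodes.stream().anyMatch(n -> n.up && !n.votingOnly);
        return hasVotingMajority(nodes) && candidateReachable;
    }
}
```

Under these assumptions, a cluster of two full master nodes and one voting-only node still elects a master after losing either full master, while a cluster of one full master and two voting-only nodes cannot elect anyone once the full master is down.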

42 files changed, +2560 -1575 lines changed

docs/reference/cluster.asciidoc

+12 -6
@@ -22,12 +22,14 @@ one of the following:
 * an IP address or hostname, to add all matching nodes to the subset.
 * a pattern, using `*` wildcards, which adds all nodes to the subset
   whose name, address or hostname matches the pattern.
-* `master:true`, `data:true`, `ingest:true` or `coordinating_only:true`, which
-  respectively add to the subset all master-eligible nodes, all data nodes,
-  all ingest nodes, and all coordinating-only nodes.
-* `master:false`, `data:false`, `ingest:false` or `coordinating_only:false`,
-  which respectively remove from the subset all master-eligible nodes, all data
-  nodes, all ingest nodes, and all coordinating-only nodes.
+* `master:true`, `data:true`, `ingest:true`, `voting_only:true` or
+  `coordinating_only:true`, which respectively add to the subset all
+  master-eligible nodes, all data nodes, all ingest nodes, all voting-only
+  nodes, and all coordinating-only nodes.
+* `master:false`, `data:false`, `ingest:false`, `voting_only:false`, or
+  `coordinating_only:false`, which respectively remove from the subset all
+  master-eligible nodes, all data nodes, all ingest nodes, all voting-only
+  nodes and all coordinating-only nodes.
 * a pair of patterns, using `*` wildcards, of the form `attrname:attrvalue`,
   which adds to the subset all nodes with a custom node attribute whose name
   and value match the respective patterns. Custom node attributes are
@@ -46,6 +48,9 @@ means that filters such as `master:false` which remove nodes from the chosen
 subset are only useful if they come after some other filters. When used on its
 own, `master:false` selects no nodes.

+NOTE: The `voting_only` role requires the {default-dist} of Elasticsearch and
+is not supported in the {oss-dist}.
+
 Here are some examples of the use of node filters with the
 <<cluster-nodes-info,Nodes Info>> APIs.

@@ -69,6 +74,7 @@ GET /_nodes/10.0.0.*
 GET /_nodes/_all,master:false
 GET /_nodes/data:true,ingest:true
 GET /_nodes/coordinating_only:true
+GET /_nodes/master:true,voting_only:false
 # Select nodes by custom attribute (e.g. with something like `node.attr.rack: 2` in the configuration file)
 GET /_nodes/rack:2
 GET /_nodes/ra*:2
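
Reviewer note: the documented ordering rule (role filters ending in `:true` add to the subset, those ending in `:false` remove from it, applied left to right) can be illustrated with a short model. This is a sketch of the behaviour described in the text above, not the Elasticsearch resolver. `FilterNode` and `NodeFilterSketch` are hypothetical names, roles are modelled as plain strings, and only `_all` and `role:true` / `role:false` filters are handled.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical node description: a name plus role labels such as "master" or "voting_only". */
final class FilterNode {
    final String name;
    final Set<String> roles;

    FilterNode(String name, Set<String> roles) {
        this.name = name;
        this.roles = roles;
    }
}

final class NodeFilterSketch {

    /** Applies the comma-separated filters in order and returns the selected node names. */
    static Set<String> resolve(List<FilterNode> cluster, String expression) {
        Set<String> selected = new LinkedHashSet<>();
        for (String filter : expression.split(",")) {
            if (filter.equals("_all")) {
                cluster.forEach(n -> selected.add(n.name));
                continue;
            }
            String[] parts = filter.split(":", 2);
            boolean isRoleFilter = parts.length == 2
                && (parts[1].equals("true") || parts[1].equals("false"));
            if (!isRoleFilter) {
                continue; // names, addresses, wildcards and custom attributes are out of scope here
            }
            boolean add = parts[1].equals("true");
            for (FilterNode node : cluster) {
                if (node.roles.contains(parts[0])) {
                    if (add) {
                        selected.add(node.name);
                    } else {
                        selected.remove(node.name);
                    }
                }
            }
        }
        return selected;
    }
}
```

In this model `master:true,voting_only:false` first selects every master-eligible node and then drops the voting-only ones, while `master:false` on its own selects nothing because there is no earlier subset to remove from.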

docs/reference/cluster/stats.asciidoc

+7 -2
@@ -109,7 +109,8 @@ Will return, for example:
          "data": 1,
          "coordinating_only": 0,
          "master": 1,
-         "ingest": 1
+         "ingest": 1,
+         "voting_only": 0
       },
       "versions": [
          "{version}"
@@ -207,6 +208,7 @@ Will return, for example:
 // TESTRESPONSE[s/"plugins": \[[^\]]*\]/"plugins": $body.$_path/]
 // TESTRESPONSE[s/"network_types": \{[^\}]*\}/"network_types": $body.$_path/]
 // TESTRESPONSE[s/"discovery_types": \{[^\}]*\}/"discovery_types": $body.$_path/]
+// TESTRESPONSE[s/"count": \{[^\}]*\}/"count": $body.$_path/]
 // TESTRESPONSE[s/"packaging_types": \[[^\]]*\]/"packaging_types": $body.$_path/]
 // TESTRESPONSE[s/: true|false/: $body.$_path/]
 // TESTRESPONSE[s/: (\-)?[0-9]+/: $body.$_path/]
@@ -217,7 +219,10 @@ Will return, for example:
 //    see an exhaustive list anyway.
 // 2. Similarly, ignore the contents of `network_types`, `discovery_types`, and
 //    `packaging_types`.
-// 3. All of the numbers and strings on the right hand side of *every* field in
+// 3. Ignore the contents of the (nodes) count object, as what's shown here
+//    depends on the license. Voting-only nodes are e.g. only shown when this
+//    test runs with a basic license.
+// 4. All of the numbers and strings on the right hand side of *every* field in
 //    the response are ignored. So we're really only asserting things about the
 //    shape of this response, not the values in it.

docs/reference/modules/node.asciidoc

+44 -2
@@ -84,8 +84,9 @@ creating or deleting an index, tracking which nodes are part of the cluster,
 and deciding which shards to allocate to which nodes. It is important for
 cluster health to have a stable master node.

-Any master-eligible node (all nodes by default) may be elected to become the
-master node by the <<modules-discovery,master election process>>.
+Any master-eligible node that is not a <<voting-only-node,voting-only node>> may
+be elected to become the master node by the <<modules-discovery,master election
+process>>.

 IMPORTANT: Master nodes must have access to the `data/` directory (just like
 `data` nodes) as this is where the cluster state is persisted between node restarts.
@@ -134,6 +135,47 @@ cluster.remote.connect: false <4>
 <3> Disable the `node.ingest` role (enabled by default).
 <4> Disable {ccs} (enabled by default).

+[float]
+[[voting-only-node]]
+==== Voting-only master-eligible node
+
+A voting-only master-eligible node is a node that participates in
+<<modules-discovery,master elections>> but which will not act as the cluster's
+elected master node. In particular, a voting-only node can serve as a tiebreaker
+in elections.
+
+It may seem confusing to use the term "master-eligible" to describe a
+voting-only node since such a node is not actually eligible to become the master
+at all. This terminology is an unfortunate consequence of history:
+master-eligible nodes are those nodes that participate in elections and perform
+certain tasks during cluster state publications, and voting-only nodes have the
+same responsibilities even if they can never become the elected master.
+
+To configure a master-eligible node as a voting-only node, set the following
+setting:
+
+[source,yaml]
+-------------------
+node.voting_only: true <1>
+-------------------
+<1> The default for `node.voting_only` is `false`.
+
+IMPORTANT: The `voting_only` role requires the {default-dist} of Elasticsearch
+and is not supported in the {oss-dist}. If you use the {oss-dist} and set
+`node.voting_only` then the node will fail to start. Also note that only
+master-eligible nodes can be marked as voting-only.
+
+High availability (HA) clusters require at least three master-eligible nodes, at
+least two of which are not voting-only nodes. Such a cluster will be able to
+elect a master node even if one of the nodes fails.
+
+Since voting-only nodes never act as the cluster's elected master, they may
+require less heap and a less powerful CPU than the true master nodes.
+However all master-eligible nodes, including voting-only nodes, require
+reasonably fast persistent storage and a reliable and low-latency network
+connection to the rest of the cluster, since they are on the critical path for
+<<cluster-state-publishing,publishing cluster state updates>>.
+
 [float]
 [[data-node]]
 === Data Node

docs/reference/rest-api/info.asciidoc

+4
@@ -107,6 +107,10 @@ Example response:
          "available" : true,
          "enabled" : true
       },
+      "voting_only" : {
+         "available" : true,
+         "enabled" : true
+      },
       "watcher" : {
          "available" : true,
          "enabled" : true

server/src/main/java/org/elasticsearch/cluster/coordination/ClusterFormationFailureHelper.java

+6 -2
@@ -121,14 +121,16 @@ static class ClusterFormationState {
         private final List<TransportAddress> resolvedAddresses;
         private final List<DiscoveryNode> foundPeers;
         private final long currentTerm;
+        private final ElectionStrategy electionStrategy;

         ClusterFormationState(Settings settings, ClusterState clusterState, List<TransportAddress> resolvedAddresses,
-                              List<DiscoveryNode> foundPeers, long currentTerm) {
+                              List<DiscoveryNode> foundPeers, long currentTerm, ElectionStrategy electionStrategy) {
             this.settings = settings;
             this.clusterState = clusterState;
             this.resolvedAddresses = resolvedAddresses;
             this.foundPeers = foundPeers;
             this.currentTerm = currentTerm;
+            this.electionStrategy = electionStrategy;
         }

         String getDescription() {
@@ -185,7 +187,9 @@ String getDescription() {
             final VoteCollection voteCollection = new VoteCollection();
             foundPeers.forEach(voteCollection::addVote);
             final String isQuorumOrNot
-                = CoordinationState.isElectionQuorum(voteCollection, clusterState) ? "is a quorum" : "is not a quorum";
+                = electionStrategy.isElectionQuorum(clusterState.nodes().getLocalNode(), currentTerm, clusterState.term(),
+                    clusterState.version(), clusterState.getLastCommittedConfiguration(), clusterState.getLastAcceptedConfiguration(),
+                    voteCollection) ? "is a quorum" : "is not a quorum";

             return String.format(Locale.ROOT,
                 "master not discovered or elected yet, an election requires %s, have discovered %s which %s; %s",
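
Reviewer note: the `ElectionStrategy` type itself is not among the files shown on this page, so the following is only a guess at the shape of the delegation, reduced to the parts visible at the call sites. `ConfigSketch` and `ElectionStrategySketch` are hypothetical names, node ids stand in for `DiscoveryNode`, the term and version arguments are left out, and the majority rule in `hasQuorum` is an assumption about how a voting configuration counts votes. The `DEFAULT` strategy mirrors the static check that this commit removes from `CoordinationState`: a quorum in both the last committed and the last accepted configuration.

```java
import java.util.Set;

/** Hypothetical, simplified voting configuration: just a set of node ids. */
final class ConfigSketch {
    final Set<String> nodeIds;

    ConfigSketch(Set<String> nodeIds) {
        this.nodeIds = nodeIds;
    }

    /** Assumed rule: strictly more than half of the configured ids must have voted. */
    boolean hasQuorum(Set<String> votes) {
        long matching = votes.stream().filter(nodeIds::contains).count();
        return 2 * matching > nodeIds.size();
    }
}

/** Hypothetical stand-in for a pluggable election-quorum decision. */
interface ElectionStrategySketch {

    boolean isElectionQuorum(String localNodeId, ConfigSketch lastCommitted,
                             ConfigSketch lastAccepted, Set<String> votes);

    /** Quorum in both configurations, as in the removed static method. */
    ElectionStrategySketch DEFAULT = (localNodeId, lastCommitted, lastAccepted, votes) ->
        lastCommitted.hasQuorum(votes) && lastAccepted.hasQuorum(votes);
}
```

Passing the strategy in through the constructor, as the diff does for `ClusterFormationState`, keeps the quorum decision replaceable without touching the code that builds the failure description.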

server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationState.java

+37 -15
@@ -24,13 +24,14 @@
 import org.elasticsearch.cluster.coordination.CoordinationMetaData.VotingConfiguration;
 import org.elasticsearch.cluster.metadata.MetaData;
 import org.elasticsearch.cluster.node.DiscoveryNode;
-import org.elasticsearch.common.settings.Settings;

 import java.util.Collection;
 import java.util.Collections;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.Map;
 import java.util.Optional;
+import java.util.Set;

 /**
  * The core class of the cluster state coordination algorithm, directly implementing the
@@ -42,6 +43,8 @@ public class CoordinationState {

     private final DiscoveryNode localNode;

+    private final ElectionStrategy electionStrategy;
+
     // persisted state
     private final PersistedState persistedState;

@@ -53,11 +56,12 @@ public class CoordinationState {
     private VotingConfiguration lastPublishedConfiguration;
     private VoteCollection publishVotes;

-    public CoordinationState(Settings settings, DiscoveryNode localNode, PersistedState persistedState) {
+    public CoordinationState(DiscoveryNode localNode, PersistedState persistedState, ElectionStrategy electionStrategy) {
         this.localNode = localNode;

         // persisted state
         this.persistedState = persistedState;
+        this.electionStrategy = electionStrategy;

         // transient state
         this.joinVotes = new VoteCollection();
@@ -100,13 +104,9 @@ public boolean electionWon() {
         return electionWon;
     }

-    public boolean isElectionQuorum(VoteCollection votes) {
-        return isElectionQuorum(votes, getLastAcceptedState());
-    }
-
-    static boolean isElectionQuorum(VoteCollection votes, ClusterState lastAcceptedState) {
-        return votes.isQuorum(lastAcceptedState.getLastCommittedConfiguration())
-            && votes.isQuorum(lastAcceptedState.getLastAcceptedConfiguration());
+    public boolean isElectionQuorum(VoteCollection joinVotes) {
+        return electionStrategy.isElectionQuorum(localNode, getCurrentTerm(), getLastAcceptedTerm(), getLastAcceptedVersion(),
+            getLastCommittedConfiguration(), getLastAcceptedConfiguration(), joinVotes);
     }

     public boolean isPublishQuorum(VoteCollection votes) {
@@ -117,6 +117,11 @@ public boolean containsJoinVoteFor(DiscoveryNode node) {
         return joinVotes.containsVoteFor(node);
     }

+    // used for tests
+    boolean containsJoin(Join join) {
+        return joinVotes.getJoins().contains(join);
+    }
+
     public boolean joinVotesHaveQuorumFor(VotingConfiguration votingConfiguration) {
         return joinVotes.isQuorum(votingConfiguration);
     }
@@ -243,7 +248,7 @@ public boolean handleJoin(Join join) {
             throw new CoordinationStateRejectedException("rejecting join since this node has not received its initial configuration yet");
         }

-        boolean added = joinVotes.addVote(join.getSourceNode());
+        boolean added = joinVotes.addJoinVote(join);
         boolean prevElectionWon = electionWon;
         electionWon = isElectionQuorum(joinVotes);
         assert !prevElectionWon || electionWon; // we cannot go from won to not won
@@ -489,18 +494,28 @@ default void markLastAcceptedStateAsCommitted() {
     }

     /**
-     * A collection of votes, used to calculate quorums.
+     * A collection of votes, used to calculate quorums. Optionally records the Joins as well.
      */
     public static class VoteCollection {

         private final Map<String, DiscoveryNode> nodes;
+        private final Set<Join> joins;

         public boolean addVote(DiscoveryNode sourceNode) {
             return nodes.put(sourceNode.getId(), sourceNode) == null;
         }

+        public boolean addJoinVote(Join join) {
+            final boolean added = addVote(join.getSourceNode());
+            if (added) {
+                joins.add(join);
+            }
+            return added;
+        }
+
         public VoteCollection() {
             nodes = new HashMap<>();
+            joins = new HashSet<>();
         }

         public boolean isQuorum(VotingConfiguration configuration) {
@@ -519,24 +534,31 @@ public Collection<DiscoveryNode> nodes() {
             return Collections.unmodifiableCollection(nodes.values());
         }

+        public Set<Join> getJoins() {
+            return Collections.unmodifiableSet(joins);
+        }
+
         @Override
         public String toString() {
-            return "VoteCollection{" + String.join(",", nodes.keySet()) + "}";
+            return "VoteCollection{votes=" + nodes.keySet() + ", joins=" + joins + "}";
         }

         @Override
         public boolean equals(Object o) {
             if (this == o) return true;
-            if (o == null || getClass() != o.getClass()) return false;
+            if (!(o instanceof VoteCollection)) return false;

             VoteCollection that = (VoteCollection) o;

-            return nodes.equals(that.nodes);
+            if (!nodes.equals(that.nodes)) return false;
+            return joins.equals(that.joins);
         }

         @Override
         public int hashCode() {
-            return nodes.hashCode();
+            int result = nodes.hashCode();
+            result = 31 * result + joins.hashCode();
+            return result;
         }
     }
 }
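
Reviewer note: one consequence of `addJoinVote` is that a `Join` is only recorded when it contributes a new vote, so a second join from a node that has already voted is dropped. The sketch below reproduces that behaviour with simplified stand-ins. `JoinSketch` and `VoteCollectionSketch` are hypothetical names, node ids replace `DiscoveryNode`, and the `term` field is only there to tell two joins apart.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a Join: who sent it and for which term. */
final class JoinSketch {
    final String sourceNodeId;
    final long term;

    JoinSketch(String sourceNodeId, long term) {
        this.sourceNodeId = sourceNodeId;
        this.term = term;
    }
}

/** Simplified model of the vote collection after this commit. */
final class VoteCollectionSketch {
    private final Map<String, String> nodes = new HashMap<>();
    private final Set<JoinSketch> joins = new HashSet<>();

    /** Returns true only the first time a given node id votes. */
    boolean addVote(String nodeId) {
        return nodes.put(nodeId, nodeId) == null;
    }

    /** Records the join only if it added a new vote. */
    boolean addJoinVote(JoinSketch join) {
        final boolean added = addVote(join.sourceNodeId);
        if (added) {
            joins.add(join);
        }
        return added;
    }

    /** Read-only view of the recorded joins. */
    Set<JoinSketch> getJoins() {
        return Collections.unmodifiableSet(joins);
    }
}
```

Votes added through `addVote` alone, as `ClusterFormationFailureHelper` does for discovered peers, leave the set of joins empty, which matches the "optionally records the Joins" wording in the updated Javadoc.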
