
Commit 14194c6

lcawl and DaveCTurner authored
Suggested changes to quorums.asciidoc
Co-Authored-By: DaveCTurner <[email protected]>
1 parent 442a7a7 commit 14194c6


docs/reference/modules/discovery/quorums.asciidoc

Lines changed: 56 additions & 57 deletions
@@ -3,37 +3,36 @@
Electing a master node and changing the cluster state are the two fundamental
tasks that master-eligible nodes must work together to perform. It is important
that these activities work robustly even if some nodes have failed.
Elasticsearch achieves this robustness by considering each action to have
succeeded on receipt of responses from a _quorum_, which is a subset of the
master-eligible nodes in the cluster. The advantage of requiring only a subset
of the nodes to respond is that it means some of the nodes can fail without
preventing the cluster from making progress. The quorums are carefully chosen
so the cluster does not have a "split brain" scenario where it's partitioned
into two pieces--each of which may make decisions that are inconsistent with
those of the other piece.
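The majority arithmetic behind quorum decisions can be sketched as follows. This is an illustrative Python snippet, not Elasticsearch code: a quorum is any subset containing more than half of the voting nodes, so the cluster tolerates the failure of the remaining minority.

```python
def quorum(n_voting: int) -> int:
    """Smallest number of votes that is more than half of n_voting."""
    return n_voting // 2 + 1

def fault_tolerance(n_voting: int) -> int:
    """How many voting nodes can fail while a quorum is still reachable."""
    return n_voting - quorum(n_voting)

# 3 voting nodes: quorum of 2, tolerates 1 failure.
# 4 voting nodes: quorum of 3, still tolerates only 1 failure.
for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), fault_tolerance(n))
```

Note that moving from three to four voting nodes raises the quorum size without improving fault tolerance, which is why the voting configuration is kept at an odd size, as described later in this page.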

Elasticsearch allows you to add and remove master-eligible nodes to a running
cluster. In many cases you can do this simply by starting or stopping the nodes
as required. See
<<modules-discovery-adding-removing-nodes>>.

As nodes are added or removed Elasticsearch maintains an optimal level of fault
tolerance by updating the cluster's _voting configuration_, which is the set of
master-eligible nodes whose responses are counted when making decisions such as
electing a new master or committing a new cluster state. A decision is made
only after more than half of the nodes in the voting configuration have
responded.
Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.

To be sure that the cluster remains available you **must not stop half or more
of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available the cluster can still work normally.
This means that if there are three or four master-eligible nodes, the
cluster can tolerate one of them being unavailable. If there are two or fewer
master-eligible nodes, they must all remain available.

After a node has joined or left the cluster the elected master must issue a
cluster-state update that adjusts the voting configuration to match, and this
@@ -43,43 +42,43 @@ to complete before removing more nodes from the cluster.
[float]
==== Setting the initial quorum

When a brand-new cluster starts up for the first time, it must
elect its first master node. To do this, it needs to know the set
of master-eligible nodes whose votes should count in this first election. This
initial voting configuration is known as the _bootstrap configuration_ and is
set in the <<modules-discovery-bootstrap-cluster,cluster bootstrapping
process>>.

It is important that the bootstrap configuration identifies exactly which nodes
should vote in the first election. It is not sufficient to configure each
node with an expectation of how many nodes there should be in the cluster. It
is also important to note that the bootstrap configuration must come from
outside the cluster: there is no safe way for the cluster to determine the
bootstrap configuration correctly on its own.

If the bootstrap configuration is not set correctly, when you start a
brand-new cluster there is a risk that you will accidentally form two separate
clusters instead of one. This situation can lead to data loss: you might start
using both clusters before you notice that anything has gone wrong, and it is
then impossible to merge them together later.

NOTE: To illustrate the problem with configuring each node to expect a certain
cluster size, imagine starting up a three-node cluster in which each node knows
that it is going to be part of a three-node cluster. A majority of three nodes
is two, so normally the first two nodes to discover each other form a
cluster and the third node joins them a short time later. However, imagine
that four nodes were erroneously started instead of three. In this case, there
are enough nodes to form two separate clusters. Of course if each node is
started manually then it's unlikely that too many nodes are started. If you're
using an automated orchestrator, however, it's certainly possible to get into
this situation--particularly if the orchestrator is not resilient to failures
such as network partitions.

The initial quorum is only required the very first time a whole cluster starts
up. New nodes joining an established cluster can safely obtain all the
information they need from the elected master. Nodes that have previously
been part of a cluster will have stored to disk all the information that is
required when they restart.

[float]
==== Cluster maintenance, rolling restarts and migrations
@@ -99,7 +98,7 @@ nodes is not changing permanently.
Nodes may join or leave the cluster, and Elasticsearch reacts by making
corresponding changes to the voting configuration in order to ensure that the
cluster is as resilient as possible. The default auto-reconfiguration behaviour
is expected to give the best results in most situations. The current voting
configuration is stored in the cluster state so you can inspect its current
contents as follows:
@@ -111,24 +110,24 @@ GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_con

NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes, or does not
include some available nodes, and in these situations the voting configuration
differs from the set of available master-eligible nodes in the cluster.

Larger voting configurations are usually more resilient, so Elasticsearch
normally prefers to add master-eligible nodes to the voting configuration after
they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration then it is preferable to swap these two nodes
over. The size of the voting configuration is thus unchanged but its
resilience increases.

It is not so straightforward to automatically remove nodes from the voting
configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
will be used. You can control whether the voting configuration automatically
shrinks by using the following setting:

`cluster.auto_shrink_voting_configuration`::
@@ -151,30 +150,30 @@ configuration manually, using the
<<modules-discovery-adding-removing-nodes,voting exclusions API>>, to achieve
the desired level of resilience.

No matter how it is configured, Elasticsearch will not suffer from a
"split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration`
setting affects only its availability in the
event of the failure of some of its nodes, and the administrative tasks that
must be performed as nodes join and leave the cluster.
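As a sketch, the setting could be applied in `elasticsearch.yml` on the master-eligible nodes (this assumes the documented default of `true`; the value shown and the comment are illustrative, not a recommendation):

```yaml
# Illustrative fragment of elasticsearch.yml on a master-eligible node.
# true (the default) lets the voting configuration shrink automatically
# as master-eligible nodes leave the cluster.
cluster.auto_shrink_voting_configuration: true
```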

[float]
==== Even numbers of master-eligible nodes

There should normally be an odd number of master-eligible nodes in a cluster.
If there is an even number, Elasticsearch leaves one of them out of the
voting configuration to ensure that it has an odd size. This omission does not
decrease the failure-tolerance of the cluster. In fact, it improves it
slightly: if the cluster is partitioned into two even halves then one of the
halves will contain a majority of the voting configuration and will be able to
keep operating. If all of the master-eligible nodes' votes were counted,
neither side could make any progress in this situation.
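This even-split behaviour can be checked with a short illustrative snippet (plain Python, not Elasticsearch code; the only assumption is that a decision needs more than half of the voting configuration):

```python
def has_quorum(voting_nodes: set, reachable: set) -> bool:
    """True if the reachable side holds more than half of the voting nodes."""
    return len(voting_nodes & reachable) > len(voting_nodes) / 2

# Four master-eligible nodes split into two even halves by a partition.
half_a, half_b = {"n1", "n2"}, {"n3", "n4"}

# If all four nodes vote, neither half holds a majority: no progress.
all_voting = {"n1", "n2", "n3", "n4"}
print(has_quorum(all_voting, half_a), has_quorum(all_voting, half_b))

# If only three vote, exactly one half holds two of the three voters
# and can keep operating.
three_voting = {"n1", "n2", "n3"}
print(has_quorum(three_voting, half_a), has_quorum(three_voting, half_b))
```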

For instance if there are four master-eligible nodes in the cluster and the
voting configuration contained all of them, any quorum-based decision would
require votes from at least three of them. This situation means that the
cluster can tolerate the loss of only a single master-eligible node. If this
cluster were split into two equal halves, neither half would contain three
master-eligible nodes and the cluster would not be able to make any progress.
If the voting configuration contains only three of the four master-eligible
nodes, however, the cluster is still only fully tolerant to the loss of one
node, but quorum-based decisions require votes from two of the three voting
nodes. In the event of an even split, one half will contain two of the three
voting nodes so will remain
