Electing a master node and changing the cluster state are the two fundamental
tasks that master-eligible nodes must work together to perform. It is important
- that these activities work robustly even if some nodes have failed, and
- Elasticsearch achieves this robustness by only considering each action to have
- succeeded on receipt of responses from a _quorum_, a subset of the
+ that these activities work robustly even if some nodes have failed.
+ Elasticsearch achieves this robustness by considering each action to have
+ succeeded on receipt of responses from a _quorum_, which is a subset of the
master-eligible nodes in the cluster. The advantage of requiring only a subset
- of the nodes to respond is that it allows for some of the nodes to fail without
- preventing the cluster from making progress, and the quorums are carefully
- chosen so as not to allow the cluster to "split brain", i.e. to be partitioned
- into two pieces each of which may make decisions that are inconsistent with
+ of the nodes to respond is that it means some of the nodes can fail without
+ preventing the cluster from making progress. The quorums are carefully
+ chosen so the cluster does not have a "split brain" scenario where it's
+ partitioned into two pieces, each of which may make decisions that are
+ inconsistent with
those of the other piece.

Elasticsearch allows you to add and remove master-eligible nodes to a running
cluster. In many cases you can do this simply by starting or stopping the nodes
- as required, as described in more detail in the
- <<modules-discovery-adding-removing-nodes,section on adding and removing
- nodes>>.
+ as required. See
+ <<modules-discovery-adding-removing-nodes>>.

As nodes are added or removed Elasticsearch maintains an optimal level of fault
tolerance by updating the cluster's _voting configuration_, which is the set of
master-eligible nodes whose responses are counted when making decisions such as
- electing a new master or committing a new cluster state. A decision is only made
- once more than half of the nodes in the voting configuration have responded.
+ electing a new master or committing a new cluster state. A decision is made
+ only after more than half of the nodes in the voting configuration have responded.
Usually the voting configuration is the same as the set of all the
- master-eligible nodes that are currently in the cluster, but there are some
+ master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.

To be sure that the cluster remains available you **must not stop half or more
of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available the cluster can still work normally.
- This means that if there are three or four master-eligible nodes then the
- cluster can tolerate one of them being unavailable; if there are two or fewer
- master-eligible nodes then they must all remain available.
+ This means that if there are three or four master-eligible nodes, the
+ cluster can tolerate one of them being unavailable. If there are two or fewer
+ master-eligible nodes, they must all remain available.
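For example, one way to check which nodes in a cluster are master-eligible is
the `_cat/nodes` API. This is a sketch, and the column list shown is just one
possible choice:

[source,js]
--------------------------------------------------
GET /_cat/nodes?v&h=name,node.role,master
--------------------------------------------------
// CONSOLE

Nodes whose `node.role` value contains `m` are master-eligible, and the elected
master is marked with `*` in the `master` column.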

After a node has joined or left the cluster the elected master must issue a
cluster-state update that adjusts the voting configuration to match, and this
@@ -43,43 +42,43 @@ to complete before removing more nodes from the cluster.
[float]
==== Setting the initial quorum

- When a brand-new cluster starts up for the first time, one of the tasks it must
- perform is to elect its first master node, for which it needs to know the set
- of master-eligible nodes whose votes should count in this first election . This
+ When a brand-new cluster starts up for the first time, it must
+ elect its first master node. For this election, it needs to know the set
+ of master-eligible nodes whose votes should count. This
initial voting configuration is known as the _bootstrap configuration_ and is
set in the <<modules-discovery-bootstrap-cluster,cluster bootstrapping
process>>.

It is important that the bootstrap configuration identifies exactly which nodes
54
- should vote in the first election, and it is not sufficient to configure each
53
+ should vote in the first election. It is not sufficient to configure each
55
54
node with an expectation of how many nodes there should be in the cluster. It
56
55
is also important to note that the bootstrap configuration must come from
57
56
outside the cluster: there is no safe way for the cluster to determine the
58
57
bootstrap configuration correctly on its own.
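
For example, a minimal sketch for a cluster whose three master-eligible nodes
are named `master-a`, `master-b` and `master-c` (hypothetical names) would set
the same bootstrap configuration on each of those nodes in `elasticsearch.yml`,
using the `cluster.initial_master_nodes` setting described in the
bootstrapping process linked above:

[source,yaml]
--------------------------------------------------
cluster.initial_master_nodes:
  - master-a
  - master-b
  - master-c
--------------------------------------------------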

- If the bootstrap configuration is not set correctly then there is a risk when
- starting up a brand-new cluster is that you accidentally form two separate
- clusters instead of one. This could lead to data loss: you might start using
- both clusters before noticing that anything had gone wrong, and it will then be
+ If the bootstrap configuration is not set correctly, when
+ you start a brand-new cluster there is a risk that you will accidentally form two separate
+ clusters instead of one. This situation can lead to data loss: you might start
+ using both clusters before you notice that anything has gone wrong, and it is then
impossible to merge them together later.

NOTE: To illustrate the problem with configuring each node to expect a certain
cluster size, imagine starting up a three-node cluster in which each node knows
that it is going to be part of a three-node cluster. A majority of three nodes
- is two, so normally the first two nodes to discover each other will form a
- cluster and the third node will join them a short time later. However, imagine
- that four nodes were erroneously started instead of three: in this case there
+ is two, so normally the first two nodes to discover each other form a
+ cluster and the third node joins them a short time later. However, imagine
+ that four nodes were erroneously started instead of three. In this case, there
are enough nodes to form two separate clusters. Of course if each node is
- started manually then it's unlikely that too many nodes are started, but it's
- certainly possible to get into this situation if using a more automated
- orchestrator, particularly if the orchestrator is not resilient to failures
+ started manually then it's unlikely that too many nodes are started. If you're
+ using an automated orchestrator, however, it's certainly possible to get into
+ this situation, particularly if the orchestrator is not resilient to failures
such as network partitions.

The initial quorum is only required the very first time a whole cluster starts
- up: new nodes joining an established cluster can safely obtain all the
- information they need from the elected master, and nodes that have previously
- been part of a cluster will have stored to disk all the information required
- when restarting .
+ up. New nodes joining an established cluster can safely obtain all the
+ information they need from the elected master. Nodes that have previously
+ been part of a cluster will have stored to disk all the information that is
+ required when they restart.

[float]
==== Cluster maintenance, rolling restarts and migrations
@@ -99,7 +98,7 @@ nodes is not changing permanently.
Nodes may join or leave the cluster, and Elasticsearch reacts by making
corresponding changes to the voting configuration in order to ensure that the
cluster is as resilient as possible. The default auto-reconfiguration behaviour
- is expected to give the best results in most situation . The current voting
+ is expected to give the best results in most situations. The current voting
configuration is stored in the cluster state so you can inspect its current
contents as follows:

@@ -111,24 +110,24 @@ GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config

NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
- configuration itself involves taking a vote, so it takes some time to adjust the
+ configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes, or does not
include some available nodes, and in these situations the voting configuration
- will differ from the set of available master-eligible nodes in the cluster.
+ differs from the set of available master-eligible nodes in the cluster.

- Larger voting configurations are usually more resilient, so Elasticsearch will
- normally prefer to add master-eligible nodes to the voting configuration once
- they have joined the cluster. Similarly, if a node in the voting configuration
+ Larger voting configurations are usually more resilient, so Elasticsearch
+ normally prefers to add master-eligible nodes to the voting configuration after
+ they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration then it is preferable to swap these two nodes
- over, leaving the size of the voting configuration unchanged but increasing its
- resilience.
+ over. The size of the voting configuration is thus unchanged but its
+ resilience increases.

It is not so straightforward to automatically remove nodes from the voting
- configuration after they have left the cluster, and different strategies have
+ configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
- will be used and is controlled by the following setting.
+ will be used. You can control whether the voting configuration automatically
+ shrinks by using the following setting:

`cluster.auto_shrink_voting_configuration`::

@@ -151,30 +150,30 @@ configuration manually, using the
<<modules-discovery-adding-removing-nodes,voting exclusions API>>, to achieve
the desired level of resilience.
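
For example, a minimal sketch of manually excluding a departing node, where
`node_name` is a placeholder for the name of the node to exclude:

[source,js]
--------------------------------------------------
POST /_cluster/voting_config_exclusions/node_name
--------------------------------------------------
// CONSOLE

Once the excluded node has been taken offline and removed from the cluster, the
exclusion list can be cleared again with
`DELETE /_cluster/voting_config_exclusions`.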

- Note that Elasticsearch will not suffer from a "split-brain" inconsistency
- however it is configured. This setting only affects its availability in the
+ No matter how it is configured, Elasticsearch will not suffer from a
+ "split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration`
+ setting affects only its availability in the
event of the failure of some of its nodes, and the administrative tasks that
must be performed as nodes join and leave the cluster.
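
As a sketch, this setting is dynamic, so it can be adjusted at runtime through
the cluster settings API, for instance to keep every master-eligible node in
the voting configuration even after it leaves the cluster:

[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "persistent": {
    "cluster.auto_shrink_voting_configuration": false
  }
}
--------------------------------------------------
// CONSOLE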

[float]
==== Even numbers of master-eligible nodes

There should normally be an odd number of master-eligible nodes in a cluster.
- If there is an even number then Elasticsearch will leave one of them out of the
- voting configuration to ensure that it has an odd size. This does not decrease
- the failure-tolerance of the cluster, and in fact improves it slightly: if the
+ If there is an even number, Elasticsearch leaves one of them out of the
+ voting configuration to ensure that it has an odd size. This omission does not
+ decrease the failure-tolerance of the cluster. In fact, it improves it
+ slightly: if the
cluster is partitioned into two even halves then one of the halves will contain
- a majority of the voting configuration and will be able to keep operating,
- whereas if all of the master-eligible nodes' votes were counted then neither
+ a majority of the voting configuration and will be able to keep operating.
+ If all of the master-eligible nodes' votes were counted, neither
side could make any progress in this situation.

For instance if there are four master-eligible nodes in the cluster and the
- voting configuration contained all of them then any quorum-based decision would
- require votes from at least three of them, which means that the cluster can only
- tolerate the loss of a single master-eligible node. If this cluster were split
- into two equal halves then neither half would contain three master-eligible
- nodes so would not be able to make any progress. However if the voting
- configuration contains only three of the four master-eligible nodes then the
+ voting configuration contained all of them, any quorum-based decision would
+ require votes from at least three of them. This means that the cluster can
+ tolerate the loss of only a single master-eligible node. If this cluster were
+ split into two equal halves, neither half would contain three master-eligible
+ nodes and the cluster would not be able to make any progress. However, if the
+ voting configuration contains only three of the four master-eligible nodes, the
cluster is still only fully tolerant to the loss of one node, but quorum-based
decisions require votes from two of the three voting nodes. In the event of an
even split, one half will contain two of the three voting nodes so will remain