@@ -18,13 +18,13 @@ cluster. In many cases you can do this simply by starting or stopping the nodes
as required. See <<modules-discovery-adding-removing-nodes>>.

As nodes are added or removed Elasticsearch maintains an optimal level of fault
- tolerance by updating the cluster's _voting configuration_, which is the set of
- master-eligible nodes whose responses are counted when making decisions such as
- electing a new master or committing a new cluster state. A decision is made only
- after more than half of the nodes in the voting configuration have responded.
- Usually the voting configuration is the same as the set of all the
- master-eligible nodes that are currently in the cluster. However, there are some
- situations in which they may be different.
+ tolerance by updating the cluster's <<modules-discovery-voting,voting
+ configuration>>, which is the set of master-eligible nodes whose responses are
+ counted when making decisions such as electing a new master or committing a new
+ cluster state. A decision is made only after more than half of the nodes in the
+ voting configuration have responded. Usually the voting configuration is the
+ same as the set of all the master-eligible nodes that are currently in the
+ cluster. However, there are some situations in which they may be different.

To be sure that the cluster remains available you **must not stop half or more
of the nodes in the voting configuration at the same time**. As long as more
@@ -38,46 +38,6 @@ cluster-state update that adjusts the voting configuration to match, and this
can take a short time to complete. It is important to wait for this adjustment
to complete before removing more nodes from the cluster.

- [float]
- ==== Setting the initial quorum
-
- When a brand-new cluster starts up for the first time, it must elect its first
- master node. To do this election, it needs to know the set of master-eligible
- nodes whose votes should count. This initial voting configuration is known as
- the _bootstrap configuration_ and is set in the
- <<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.
-
- It is important that the bootstrap configuration identifies exactly which nodes
- should vote in the first election. It is not sufficient to configure each node
- with an expectation of how many nodes there should be in the cluster. It is also
- important to note that the bootstrap configuration must come from outside the
- cluster: there is no safe way for the cluster to determine the bootstrap
- configuration correctly on its own.
-
- If the bootstrap configuration is not set correctly, when you start a brand-new
- cluster there is a risk that you will accidentally form two separate clusters
- instead of one. This situation can lead to data loss: you might start using both
- clusters before you notice that anything has gone wrong and it is impossible to
- merge them together later.
-
- NOTE: To illustrate the problem with configuring each node to expect a certain
- cluster size, imagine starting up a three-node cluster in which each node knows
- that it is going to be part of a three-node cluster. A majority of three nodes
- is two, so normally the first two nodes to discover each other form a cluster
- and the third node joins them a short time later. However, imagine that four
- nodes were erroneously started instead of three. In this case, there are enough
- nodes to form two separate clusters. Of course if each node is started manually
- then it's unlikely that too many nodes are started. If you're using an automated
- orchestrator, however, it's certainly possible to get into this situation--
- particularly if the orchestrator is not resilient to failures such as network
- partitions.
-
- The initial quorum is only required the very first time a whole cluster starts
- up. New nodes joining an established cluster can safely obtain all the
- information they need from the elected master. Nodes that have previously been
- part of a cluster will have stored to disk all the information that is required
- when they restart.
-
[float]
==== Master elections

@@ -104,92 +64,3 @@ and then started again then it will automatically recover, such as during a
action with the APIs described here in these cases, because the set of master
nodes is not changing permanently.

- [float]
- ==== Automatic changes to the voting configuration
-
- Nodes may join or leave the cluster, and Elasticsearch reacts by automatically
- making corresponding changes to the voting configuration in order to ensure that
- the cluster is as resilient as possible.
-
- The default auto-reconfiguration
- behaviour is expected to give the best results in most situations. The current
- voting configuration is stored in the cluster state so you can inspect its
- current contents as follows:
-
- [source,js]
- --------------------------------------------------
- GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
- --------------------------------------------------
- // CONSOLE
-
- NOTE: The current voting configuration is not necessarily the same as the set of
- all available master-eligible nodes in the cluster. Altering the voting
- configuration involves taking a vote, so it takes some time to adjust the
- configuration as nodes join or leave the cluster. Also, there are situations
- where the most resilient configuration includes unavailable nodes, or does not
- include some available nodes, and in these situations the voting configuration
- differs from the set of available master-eligible nodes in the cluster.
-
- Larger voting configurations are usually more resilient, so Elasticsearch
- normally prefers to add master-eligible nodes to the voting configuration after
- they join the cluster. Similarly, if a node in the voting configuration
- leaves the cluster and there is another master-eligible node in the cluster that
- is not in the voting configuration then it is preferable to swap these two nodes
- over. The size of the voting configuration is thus unchanged but its
- resilience increases.
-
- It is not so straightforward to automatically remove nodes from the voting
- configuration after they have left the cluster. Different strategies have
- different benefits and drawbacks, so the right choice depends on how the cluster
- will be used. You can control whether the voting configuration automatically shrinks by using the following setting:
-
- `cluster.auto_shrink_voting_configuration`::
-
- Defaults to `true`, meaning that the voting configuration will automatically
- shrink, shedding departed nodes, as long as it still contains at least 3
- nodes. If set to `false`, the voting configuration never automatically
- shrinks; departed nodes must be removed manually using the
- <<modules-discovery-adding-removing-nodes,voting configuration exclusions API>>.
-
- NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true`, the
- recommended and default setting, and there are at least three master-eligible
- nodes in the cluster, then Elasticsearch remains capable of processing
- cluster-state updates as long as all but one of its master-eligible nodes are
- healthy.
-
- There are situations in which Elasticsearch might tolerate the loss of multiple
- nodes, but this is not guaranteed under all sequences of failures. If this
- setting is set to `false` then departed nodes must be removed from the voting
- configuration manually, using the
- <<modules-discovery-adding-removing-nodes,voting exclusions API>>, to achieve
- the desired level of resilience.
-
- No matter how it is configured, Elasticsearch will not suffer from a "split-brain" inconsistency.
- The `cluster.auto_shrink_voting_configuration` setting affects only its availability in the
- event of the failure of some of its nodes, and the administrative tasks that
- must be performed as nodes join and leave the cluster.
-
- [float]
- ==== Even numbers of master-eligible nodes
-
- There should normally be an odd number of master-eligible nodes in a cluster.
- If there is an even number, Elasticsearch leaves one of them out of the voting
- configuration to ensure that it has an odd size. This omission does not decrease
- the failure-tolerance of the cluster. In fact, improves it slightly: if the
- cluster suffers from a network partition that divides it into two equally-sized
- halves then one of the halves will contain a majority of the voting
- configuration and will be able to keep operating. If all of the master-eligible
- nodes' votes were counted, neither side would contain a strict majority of the
- nodes and so the cluster would not be able to make any progress.
-
- For instance if there are four master-eligible nodes in the cluster and the
- voting configuration contained all of them, any quorum-based decision would
- require votes from at least three of them. This situation means that the cluster
- can tolerate the loss of only a single master-eligible node. If this cluster
- were split into two equal halves, neither half would contain three
- master-eligible nodes and the cluster would not be able to make any progress.
- If the voting configuration contains only three of the four master-eligible
- nodes, however, the cluster is still only fully tolerant to the loss of one
- node, but quorum-based decisions require votes from two of the three voting
- nodes. In the event of an even split, one half will contain two of the three
- voting nodes so that half will remain available.
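
Note on the quorum arithmetic in the "Even numbers of master-eligible nodes" text moved by this change: the four-node example can be checked with a small Python sketch. This only illustrates the "more than half" rule as the text describes it, not how Elasticsearch implements it. The function names `quorum` and `half_with_quorum` are hypothetical and are not part of any Elasticsearch API.

```python
def quorum(voting_size: int) -> int:
    """Smallest number of votes that is more than half of the voting configuration."""
    return voting_size // 2 + 1


def half_with_quorum(voters_in_a: int, voters_in_b: int):
    """Return "A" or "B" for the side of a partition that holds a quorum, else None.

    The arguments count only the nodes that are in the voting configuration;
    master-eligible nodes left out of it do not contribute votes.
    """
    needed = quorum(voters_in_a + voters_in_b)
    if voters_in_a >= needed:
        return "A"
    if voters_in_b >= needed:
        return "B"
    return None


# Four master-eligible nodes split evenly into halves A and B (illustrative values).
# Case 1: all four nodes vote, so each half holds two voters.
print(quorum(4), half_with_quorum(2, 2))
# Case 2: only three nodes vote, so one half holds two voters and the other one.
print(quorum(3), half_with_quorum(2, 1))
```

In the first case a quorum needs three votes and neither half can reach it. In the second case a quorum needs two votes, so the half holding two voting nodes can keep operating.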