Commit 2368b16

[DOCS] Adds overview and API ref for voting configurations

1 parent 6781a29 commit 2368b16

File tree

5 files changed

+171 -8 lines changed

docs/reference/cluster.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -104,3 +104,5 @@ include::cluster/tasks.asciidoc[]
 include::cluster/nodes-hot-threads.asciidoc[]
 
 include::cluster/allocation-explain.asciidoc[]
+
+include::cluster/voting-exclusions.asciidoc[]
docs/reference/cluster/voting-exclusions.asciidoc

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
[[voting-config-exclusions]]
== Voting configuration exclusions API
++++
<titleabbrev>Voting configuration exclusions</titleabbrev>
++++

Adds or removes nodes from the voting configuration exclusion list.

[float]
=== Request

[source,js]
--------------------------------------------------
# Add a node to the voting configuration exclusions list
POST /_cluster/voting_config_exclusions/<node_name>

# Remove all exclusions from the list
DELETE /_cluster/voting_config_exclusions
--------------------------------------------------
// CONSOLE

[float]
=== Path parameters

`node_name`::
A <<cluster-nodes,node filter>> that identifies {es} nodes.
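
For example, as a sketch (the node names `node-1` and `node-2` are
hypothetical), a node filter can name several departing master-eligible nodes
in a single request:

[source,js]
--------------------------------------------------
# Exclude two hypothetical nodes, node-1 and node-2, in one call
POST /_cluster/voting_config_exclusions/node-1,node-2
--------------------------------------------------
// CONSOLE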

[float]
=== Description

If the <<modules-discovery-settings,`cluster.auto_shrink_voting_configuration` setting>>
is `true`, the <<modules-discovery-voting,voting configuration>> automatically
shrinks when you remove master-eligible nodes from the cluster.

If the `cluster.auto_shrink_voting_configuration` setting is `false`, you must
use this API to remove departed nodes from the voting configuration manually.
It adds an entry for each of those nodes to the voting configuration exclusions
list. The cluster then tries to reconfigure the voting configuration to remove
those nodes and to prevent them from returning.

If the API fails, you can safely retry it. Only a successful response
guarantees that the node has been removed from the voting configuration and will
not be reinstated.
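
Because only a successful response guarantees that the exclusion took effect,
you might follow a call to this API by inspecting the committed voting
configuration, as in this sketch (the node name `node-1` is hypothetical; the
filtered cluster state request is the same one shown in
<<modules-discovery-voting>>):

[source,js]
--------------------------------------------------
# Exclude a departing master-eligible node
POST /_cluster/voting_config_exclusions/node-1

# Inspect the node IDs in the committed voting configuration
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE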

NOTE: Voting exclusions are required only when you remove at least half of the
master-eligible nodes from a cluster in a short time period. They are not
required when removing master-ineligible nodes or fewer than half of the
master-eligible nodes.

The
<<modules-discovery-settings,`cluster.max_voting_config_exclusions` setting>>
limits the size of the voting configuration exclusion list. The default value is
`10`. Since voting configuration exclusions are persistent and limited in number,
you must clean up the list.
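
Once the excluded nodes have permanently left the cluster, you can clear the
list with the `DELETE` request shown in the Request section above:

[source,js]
--------------------------------------------------
# Clear the voting configuration exclusions list
DELETE /_cluster/voting_config_exclusions
--------------------------------------------------
// CONSOLE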

For more information, see <<modules-discovery-removing-nodes>>.

docs/reference/modules/discovery.asciidoc

Lines changed: 9 additions & 7 deletions
@@ -13,6 +13,11 @@ module. This module is divided into the following sections:
 unknown, such as when a node has just started up or when the previous
 master has failed.
 
+<<modules-discovery-quorums>>::
+
+This section describes the detailed design behind the master election and
+auto-reconfiguration logic.
+
 <<modules-discovery-bootstrap-cluster>>::
 
 Bootstrapping a cluster is required when an Elasticsearch cluster starts up

@@ -39,11 +44,6 @@ module. This module is divided into the following sections:
 
 Cluster state publishing is the process by which the elected master node
 updates the cluster state on all the other nodes in the cluster.
-
-<<modules-discovery-quorums>>::
-
-This section describes the detailed design behind the master election and
-auto-reconfiguration logic.
 
 <<modules-discovery-settings,Settings>>::
 

@@ -52,14 +52,16 @@ module. This module is divided into the following sections:
 
 include::discovery/discovery.asciidoc[]
 
+include::discovery/quorums.asciidoc[]
+
+include::discovery/voting.asciidoc[]
+
 include::discovery/bootstrapping.asciidoc[]
 
 include::discovery/adding-removing-nodes.asciidoc[]
 
 include::discovery/publishing.asciidoc[]
 
-include::discovery/quorums.asciidoc[]
-
 include::discovery/fault-detection.asciidoc[]
 
 include::discovery/discovery-settings.asciidoc[]
docs/reference/modules/discovery/adding-removing-nodes.asciidoc

Lines changed: 3 additions & 1 deletion
@@ -12,6 +12,7 @@ cluster, and to scale the cluster up and down by adding and removing
 master-ineligible nodes only. However there are situations in which it may be
 desirable to add or remove some master-eligible nodes to or from a cluster.
 
+[[modules-discovery-adding-nodes]]
 ==== Adding master-eligible nodes
 
 If you wish to add some nodes to your cluster, simply configure the new nodes

@@ -24,6 +25,7 @@ cluster. You can use the `cluster.join.timeout` setting to configure how long a
 node waits after sending a request to join a cluster. Its default value is `30s`.
 See <<modules-discovery-settings>>.
 
+[[modules-discovery-removing-nodes]]
 ==== Removing master-eligible nodes
 
 When removing master-eligible nodes, it is important not to remove too many all

@@ -50,7 +52,7 @@ will never automatically move a node on the voting exclusions list back into the
 voting configuration. Once an excluded node has been successfully
 auto-reconfigured out of the voting configuration, it is safe to shut it down
 without affecting the cluster's master-level availability. A node can be added
-to the voting configuration exclusion list using the following API:
+to the voting configuration exclusion list using the <<voting-config-exclusions>> API. For example:
 
 [source,js]
 --------------------------------------------------
docs/reference/modules/discovery/voting.asciidoc

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
[[modules-discovery-voting]]
=== Voting configurations

Each {es} cluster has a _voting configuration_, which is the set of
<<master-node,master-eligible nodes>> whose responses are counted when making
decisions such as electing a new master or committing a new cluster
state. Decisions are made only after a _quorum_ (more than half) of the nodes in
the voting configuration respond.

Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.

IMPORTANT: To ensure the cluster remains available, you **must not stop half or
more of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available, the cluster can work normally. For
example, if there are three or four master-eligible nodes, the cluster
can tolerate one unavailable node. If there are two or fewer master-eligible
nodes, they must all remain available.

After a node joins or leaves the cluster, {es} reacts by automatically making
corresponding changes to the voting configuration in order to ensure that the
cluster is as resilient as possible. It is important to wait for this adjustment
to complete before you remove more nodes from the cluster. For more information,
see <<modules-discovery-adding-removing-nodes>>.

The current voting configuration is stored in the cluster state so you can
inspect its current contents as follows:

[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE
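
The response lists the node IDs of the nodes in the committed voting
configuration. As a sketch (the node IDs below are placeholders), the filtered
cluster state looks roughly like this:

[source,js]
--------------------------------------------------
{
  "metadata": {
    "cluster_coordination": {
      "last_committed_config": [
        "node_id_1",
        "node_id_2",
        "node_id_3"
      ]
    }
  }
}
--------------------------------------------------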

NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes, or does not
include some available nodes, and in these situations the voting configuration
differs from the set of available master-eligible nodes in the cluster.

Larger voting configurations are usually more resilient, so Elasticsearch
normally prefers to add master-eligible nodes to the voting configuration after
they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration then it is preferable to swap these two nodes
over. The size of the voting configuration is thus unchanged but its
resilience increases.

It is not so straightforward to automatically remove nodes from the voting
configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
will be used. You can control whether the voting configuration automatically
shrinks by using the
<<modules-discovery-settings,`cluster.auto_shrink_voting_configuration` setting>>.
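
For example, a minimal sketch of disabling automatic shrinking, assuming the
setting can be updated dynamically through the cluster settings API:

[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "persistent": {
    "cluster.auto_shrink_voting_configuration": false
  }
}
--------------------------------------------------
// CONSOLE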

NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true`, the
recommended and default setting, and there are at least three master-eligible
nodes in the cluster, Elasticsearch remains capable of processing cluster state
updates as long as all but one of its master-eligible nodes are healthy.

There are situations in which Elasticsearch might tolerate the loss of multiple
nodes, but this is not guaranteed under all sequences of failures. If the
`cluster.auto_shrink_voting_configuration` setting is `false`, you must remove
departed nodes from the voting configuration manually. Use the
<<voting-config-exclusions,voting exclusions API>> to achieve the desired level
of resilience.

No matter how it is configured, Elasticsearch will not suffer from a
"split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration`
setting affects only its availability in the event of the failure of some of its
nodes, and the administrative tasks that must be performed as nodes join and
leave the cluster.

[float]
==== Even numbers of master-eligible nodes

There should normally be an odd number of master-eligible nodes in a cluster.
If there is an even number, Elasticsearch leaves one of them out of the voting
configuration to ensure that it has an odd size. This omission does not decrease
the failure-tolerance of the cluster. In fact, it improves it slightly: if the
cluster suffers from a network partition that divides it into two equally-sized
halves then one of the halves will contain a majority of the voting
configuration and will be able to keep operating. If all of the votes from
master-eligible nodes were counted, neither side would contain a strict majority
of the nodes and so the cluster would not be able to make any progress.

For instance, if there are four master-eligible nodes in the cluster and the
voting configuration contains all of them, any quorum-based decision would
require votes from at least three of them. This situation means that the cluster
can tolerate the loss of only a single master-eligible node. If this cluster
were split into two equal halves, neither half would contain three
master-eligible nodes and the cluster would not be able to make any progress.
If the voting configuration contains only three of the four master-eligible
nodes, however, the cluster is still only fully tolerant to the loss of one
node, but quorum-based decisions require votes from two of the three voting
nodes. In the event of an even split, one half will contain two of the three
voting nodes so that half will remain available.
