
Commit 1a23417

[Zen2] Update documentation for Zen2 (#34714)
This commit overhauls the documentation of discovery and cluster coordination, removing mention of the Zen Discovery module and replacing it with docs for the new cluster coordination mechanism introduced in 7.0. Relates #32006
1 parent 08bcd83 commit 1a23417

27 files changed: +985 additions, -414 deletions

Diff for: docs/plugins/discovery.asciidoc

+8 -6
@@ -1,8 +1,8 @@
 [[discovery]]
 == Discovery Plugins

-Discovery plugins extend Elasticsearch by adding new discovery mechanisms that
-can be used instead of {ref}/modules-discovery-zen.html[Zen Discovery].
+Discovery plugins extend Elasticsearch by adding new hosts providers that can be
+used to extend the {ref}/modules-discovery.html[cluster formation module].

 [float]
 ==== Core discovery plugins

@@ -11,22 +11,24 @@ The core discovery plugins are:

 <<discovery-ec2,EC2 discovery>>::

-The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API] for unicast discovery.
+The EC2 discovery plugin uses the https://github.com/aws/aws-sdk-java[AWS API]
+for unicast discovery.

 <<discovery-azure-classic,Azure Classic discovery>>::

-The Azure Classic discovery plugin uses the Azure Classic API for unicast discovery.
+The Azure Classic discovery plugin uses the Azure Classic API for unicast
+discovery.

 <<discovery-gce,GCE discovery>>::

-The Google Compute Engine discovery plugin uses the GCE API for unicast discovery.
+The Google Compute Engine discovery plugin uses the GCE API for unicast
+discovery.

 [float]
 ==== Community contributed discovery plugins

 A number of discovery plugins have been contributed by our community:

-* https://github.com/shikhar/eskka[eskka Discovery Plugin] (by Shikhar Bhushan)
 * https://github.com/fabric8io/elasticsearch-cloud-kubernetes[Kubernetes Discovery Plugin] (by Jimmi Dyson, http://fabric8.io[fabric8])

 include::discovery-ec2.asciidoc[]

Diff for: docs/reference/migration/migrate_7_0.asciidoc

+2
@@ -11,6 +11,7 @@ See also <<release-highlights>> and <<es-release-notes>>.

 * <<breaking_70_aggregations_changes>>
 * <<breaking_70_cluster_changes>>
+* <<breaking_70_discovery_changes>>
 * <<breaking_70_indices_changes>>
 * <<breaking_70_mappings_changes>>
 * <<breaking_70_search_changes>>

@@ -44,6 +45,7 @@ Elasticsearch 6.x in order to be readable by Elasticsearch 7.x.
 include::migrate_7_0/aggregations.asciidoc[]
 include::migrate_7_0/analysis.asciidoc[]
 include::migrate_7_0/cluster.asciidoc[]
+include::migrate_7_0/discovery.asciidoc[]
 include::migrate_7_0/indices.asciidoc[]
 include::migrate_7_0/mappings.asciidoc[]
 include::migrate_7_0/search.asciidoc[]

Diff for: docs/reference/migration/migrate_7_0/cluster.asciidoc

-9
@@ -25,12 +25,3 @@ Clusters now have soft limits on the total number of open shards in the cluster
 based on the number of nodes and the `cluster.max_shards_per_node` cluster
 setting, to prevent accidental operations that would destabilize the cluster.
 More information can be found in the <<misc-cluster,documentation for that setting>>.
-
-[float]
-==== Discovery configuration is required in production
-Production deployments of Elasticsearch now require at least one of the following settings
-to be specified in the `elasticsearch.yml` configuration file:
-
-- `discovery.zen.ping.unicast.hosts`
-- `discovery.zen.hosts_provider`
-- `cluster.initial_master_nodes`

Diff for: docs/reference/migration/migrate_7_0/discovery.asciidoc

+40
@@ -0,0 +1,40 @@
+[float]
+[[breaking_70_discovery_changes]]
+=== Discovery changes
+
+[float]
+==== Cluster bootstrapping is required if discovery is configured
+
+The first time a cluster is started, `cluster.initial_master_nodes` must be set
+to perform cluster bootstrapping. It should contain the names of the
+master-eligible nodes in the initial cluster and be defined on every
+master-eligible node in the cluster. See <<discovery-settings,the discovery
+settings summary>> for an example, and the
+<<modules-discovery-bootstrap-cluster,cluster bootstrapping reference
+documentation>> describes this setting in more detail.
+
+The `discovery.zen.minimum_master_nodes` setting is required during a rolling
+upgrade from 6.x, but can be removed in all other circumstances.
+
+[float]
+==== Removing master-eligible nodes sometimes requires voting exclusions
+
+If you wish to remove half or more of the master-eligible nodes from a cluster,
+you must first exclude the affected nodes from the voting configuration using
+the <<modules-discovery-adding-removing-nodes,voting config exclusions API>>.
+If you remove fewer than half of the master-eligible nodes at the same time,
+voting exclusions are not required. If you remove only master-ineligible nodes
+such as data-only nodes or coordinating-only nodes, voting exclusions are not
+required. Likewise, if you add nodes to the cluster, voting exclusions are not
+required.
+
+[float]
+==== Discovery configuration is required in production
+
+Production deployments of Elasticsearch now require at least one of the
+following settings to be specified in the `elasticsearch.yml` configuration
+file:
+
+- `discovery.zen.ping.unicast.hosts`
+- `discovery.zen.hosts_provider`
+- `cluster.initial_master_nodes`
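For readers applying this change, here is a minimal sketch of an `elasticsearch.yml` fragment that would satisfy the new production requirement. The node names and addresses are hypothetical, and any one of the three listed settings is sufficient on its own; this sketch shows the seed-hosts and bootstrapping settings together:

[source,yaml]
--------------------------------------------------
# Hypothetical addresses of existing master-eligible nodes to contact during
# discovery (one of the settings accepted by the production check)
discovery.zen.ping.unicast.hosts:
   - 10.0.0.1
   - 10.0.0.2
   - 10.0.0.3

# Consulted only the first time the cluster starts, to bootstrap the initial
# voting configuration; lists the node names of the initial master-eligible
# nodes and should be defined on each of them
cluster.initial_master_nodes:
   - master-a
   - master-b
   - master-c
--------------------------------------------------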

Diff for: docs/reference/modules.asciidoc

+6 -6
@@ -18,13 +18,13 @@ These settings can be dynamically updated on a live cluster with the

 The modules in this section are:

-<<modules-cluster,Cluster-level routing and shard allocation>>::
+<<modules-discovery,Discovery and cluster formation>>::

-Settings to control where, when, and how shards are allocated to nodes.
+How nodes discover each other, elect a master and form a cluster.

-<<modules-discovery,Discovery>>::
+<<modules-cluster,Shard allocation and cluster-level routing>>::

-How nodes discover each other to form a cluster.
+Settings to control where, when, and how shards are allocated to nodes.

 <<modules-gateway,Gateway>>::

@@ -85,10 +85,10 @@ The modules in this section are:
 --


-include::modules/cluster.asciidoc[]
-
 include::modules/discovery.asciidoc[]

+include::modules/cluster.asciidoc[]
+
 include::modules/gateway.asciidoc[]

 include::modules/http.asciidoc[]

Diff for: docs/reference/modules/cluster.asciidoc

+1 -1
@@ -1,5 +1,5 @@
 [[modules-cluster]]
-== Cluster
+== Shard allocation and cluster-level routing

 One of the main roles of the master is to decide which shards to allocate to
 which nodes, and when to move shards between nodes in order to rebalance the

Diff for: docs/reference/modules/discovery.asciidoc

+66 -21
@@ -1,30 +1,75 @@
 [[modules-discovery]]
-== Discovery
+== Discovery and cluster formation

-The discovery module is responsible for discovering nodes within a
-cluster, as well as electing a master node.
+The discovery and cluster formation module is responsible for discovering
+nodes, electing a master, forming a cluster, and publishing the cluster state
+each time it changes. It is integrated with other modules. For example, all
+communication between nodes is done using the <<modules-transport,transport>>
+module. This module is divided into the following sections:

-Note, Elasticsearch is a peer to peer based system, nodes communicate
-with one another directly if operations are delegated / broadcast. All
-the main APIs (index, delete, search) do not communicate with the master
-node. The responsibility of the master node is to maintain the global
-cluster state, and act if nodes join or leave the cluster by reassigning
-shards. Each time a cluster state is changed, the state is made known to
-the other nodes in the cluster (the manner depends on the actual
-discovery implementation).
+<<modules-discovery-hosts-providers>>::

-[float]
-=== Settings
+Discovery is the process where nodes find each other when the master is
+unknown, such as when a node has just started up or when the previous
+master has failed.

-The `cluster.name` allows to create separated clusters from one another.
-The default value for the cluster name is `elasticsearch`, though it is
-recommended to change this to reflect the logical group name of the
-cluster running.
+<<modules-discovery-bootstrap-cluster>>::

-include::discovery/azure.asciidoc[]
+Bootstrapping a cluster is required when an Elasticsearch cluster starts up
+for the very first time. In <<dev-vs-prod-mode,development mode>>, with no
+discovery settings configured, this is automatically performed by the nodes
+themselves. As this auto-bootstrapping is
+<<modules-discovery-quorums,inherently unsafe>>, running a node in
+<<dev-vs-prod-mode,production mode>> requires bootstrapping to be
+explicitly configured via the
+<<modules-discovery-bootstrap-cluster,`cluster.initial_master_nodes`
+setting>>.

-include::discovery/ec2.asciidoc[]
+<<modules-discovery-adding-removing-nodes,Adding and removing master-eligible nodes>>::

-include::discovery/gce.asciidoc[]
+It is recommended to have a small and fixed number of master-eligible nodes
+in a cluster, and to scale the cluster up and down by adding and removing
+master-ineligible nodes only. However there are situations in which it may
+be desirable to add or remove some master-eligible nodes to or from a
+cluster. This section describes the process for adding or removing
+master-eligible nodes, including the extra steps that need to be performed
+when removing more than half of the master-eligible nodes at the same time.
+
+<<cluster-state-publishing>>::
+
+Cluster state publishing is the process by which the elected master node
+updates the cluster state on all the other nodes in the cluster.
+
+<<no-master-block>>::
+
+The no-master block is put in place when there is no known elected master,
+and can be configured to determine which operations should be rejected when
+it is in place.
+
+Advanced settings::
+
+There are settings that allow advanced users to influence the
+<<master-election-settings,master election>> and
+<<fault-detection-settings,fault detection>> processes.
+
+<<modules-discovery-quorums>>::
+
+This section describes the detailed design behind the master election and
+auto-reconfiguration logic.
+
+include::discovery/discovery.asciidoc[]
+
+include::discovery/bootstrapping.asciidoc[]
+
+include::discovery/adding-removing-nodes.asciidoc[]
+
+include::discovery/publishing.asciidoc[]
+
+include::discovery/no-master-block.asciidoc[]
+
+include::discovery/master-election.asciidoc[]
+
+include::discovery/fault-detection.asciidoc[]
+
+include::discovery/quorums.asciidoc[]

-include::discovery/zen.asciidoc[]
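Not part of the commit, but a quick way to see the concepts that these new sections describe on a running cluster: the elected master and the coordination metadata (term, voting configurations, and any voting exclusions) are visible through standard APIs. A small sketch, assuming a 7.x node is reachable:

[source,js]
--------------------------------------------------
# Show which node is currently the elected master
GET /_cat/master?v

# Inspect the coordination metadata in the cluster state, including the
# voting configuration that the quorum-based design relies on
GET /_cluster/state?filter_path=metadata.cluster_coordination
--------------------------------------------------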

Diff for: docs/reference/modules/discovery/adding-removing-nodes.asciidoc

+125
@@ -0,0 +1,125 @@
+[[modules-discovery-adding-removing-nodes]]
+=== Adding and removing nodes
+
+As nodes are added or removed Elasticsearch maintains an optimal level of fault
+tolerance by automatically updating the cluster's _voting configuration_, which
+is the set of <<master-node,master-eligible nodes>> whose responses are counted
+when making decisions such as electing a new master or committing a new cluster
+state.
+
+It is recommended to have a small and fixed number of master-eligible nodes in a
+cluster, and to scale the cluster up and down by adding and removing
+master-ineligible nodes only. However there are situations in which it may be
+desirable to add or remove some master-eligible nodes to or from a cluster.
+
+==== Adding master-eligible nodes
+
+If you wish to add some master-eligible nodes to your cluster, simply configure
+the new nodes to find the existing cluster and start them up. Elasticsearch will
+add the new nodes to the voting configuration if it is appropriate to do so.
+
+==== Removing master-eligible nodes
+
+When removing master-eligible nodes, it is important not to remove too many all
+at the same time. For instance, if there are currently seven master-eligible
+nodes and you wish to reduce this to three, it is not possible simply to stop
+four of the nodes at once: to do so would leave only three nodes remaining,
+which is less than half of the voting configuration, which means the cluster
+cannot take any further actions.
+
+As long as there are at least three master-eligible nodes in the cluster, as a
+general rule it is best to remove nodes one-at-a-time, allowing enough time for
+the cluster to <<modules-discovery-quorums,automatically adjust>> the voting
+configuration and adapt the fault tolerance level to the new set of nodes.
+
+If there are only two master-eligible nodes remaining then neither node can be
+safely removed since both are required to reliably make progress. You must first
+inform Elasticsearch that one of the nodes should not be part of the voting
+configuration, and that the voting power should instead be given to other nodes.
+You can then take the excluded node offline without preventing the other node
+from making progress. A node which is added to a voting configuration exclusion
+list still works normally, but Elasticsearch tries to remove it from the voting
+configuration so its vote is no longer required. Importantly, Elasticsearch
+will never automatically move a node on the voting exclusions list back into the
+voting configuration. Once an excluded node has been successfully
+auto-reconfigured out of the voting configuration, it is safe to shut it down
+without affecting the cluster's master-level availability. A node can be added
+to the voting configuration exclusion list using the following API:
+
+[source,js]
+--------------------------------------------------
+# Add node to voting configuration exclusions list and wait for the system to
+# auto-reconfigure the node out of the voting configuration up to the default
+# timeout of 30 seconds
+POST /_cluster/voting_config_exclusions/node_name
+
+# Add node to voting configuration exclusions list and wait for
+# auto-reconfiguration up to one minute
+POST /_cluster/voting_config_exclusions/node_name?timeout=1m
+--------------------------------------------------
+// CONSOLE
+// TEST[skip:this would break the test cluster if executed]
+
+The node that should be added to the exclusions list is specified using
+<<cluster-nodes,node filters>> in place of `node_name` here. If a call to the
+voting configuration exclusions API fails, you can safely retry it. Only a
+successful response guarantees that the node has actually been removed from the
+voting configuration and will not be reinstated.
+
+Although the voting configuration exclusions API is most useful for down-scaling
+a two-node to a one-node cluster, it is also possible to use it to remove
+multiple master-eligible nodes all at the same time. Adding multiple nodes to
+the exclusions list has the system try to auto-reconfigure all of these nodes
+out of the voting configuration, allowing them to be safely shut down while
+keeping the cluster available. In the example described above, shrinking a
+seven-master-node cluster down to only have three master nodes, you could add
+four nodes to the exclusions list, wait for confirmation, and then shut them
+down simultaneously.
+
+NOTE: Voting exclusions are only required when removing at least half of the
+master-eligible nodes from a cluster in a short time period. They are not
+required when removing master-ineligible nodes, nor are they required when
+removing fewer than half of the master-eligible nodes.
+
+Adding an exclusion for a node creates an entry for that node in the voting
+configuration exclusions list, which has the system automatically try to
+reconfigure the voting configuration to remove that node and prevents it from
+returning to the voting configuration once it has been removed. The current list
+of exclusions is stored in the cluster state and can be inspected as follows:
+
+[source,js]
+--------------------------------------------------
+GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
+--------------------------------------------------
+// CONSOLE
+
+This list is limited in size by the following setting:
+
+`cluster.max_voting_config_exclusions`::
+
+Sets a limit on the number of voting configuration exclusions at any one
+time. Defaults to `10`.
+
+Since voting configuration exclusions are persistent and limited in number, they
+must be cleaned up. Normally an exclusion is added when performing some
+maintenance on the cluster, and the exclusions should be cleaned up when the
+maintenance is complete. Clusters should have no voting configuration exclusions
+in normal operation.
+
+If a node is excluded from the voting configuration because it is to be shut
+down permanently, its exclusion can be removed after it is shut down and removed
+from the cluster. Exclusions can also be cleared if they were created in error
+or were only required temporarily:
+
+[source,js]
+--------------------------------------------------
+# Wait for all the nodes with voting configuration exclusions to be removed from
+# the cluster and then remove all the exclusions, allowing any node to return to
+# the voting configuration in the future.
+DELETE /_cluster/voting_config_exclusions
+
+# Immediately remove all the voting configuration exclusions, allowing any node
+# to return to the voting configuration in the future.
+DELETE /_cluster/voting_config_exclusions?wait_for_removal=false
+--------------------------------------------------
+// CONSOLE
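To make the seven-to-three scale-down described above concrete, here is a sketch of the full call sequence using only the APIs from this file; the node names `master-4` through `master-7` are hypothetical:

[source,js]
--------------------------------------------------
# Exclude each node that is to be retired; each call returns once the node has
# been auto-reconfigured out of the voting configuration (or the timeout expires)
POST /_cluster/voting_config_exclusions/master-4
POST /_cluster/voting_config_exclusions/master-5
POST /_cluster/voting_config_exclusions/master-6
POST /_cluster/voting_config_exclusions/master-7

# Confirm the exclusions are recorded before shutting the four nodes down
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions

# Once the retired nodes have been shut down and removed from the cluster,
# clear the exclusions so they do not count against
# cluster.max_voting_config_exclusions
DELETE /_cluster/voting_config_exclusions
--------------------------------------------------

By default the final DELETE waits for the excluded nodes to leave the cluster before clearing the entries, matching the behaviour of the call shown earlier in this file.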

Diff for: docs/reference/modules/discovery/azure.asciidoc

-5
This file was deleted.
