[Zen2] Update documentation for Zen2 #34714
Changes from all commits
@@ -0,0 +1,40 @@
[float]
[[breaking_70_discovery_changes]]
=== Discovery changes

[float]
==== Cluster bootstrapping is required if discovery is configured

The first time a cluster is started, `cluster.initial_master_nodes` must be set
to perform cluster bootstrapping. It should contain the names of the
master-eligible nodes in the initial cluster and be defined on every
master-eligible node in the cluster. See <<discovery-settings,the discovery
settings summary>> for an example; the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping reference
documentation>> describes this setting in more detail.
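For illustration, a minimal `elasticsearch.yml` sketch of this setting; the node
names `master-a`, `master-b`, and `master-c` are hypothetical:

[source,yaml]
--------------------------------------------------
# Hypothetical node names; every master-eligible node in the new cluster
# would list the same set.
cluster.initial_master_nodes:
  - master-a
  - master-b
  - master-c
--------------------------------------------------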

The `discovery.zen.minimum_master_nodes` setting is required during a rolling
upgrade from 6.x, but can be removed in all other circumstances.

[float]
==== Removing master-eligible nodes sometimes requires voting exclusions

If you wish to remove half or more of the master-eligible nodes from a cluster,
you must first exclude the affected nodes from the voting configuration using
the <<modules-discovery-adding-removing-nodes,voting config exclusions API>>.
If you remove fewer than half of the master-eligible nodes at the same time,
voting exclusions are not required. If you remove only master-ineligible nodes
such as data-only nodes or coordinating-only nodes, voting exclusions are not
required. Likewise, if you add nodes to the cluster, voting exclusions are not
required.

[float]
==== Discovery configuration is required in production

Production deployments of Elasticsearch now require at least one of the
following settings to be specified in the `elasticsearch.yml` configuration
file:

- `discovery.zen.ping.unicast.hosts`
- `discovery.zen.hosts_provider`
- `cluster.initial_master_nodes`
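A sketch of one way to satisfy this check, using hypothetical seed hosts;
configuring any one of the three settings listed above is sufficient:

[source,yaml]
--------------------------------------------------
# Hypothetical addresses of master-eligible nodes to contact for discovery.
discovery.zen.ping.unicast.hosts:
  - 10.0.0.1:9300
  - 10.0.0.2:9300
--------------------------------------------------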
@@ -1,30 +1,75 @@
[[modules-discovery]]
== Discovery and cluster formation

The discovery and cluster formation module is responsible for discovering
nodes, electing a master, forming a cluster, and publishing the cluster state
each time it changes. It is integrated with other modules. For example, all
communication between nodes is done using the <<modules-transport,transport>>
module. This module is divided into the following sections:

<<modules-discovery-hosts-providers>>::

    Discovery is the process where nodes find each other when the master is
    unknown, such as when a node has just started up or when the previous
    master has failed.

<<modules-discovery-bootstrap-cluster>>::

    Bootstrapping a cluster is required when an Elasticsearch cluster starts up
    for the very first time. In <<dev-vs-prod-mode,development mode>>, with no
    discovery settings configured, this is automatically performed by the nodes
    themselves. As this auto-bootstrapping is
    <<modules-discovery-quorums,inherently unsafe>>, running a node in
    <<dev-vs-prod-mode,production mode>> requires bootstrapping to be
    explicitly configured via the
    <<modules-discovery-bootstrap-cluster,`cluster.initial_master_nodes`
    setting>>.

<<modules-discovery-adding-removing-nodes,Adding and removing master-eligible nodes>>::

    It is recommended to have a small and fixed number of master-eligible nodes
    in a cluster, and to scale the cluster up and down by adding and removing
    master-ineligible nodes only. However, there are situations in which it may
    be desirable to add or remove some master-eligible nodes to or from a
    cluster. This section describes the process for adding or removing
    master-eligible nodes, including the extra steps that need to be performed
    when removing more than half of the master-eligible nodes at the same time.

<<cluster-state-publishing>>::

    Cluster state publishing is the process by which the elected master node
    updates the cluster state on all the other nodes in the cluster.

<<no-master-block>>::

    The no-master block is put in place when there is no known elected master,
    and can be configured to determine which operations should be rejected when
    it is in place.

Advanced settings::

    There are settings that allow advanced users to influence the
    <<master-election-settings,master election>> and
    <<fault-detection-settings,fault detection>> processes.

<<modules-discovery-quorums>>::

    This section describes the detailed design behind the master election and
    auto-reconfiguration logic.

include::discovery/discovery.asciidoc[]

include::discovery/bootstrapping.asciidoc[]

include::discovery/adding-removing-nodes.asciidoc[]

include::discovery/publishing.asciidoc[]

include::discovery/no-master-block.asciidoc[]

include::discovery/master-election.asciidoc[]
Review comment: I think it would be great to have the following example here.
Consider you have 3 master-eligible nodes - A, B, C - and `auto_shrink` is set
to `true`. In this case the voting configuration will be {A, B, C}. Now
consider node C fails; the voting configuration is not changed in this case,
because there would be fewer than 3 nodes in it if node C were removed. Now
master-eligible node D connects to the cluster; in this case node C will be
atomically replaced with node D in the voting configuration - {A, B, D}.

Reply: I think we should wait and see about this. I am worried that introducing
this one example will raise more questions than it answers, and do not want to
introduce a much broader selection of examples. This particular example is
spelled out in
elasticsearch/server/src/test/java/org/elasticsearch/cluster/coordination/ReconfiguratorTests.java
(line 63 in a056bd8). However, as you can see from that test case, there are
many other examples to think about.
include::discovery/fault-detection.asciidoc[]

include::discovery/quorums.asciidoc[]

include::discovery/zen.asciidoc[]
@@ -0,0 +1,125 @@
[[modules-discovery-adding-removing-nodes]]
=== Adding and removing nodes

Review comment: It seems like the majority of this information pertains only to
removing nodes. It's not necessary immediately, but at some point I think it
would be good to split this into two separate pages -- one about adding nodes
(with lots of details about how that differs depending on platform and node
type, etc.) and one about removing nodes (which would be most of this content).

Reply: There just isn't really a lot to say about adding nodes in this context.
I added headings to divide the page up in e466ed0.

As nodes are added or removed, Elasticsearch maintains an optimal level of fault
tolerance by automatically updating the cluster's _voting configuration_, which
is the set of <<master-node,master-eligible nodes>> whose responses are counted
when making decisions such as electing a new master or committing a new cluster
state.
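For illustration, the current voting configuration can be inspected in the
cluster state; a sketch, assuming it is exposed under
`metadata.cluster_coordination.last_committed_config` as in the 7.x cluster
state layout:

[source,js]
--------------------------------------------------
# Sketch: show the node IDs whose votes currently count. The filter path
# assumes the 7.x cluster state layout.
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE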

It is recommended to have a small and fixed number of master-eligible nodes in a
cluster, and to scale the cluster up and down by adding and removing
master-ineligible nodes only. However, there are situations in which it may be
desirable to add or remove some master-eligible nodes to or from a cluster.

==== Adding master-eligible nodes

If you wish to add some master-eligible nodes to your cluster, simply configure
the new nodes to find the existing cluster and start them up. Elasticsearch will
add the new nodes to the voting configuration if it is appropriate to do so.
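As a sketch of such a node's configuration (the cluster name and hostnames here
are hypothetical), a new master-eligible node only needs to know how to reach
the existing cluster:

[source,yaml]
--------------------------------------------------
# Hypothetical settings for a new master-eligible node joining an existing
# cluster. Note that cluster.initial_master_nodes is not set here:
# bootstrapping happens only once, when the cluster first forms.
cluster.name: my-cluster
node.master: true
discovery.zen.ping.unicast.hosts:
  - master-1.example.com
  - master-2.example.com
--------------------------------------------------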

==== Removing master-eligible nodes

When removing master-eligible nodes, it is important not to remove too many all
at the same time. For instance, if there are currently seven master-eligible
nodes and you wish to reduce this to three, it is not possible simply to stop
four of the nodes at once: to do so would leave only three nodes remaining,
fewer than half of the seven-node voting configuration (a quorum of seven
requires at least four nodes), which means the cluster cannot take any further
actions.

As long as there are at least three master-eligible nodes in the cluster, as a
general rule it is best to remove nodes one at a time, allowing enough time for
the cluster to <<modules-discovery-quorums,automatically adjust>> the voting
configuration and adapt the fault tolerance level to the new set of nodes.

If there are only two master-eligible nodes remaining then neither node can be
safely removed since both are required to reliably make progress. You must first
inform Elasticsearch that one of the nodes should not be part of the voting
configuration, and that the voting power should instead be given to other nodes.
You can then take the excluded node offline without preventing the other node
from making progress. A node which is added to a voting configuration exclusion
list still works normally, but Elasticsearch tries to remove it from the voting
configuration so its vote is no longer required. Importantly, Elasticsearch
will never automatically move a node on the voting exclusions list back into the
voting configuration. Once an excluded node has been successfully
auto-reconfigured out of the voting configuration, it is safe to shut it down
without affecting the cluster's master-level availability. A node can be added
to the voting configuration exclusion list using the following API:

[source,js]
--------------------------------------------------
# Add node to voting configuration exclusions list and wait for the system to
# auto-reconfigure the node out of the voting configuration up to the default
# timeout of 30 seconds
POST /_cluster/voting_config_exclusions/node_name

# Add node to voting configuration exclusions list and wait for
# auto-reconfiguration up to one minute
POST /_cluster/voting_config_exclusions/node_name?timeout=1m
--------------------------------------------------
// CONSOLE
// TEST[skip:this would break the test cluster if executed]

The node that should be added to the exclusions list is specified using
<<cluster-nodes,node filters>> in place of `node_name` here. If a call to the
voting configuration exclusions API fails, you can safely retry it. Only a
successful response guarantees that the node has actually been removed from the
voting configuration and will not be reinstated.

Although the voting configuration exclusions API is most useful for down-scaling
a two-node cluster to a one-node cluster, it is also possible to use it to
remove multiple master-eligible nodes all at the same time. Adding multiple
nodes to the exclusions list causes the system to try to auto-reconfigure all of
these nodes out of the voting configuration, allowing them to be safely shut
down while keeping the cluster available. In the example described above, of
shrinking a seven-master-node cluster down to three master nodes, you could add
four nodes to the exclusions list, wait for confirmation, and then shut them
down simultaneously.
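A sketch of that seven-to-three scenario, assuming the four departing nodes
have the hypothetical names `master-4` through `master-7`, passed as a
comma-separated node filter:

[source,js]
--------------------------------------------------
# Exclude all four departing nodes in one call, then wait for confirmation
# before shutting them down together.
POST /_cluster/voting_config_exclusions/master-4,master-5,master-6,master-7?timeout=1m
--------------------------------------------------
// CONSOLE
// TEST[skip:hypothetical node names, would break the test cluster]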

NOTE: Voting exclusions are only required when removing at least half of the
master-eligible nodes from a cluster in a short time period. They are not
required when removing master-ineligible nodes, nor are they required when
removing fewer than half of the master-eligible nodes.

Adding an exclusion for a node creates an entry for that node in the voting
configuration exclusions list, which causes the system to automatically try to
reconfigure the voting configuration to remove the node and prevents it from
returning to the voting configuration once it has been removed. The current
list of exclusions is stored in the cluster state and can be inspected as
follows:

[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
--------------------------------------------------
// CONSOLE

This list is limited in size by the following setting:

`cluster.max_voting_config_exclusions`::

    Sets a limit on the number of voting configuration exclusions at any one
    time. Defaults to `10`.
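If this limit gets in the way when excluding many nodes at once, it could
presumably be raised through the cluster settings API; a sketch, assuming the
setting is dynamically updatable:

[source,js]
--------------------------------------------------
# Sketch: raise the exclusions limit before excluding many nodes at once,
# assuming the setting can be updated dynamically.
PUT /_cluster/settings
{
  "persistent": {
    "cluster.max_voting_config_exclusions": 20
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:sketch only]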

Since voting configuration exclusions are persistent and limited in number, they
must be cleaned up. Normally an exclusion is added when performing some
maintenance on the cluster, and the exclusions should be cleaned up when the
maintenance is complete. Clusters should have no voting configuration exclusions
in normal operation.

If a node is excluded from the voting configuration because it is to be shut
down permanently, its exclusion can be removed after it is shut down and removed
from the cluster. Exclusions can also be cleared if they were created in error
or were only required temporarily:

[source,js]
--------------------------------------------------
# Wait for all the nodes with voting configuration exclusions to be removed from
# the cluster and then remove all the exclusions, allowing any node to return to
# the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions

# Immediately remove all the voting configuration exclusions, allowing any node
# to return to the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions?wait_for_removal=false
--------------------------------------------------
// CONSOLE
This file was deleted.