From 2368b169702e7aec56166b71fd920035d8a59cb9 Mon Sep 17 00:00:00 2001 From: lcawl Date: Fri, 21 Dec 2018 16:09:31 -0800 Subject: [PATCH 1/7] [DOCS] Adds overview and API ref for voting configurations --- docs/reference/cluster.asciidoc | 2 + .../cluster/voting-exclusions.asciidoc | 57 ++++++++++ docs/reference/modules/discovery.asciidoc | 16 +-- .../discovery/adding-removing-nodes.asciidoc | 4 +- .../modules/discovery/voting.asciidoc | 100 ++++++++++++++++++ 5 files changed, 171 insertions(+), 8 deletions(-) create mode 100644 docs/reference/cluster/voting-exclusions.asciidoc create mode 100644 docs/reference/modules/discovery/voting.asciidoc diff --git a/docs/reference/cluster.asciidoc b/docs/reference/cluster.asciidoc index f92e364bae102..cfa2d5a6488d7 100644 --- a/docs/reference/cluster.asciidoc +++ b/docs/reference/cluster.asciidoc @@ -104,3 +104,5 @@ include::cluster/tasks.asciidoc[] include::cluster/nodes-hot-threads.asciidoc[] include::cluster/allocation-explain.asciidoc[] + +include::cluster/voting-exclusions.asciidoc[] diff --git a/docs/reference/cluster/voting-exclusions.asciidoc b/docs/reference/cluster/voting-exclusions.asciidoc new file mode 100644 index 0000000000000..dbb5432a28052 --- /dev/null +++ b/docs/reference/cluster/voting-exclusions.asciidoc @@ -0,0 +1,57 @@ +[[voting-config-exclusions]] +== Voting configuration exclusions API +++++ +Voting configuration exclusions +++++ + +Adds or removes nodes from the voting configuration exclusion list. + +[float] +=== Request + +[source,js] +-------------------------------------------------- +# Add a node to the voting configuration exclusions list +POST /_cluster/voting_config_exclusions/ + +# Remove all exclusions from the list +DELETE /_cluster/voting_config_exclusions +-------------------------------------------------- +// CONSOLE + +[float] +=== Path parameters + +`node_name`:: + A <> that identifies {es} nodes. 
+ +[float] +=== Description + +If the <> +is `true`, the <> automatically +shrinks when you remove master-eligible nodes from the cluster. + +If the `cluster.auto_shrink_voting_configuration` setting is `false`, you must +use this API to remove departed nodes from the voting configuration manually. +It adds an entry for that node in the voting configuration exclusions list. The +cluster then tries to reconfigure the voting configuration to remove that node +and to prevent it from returning. + +If the API fails, you can safely retry it. Only a successful response +guarantees that the node has been removed from the voting configuration and will +not be reinstated. + +NOTE: Voting exclusions are required only when you remove at least half of the +master-eligible nodes from a cluster in a short time period. They are not +required when removing master-ineligible nodes or fewer than half of the +master-eligible nodes. + +The +<> +limits the size of the voting configuration exclusion list. The default value is +`10`. Since voting configuration exclusions are persistent and limited in number, +you must clean up the list. + +For more information, see <>. + diff --git a/docs/reference/modules/discovery.asciidoc b/docs/reference/modules/discovery.asciidoc index 78e8e82f7e84f..0886f4c338da0 100644 --- a/docs/reference/modules/discovery.asciidoc +++ b/docs/reference/modules/discovery.asciidoc @@ -13,6 +13,11 @@ module. This module is divided into the following sections: unknown, such as when a node has just started up or when the previous master has failed. +<>:: + + This section describes the detailed design behind the master election and + auto-reconfiguration logic. + <>:: Bootstrapping a cluster is required when an Elasticsearch cluster starts up @@ -39,11 +44,6 @@ module. This module is divided into the following sections: Cluster state publishing is the process by which the elected master node updates the cluster state on all the other nodes in the cluster. 
- -<>:: - - This section describes the detailed design behind the master election and - auto-reconfiguration logic. <>:: @@ -52,14 +52,16 @@ module. This module is divided into the following sections: include::discovery/discovery.asciidoc[] +include::discovery/quorums.asciidoc[] + +include::discovery/voting.asciidoc[] + include::discovery/bootstrapping.asciidoc[] include::discovery/adding-removing-nodes.asciidoc[] include::discovery/publishing.asciidoc[] -include::discovery/quorums.asciidoc[] - include::discovery/fault-detection.asciidoc[] include::discovery/discovery-settings.asciidoc[] \ No newline at end of file diff --git a/docs/reference/modules/discovery/adding-removing-nodes.asciidoc b/docs/reference/modules/discovery/adding-removing-nodes.asciidoc index a52cf1e2e7467..3b416ea51d223 100644 --- a/docs/reference/modules/discovery/adding-removing-nodes.asciidoc +++ b/docs/reference/modules/discovery/adding-removing-nodes.asciidoc @@ -12,6 +12,7 @@ cluster, and to scale the cluster up and down by adding and removing master-ineligible nodes only. However there are situations in which it may be desirable to add or remove some master-eligible nodes to or from a cluster. +[[modules-discovery-adding-nodes]] ==== Adding master-eligible nodes If you wish to add some nodes to your cluster, simply configure the new nodes @@ -24,6 +25,7 @@ cluster. You can use the `cluster.join.timeout` setting to configure how long a node waits after sending a request to join a cluster. Its default value is `30s`. See <>. +[[modules-discovery-removing-nodes]] ==== Removing master-eligible nodes When removing master-eligible nodes, it is important not to remove too many all @@ -50,7 +52,7 @@ will never automatically move a node on the voting exclusions list back into the voting configuration. Once an excluded node has been successfully auto-reconfigured out of the voting configuration, it is safe to shut it down without affecting the cluster's master-level availability. 
A node can be added -to the voting configuration exclusion list using the following API: +to the voting configuration exclusion list using the <> API. For example: [source,js] -------------------------------------------------- diff --git a/docs/reference/modules/discovery/voting.asciidoc b/docs/reference/modules/discovery/voting.asciidoc new file mode 100644 index 0000000000000..1f71cc4b8810f --- /dev/null +++ b/docs/reference/modules/discovery/voting.asciidoc @@ -0,0 +1,100 @@ +[[modules-discovery-voting]] +=== Voting configurations + +Each {es} cluster has a _voting configuration_, which is the set of +<> whose responses are counted when making +decisions such as electing a new master or committing a new cluster +state. Decisions are made only after a _quorum_ (more than half) of the nodes in +the voting configuration respond. + +Usually the voting configuration is the same as the set of all the +master-eligible nodes that are currently in the cluster. However, there are some +situations in which they may be different. + +IMPORTANT: To ensure the cluster remains available, you **must not stop half or +more of the nodes in the voting configuration at the same time**. As long as more +than half of the voting nodes are available, the cluster can work normally. For +example, if there are three or four master-eligible nodes, the cluster +can tolerate one unavailable node. If there are two or fewer master-eligible +nodes, they must all remain available. + +After a node joins or leaves the cluster, {es} reacts by automatically making +corresponding changes to the voting configuration in order to ensure that the +cluster is as resilient as possible. It is important to wait for this adjustment +to complete before you remove more nodes from the cluster. For more information, +see <>. 
+ +The current voting configuration is stored in the cluster state so you can +inspect its current contents as follows: + +[source,js] +-------------------------------------------------- +GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config +-------------------------------------------------- +// CONSOLE + +NOTE: The current voting configuration is not necessarily the same as the set of +all available master-eligible nodes in the cluster. Altering the voting +configuration involves taking a vote, so it takes some time to adjust the +configuration as nodes join or leave the cluster. Also, there are situations +where the most resilient configuration includes unavailable nodes, or does not +include some available nodes, and in these situations the voting configuration +differs from the set of available master-eligible nodes in the cluster. + +Larger voting configurations are usually more resilient, so Elasticsearch +normally prefers to add master-eligible nodes to the voting configuration after +they join the cluster. Similarly, if a node in the voting configuration +leaves the cluster and there is another master-eligible node in the cluster that +is not in the voting configuration then it is preferable to swap these two nodes +over. The size of the voting configuration is thus unchanged but its +resilience increases. + +It is not so straightforward to automatically remove nodes from the voting +configuration after they have left the cluster. Different strategies have +different benefits and drawbacks, so the right choice depends on how the cluster +will be used. You can control whether the voting configuration automatically +shrinks by using the +<>. 
+ +NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true`, the +recommended and default setting, and there are at least three master-eligible +nodes in the cluster, Elasticsearch remains capable of processing cluster state +updates as long as all but one of its master-eligible nodes are healthy. + +There are situations in which Elasticsearch might tolerate the loss of multiple +nodes, but this is not guaranteed under all sequences of failures. If the +`cluster.auto_shrink_voting_configuration` setting is `false`, you must remove +departed nodes from the voting configuration manually. Use the +<> to achieve the desired level +of resilience. + +No matter how it is configured, Elasticsearch will not suffer from a +"split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration` +setting affects only its availability in the event of the failure of some of its +nodes, and the administrative tasks that must be performed as nodes join and +leave the cluster. + +[float] +==== Even numbers of master-eligible nodes + +There should normally be an odd number of master-eligible nodes in a cluster. +If there is an even number, Elasticsearch leaves one of them out of the voting +configuration to ensure that it has an odd size. This omission does not decrease +the failure-tolerance of the cluster. In fact, it improves it slightly: if the +cluster suffers from a network partition that divides it into two equally-sized +halves then one of the halves will contain a majority of the voting +configuration and will be able to keep operating. If all of the votes from +master-eligible nodes were counted, neither side would contain a strict majority +of the nodes and so the cluster would not be able to make any progress. + +For instance, if there are four master-eligible nodes in the cluster and the +voting configuration contained all of them, any quorum-based decision would +require votes from at least three of them.
This situation means that the cluster +can tolerate the loss of only a single master-eligible node. If this cluster +were split into two equal halves, neither half would contain three +master-eligible nodes and the cluster would not be able to make any progress. +If the voting configuration contains only three of the four master-eligible +nodes, however, the cluster is still only fully tolerant to the loss of one +node, but quorum-based decisions require votes from two of the three voting +nodes. In the event of an even split, one half will contain two of the three +voting nodes so that half will remain available. From 8fc6112c183820fe410850dd2d56f98a5596f59d Mon Sep 17 00:00:00 2001 From: lcawl Date: Fri, 21 Dec 2018 16:16:57 -0800 Subject: [PATCH 2/7] [DOCS] Removes redundant sections --- .../modules/discovery/quorums.asciidoc | 90 ------------------- 1 file changed, 90 deletions(-) diff --git a/docs/reference/modules/discovery/quorums.asciidoc b/docs/reference/modules/discovery/quorums.asciidoc index 8f3b74be05d9d..40e31f06aa59f 100644 --- a/docs/reference/modules/discovery/quorums.asciidoc +++ b/docs/reference/modules/discovery/quorums.asciidoc @@ -103,93 +103,3 @@ and then started again then it will automatically recover, such as during a <>. There is no need to take any further action with the APIs described here in these cases, because the set of master nodes is not changing permanently. - -[float] -==== Automatic changes to the voting configuration - -Nodes may join or leave the cluster, and Elasticsearch reacts by automatically -making corresponding changes to the voting configuration in order to ensure that -the cluster is as resilient as possible. - -The default auto-reconfiguration -behaviour is expected to give the best results in most situations. 
The current -voting configuration is stored in the cluster state so you can inspect its -current contents as follows: - -[source,js] --------------------------------------------------- -GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config --------------------------------------------------- -// CONSOLE - -NOTE: The current voting configuration is not necessarily the same as the set of -all available master-eligible nodes in the cluster. Altering the voting -configuration involves taking a vote, so it takes some time to adjust the -configuration as nodes join or leave the cluster. Also, there are situations -where the most resilient configuration includes unavailable nodes, or does not -include some available nodes, and in these situations the voting configuration -differs from the set of available master-eligible nodes in the cluster. - -Larger voting configurations are usually more resilient, so Elasticsearch -normally prefers to add master-eligible nodes to the voting configuration after -they join the cluster. Similarly, if a node in the voting configuration -leaves the cluster and there is another master-eligible node in the cluster that -is not in the voting configuration then it is preferable to swap these two nodes -over. The size of the voting configuration is thus unchanged but its -resilience increases. - -It is not so straightforward to automatically remove nodes from the voting -configuration after they have left the cluster. Different strategies have -different benefits and drawbacks, so the right choice depends on how the cluster -will be used. You can control whether the voting configuration automatically shrinks by using the following setting: - -`cluster.auto_shrink_voting_configuration`:: - - Defaults to `true`, meaning that the voting configuration will automatically - shrink, shedding departed nodes, as long as it still contains at least 3 - nodes. 
If set to `false`, the voting configuration never automatically - shrinks; departed nodes must be removed manually using the - <>. - -NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true`, the -recommended and default setting, and there are at least three master-eligible -nodes in the cluster, then Elasticsearch remains capable of processing -cluster-state updates as long as all but one of its master-eligible nodes are -healthy. - -There are situations in which Elasticsearch might tolerate the loss of multiple -nodes, but this is not guaranteed under all sequences of failures. If this -setting is set to `false` then departed nodes must be removed from the voting -configuration manually, using the -<>, to achieve -the desired level of resilience. - -No matter how it is configured, Elasticsearch will not suffer from a "split-brain" inconsistency. -The `cluster.auto_shrink_voting_configuration` setting affects only its availability in the -event of the failure of some of its nodes, and the administrative tasks that -must be performed as nodes join and leave the cluster. - -[float] -==== Even numbers of master-eligible nodes - -There should normally be an odd number of master-eligible nodes in a cluster. -If there is an even number, Elasticsearch leaves one of them out of the voting -configuration to ensure that it has an odd size. This omission does not decrease -the failure-tolerance of the cluster. In fact, improves it slightly: if the -cluster suffers from a network partition that divides it into two equally-sized -halves then one of the halves will contain a majority of the voting -configuration and will be able to keep operating. If all of the master-eligible -nodes' votes were counted, neither side would contain a strict majority of the -nodes and so the cluster would not be able to make any progress. 
- -For instance if there are four master-eligible nodes in the cluster and the -voting configuration contained all of them, any quorum-based decision would -require votes from at least three of them. This situation means that the cluster -can tolerate the loss of only a single master-eligible node. If this cluster -were split into two equal halves, neither half would contain three -master-eligible nodes and the cluster would not be able to make any progress. -If the voting configuration contains only three of the four master-eligible -nodes, however, the cluster is still only fully tolerant to the loss of one -node, but quorum-based decisions require votes from two of the three voting -nodes. In the event of an even split, one half will contain two of the three -voting nodes so that half will remain available. From 419b48efa2bbe720f37104bbb01eb7ec174bd319 Mon Sep 17 00:00:00 2001 From: lcawl Date: Fri, 21 Dec 2018 16:22:39 -0800 Subject: [PATCH 3/7] [DOCS] Adds the shrink setting --- .../modules/discovery/discovery-settings.asciidoc | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/docs/reference/modules/discovery/discovery-settings.asciidoc b/docs/reference/modules/discovery/discovery-settings.asciidoc index 381974b5498d8..dbfb38c98ad6f 100644 --- a/docs/reference/modules/discovery/discovery-settings.asciidoc +++ b/docs/reference/modules/discovery/discovery-settings.asciidoc @@ -3,6 +3,14 @@ Discovery and cluster formation are affected by the following settings: +`cluster.auto_shrink_voting_configuration`:: + + Controls whether the <> sheds + departed nodes automatically, as long as it still contains at least 3 nodes. + The default value is `true`. If set to `false`, the voting configuration + never shrinks automatically; you must remove departed nodes manually with + the <>. 
+ [[master-election-settings]]`cluster.election.back_off_time`:: Sets the amount to increase the upper bound on the wait before an election From 5b608a9a2c00747b9003f775414257bad5df8303 Mon Sep 17 00:00:00 2001 From: David Turner Date: Mon, 31 Dec 2018 09:51:07 +0000 Subject: [PATCH 4/7] Some suggested changes - add more side-conditions on when the voting configuration shrinks - add TOC entries for voting (and omitted fault-detection) sections - add link to voting section from quorum section to define 'voting configuration' - move 'Setting the initial quorum' into voting config section - minor rewording/whitespace fixes --- .../cluster/voting-exclusions.asciidoc | 39 +++++++------ docs/reference/modules/discovery.asciidoc | 17 ++++-- .../discovery/discovery-settings.asciidoc | 17 +++--- .../discovery/fault-detection.asciidoc | 7 ++- .../modules/discovery/quorums.asciidoc | 55 +++---------------- .../modules/discovery/voting.asciidoc | 48 ++++++++++++++-- 6 files changed, 101 insertions(+), 82 deletions(-) diff --git a/docs/reference/cluster/voting-exclusions.asciidoc b/docs/reference/cluster/voting-exclusions.asciidoc index dbb5432a28052..4393821b4f6bd 100644 --- a/docs/reference/cluster/voting-exclusions.asciidoc +++ b/docs/reference/cluster/voting-exclusions.asciidoc @@ -1,10 +1,11 @@ [[voting-config-exclusions]] == Voting configuration exclusions API ++++ -Voting configuration exclusions +Voting Configuration Exclusions ++++ -Adds or removes nodes from the voting configuration exclusion list. +Adds or removes master-eligible nodes from the +<>. [float] === Request @@ -28,16 +29,20 @@ DELETE /_cluster/voting_config_exclusions [float] === Description -If the <> -is `true`, the <> automatically -shrinks when you remove master-eligible nodes from the cluster. - -If the `cluster.auto_shrink_voting_configuration` setting is `false`, you must -use this API to remove departed nodes from the voting configuration manually. 
-It adds an entry for that node in the voting configuration exclusions list. The -cluster then tries to reconfigure the voting configuration to remove that node -and to prevent it from returning. - +If the <> is `true`, and there are more than three master-eligible nodes in the +cluster, and you remove fewer than half of the master-eligible nodes in the +cluster at once, then the <> +automatically shrinks when you remove master-eligible nodes from the cluster. + +If the `cluster.auto_shrink_voting_configuration` setting is `false`, or you +wish to shrink the voting configuration to contain fewer than three nodes, or +you wish to remove half or more of the master-eligible nodes in the cluster at +once, you must use this API to remove departed nodes from the voting +configuration manually. It adds an entry for that node in the voting +configuration exclusions list. The cluster then tries to reconfigure the voting +configuration to remove that node and to prevent it from returning. + If the API fails, you can safely retry it. Only a successful response guarantees that the node has been removed from the voting configuration and will not be reinstated. @@ -47,11 +52,11 @@ master-eligible nodes from a cluster in a short time period. They are not required when removing master-ineligible nodes or fewer than half of the master-eligible nodes. -The -<> -limits the size of the voting configuration exclusion list. The default value is -`10`. Since voting configuration exclusions are persistent and limited in number, -you must clean up the list. +The <> limits the size of the voting configuration exclusion list. The +default value is `10`. Since voting configuration exclusions are persistent and +limited in number, you must clear the voting config exclusions list once the +exclusions are no longer required. For more information, see <>. 
diff --git a/docs/reference/modules/discovery.asciidoc b/docs/reference/modules/discovery.asciidoc index 0886f4c338da0..d3e0d4fe84751 100644 --- a/docs/reference/modules/discovery.asciidoc +++ b/docs/reference/modules/discovery.asciidoc @@ -15,8 +15,13 @@ module. This module is divided into the following sections: <>:: - This section describes the detailed design behind the master election and - auto-reconfiguration logic. + This section describes how {es} uses a quorum-based voting mechanism to + make decisions even if some nodes are unavailable. + +<>:: + + This section describes the concept of voting configurations, which {es} + automatically updates as nodes leave and join the cluster. <>:: @@ -44,7 +49,11 @@ module. This module is divided into the following sections: Cluster state publishing is the process by which the elected master node updates the cluster state on all the other nodes in the cluster. - + +<>:: + + {es} performs health checks to detect and remove faulty nodes. + <>:: There are settings that enable users to influence the discovery, cluster @@ -64,4 +73,4 @@ include::discovery/publishing.asciidoc[] include::discovery/fault-detection.asciidoc[] -include::discovery/discovery-settings.asciidoc[] \ No newline at end of file +include::discovery/discovery-settings.asciidoc[] diff --git a/docs/reference/modules/discovery/discovery-settings.asciidoc b/docs/reference/modules/discovery/discovery-settings.asciidoc index dbfb38c98ad6f..494c5ac225b87 100644 --- a/docs/reference/modules/discovery/discovery-settings.asciidoc +++ b/docs/reference/modules/discovery/discovery-settings.asciidoc @@ -5,11 +5,12 @@ Discovery and cluster formation are affected by the following settings: `cluster.auto_shrink_voting_configuration`:: - Controls whether the <> sheds - departed nodes automatically, as long as it still contains at least 3 nodes. - The default value is `true`. 
If set to `false`, the voting configuration - never shrinks automatically; you must remove departed nodes manually with - the <>. + Controls whether the <> + sheds departed nodes automatically, as long as it still contains at least 3 + nodes. The default value is `true`. If set to `false`, the voting + configuration never shrinks automatically and you must remove departed + nodes manually with the <>. [[master-election-settings]]`cluster.election.back_off_time`:: @@ -160,9 +161,11 @@ APIs are not be blocked and can run on any available node. Provides a list of master-eligible nodes in the cluster. The list contains either an array of hosts or a comma-delimited string. Each value has the - format `host:port` or `host`, where `port` defaults to the setting `transport.profiles.default.port`. Note that IPv6 hosts must be bracketed. + format `host:port` or `host`, where `port` defaults to the setting + `transport.profiles.default.port`. Note that IPv6 hosts must be bracketed. The default value is `127.0.0.1, [::1]`. See <>. `discovery.zen.ping.unicast.hosts.resolve_timeout`:: - Sets the amount of time to wait for DNS lookups on each round of discovery. This is specified as a <> and defaults to `5s`. \ No newline at end of file + Sets the amount of time to wait for DNS lookups on each round of discovery. + This is specified as a <> and defaults to `5s`. diff --git a/docs/reference/modules/discovery/fault-detection.asciidoc b/docs/reference/modules/discovery/fault-detection.asciidoc index b696cdb8f7ca2..9062444b80d6c 100644 --- a/docs/reference/modules/discovery/fault-detection.asciidoc +++ b/docs/reference/modules/discovery/fault-detection.asciidoc @@ -2,8 +2,9 @@ === Cluster fault detection The elected master periodically checks each of the nodes in the cluster to -ensure that they are still connected and healthy. Each node in the cluster also periodically checks the health of the elected master. 
These checks -are known respectively as _follower checks_ and _leader checks_. +ensure that they are still connected and healthy. Each node in the cluster also +periodically checks the health of the elected master. These checks are known +respectively as _follower checks_ and _leader checks_. Elasticsearch allows these checks to occasionally fail or timeout without taking any action. It considers a node to be faulty only after a number of @@ -16,4 +17,4 @@ and retry setting values and attempts to remove the node from the cluster. Similarly, if a node detects that the elected master has disconnected, this situation is treated as an immediate failure. The node bypasses the timeout and retry settings and restarts its discovery phase to try and find or elect a new -master. \ No newline at end of file +master. diff --git a/docs/reference/modules/discovery/quorums.asciidoc b/docs/reference/modules/discovery/quorums.asciidoc index 40e31f06aa59f..1a1954454268c 100644 --- a/docs/reference/modules/discovery/quorums.asciidoc +++ b/docs/reference/modules/discovery/quorums.asciidoc @@ -18,13 +18,13 @@ cluster. In many cases you can do this simply by starting or stopping the nodes as required. See <>. As nodes are added or removed Elasticsearch maintains an optimal level of fault -tolerance by updating the cluster's _voting configuration_, which is the set of -master-eligible nodes whose responses are counted when making decisions such as -electing a new master or committing a new cluster state. A decision is made only -after more than half of the nodes in the voting configuration have responded. -Usually the voting configuration is the same as the set of all the -master-eligible nodes that are currently in the cluster. However, there are some -situations in which they may be different. 
+tolerance by updating the cluster's <>, which is the set of master-eligible nodes whose responses are +counted when making decisions such as electing a new master or committing a new +cluster state. A decision is made only after more than half of the nodes in the +voting configuration have responded. Usually the voting configuration is the +same as the set of all the master-eligible nodes that are currently in the +cluster. However, there are some situations in which they may be different. To be sure that the cluster remains available you **must not stop half or more of the nodes in the voting configuration at the same time**. As long as more @@ -38,46 +38,6 @@ cluster-state update that adjusts the voting configuration to match, and this can take a short time to complete. It is important to wait for this adjustment to complete before removing more nodes from the cluster. -[float] -==== Setting the initial quorum - -When a brand-new cluster starts up for the first time, it must elect its first -master node. To do this election, it needs to know the set of master-eligible -nodes whose votes should count. This initial voting configuration is known as -the _bootstrap configuration_ and is set in the -<>. - -It is important that the bootstrap configuration identifies exactly which nodes -should vote in the first election. It is not sufficient to configure each node -with an expectation of how many nodes there should be in the cluster. It is also -important to note that the bootstrap configuration must come from outside the -cluster: there is no safe way for the cluster to determine the bootstrap -configuration correctly on its own. - -If the bootstrap configuration is not set correctly, when you start a brand-new -cluster there is a risk that you will accidentally form two separate clusters -instead of one. 
This situation can lead to data loss: you might start using both -clusters before you notice that anything has gone wrong and it is impossible to -merge them together later. - -NOTE: To illustrate the problem with configuring each node to expect a certain -cluster size, imagine starting up a three-node cluster in which each node knows -that it is going to be part of a three-node cluster. A majority of three nodes -is two, so normally the first two nodes to discover each other form a cluster -and the third node joins them a short time later. However, imagine that four -nodes were erroneously started instead of three. In this case, there are enough -nodes to form two separate clusters. Of course if each node is started manually -then it's unlikely that too many nodes are started. If you're using an automated -orchestrator, however, it's certainly possible to get into this situation-- -particularly if the orchestrator is not resilient to failures such as network -partitions. - -The initial quorum is only required the very first time a whole cluster starts -up. New nodes joining an established cluster can safely obtain all the -information they need from the elected master. Nodes that have previously been -part of a cluster will have stored to disk all the information that is required -when they restart. - [float] ==== Master elections @@ -103,3 +63,4 @@ and then started again then it will automatically recover, such as during a <>. There is no need to take any further action with the APIs described here in these cases, because the set of master nodes is not changing permanently. 
+ diff --git a/docs/reference/modules/discovery/voting.asciidoc b/docs/reference/modules/discovery/voting.asciidoc index 1f71cc4b8810f..84aab16b8ed30 100644 --- a/docs/reference/modules/discovery/voting.asciidoc +++ b/docs/reference/modules/discovery/voting.asciidoc @@ -1,11 +1,11 @@ [[modules-discovery-voting]] === Voting configurations -Each {es} cluster has a _voting configuration_, which is the set of +Each {es} cluster has a _voting configuration_, which is the set of <> whose responses are counted when making -decisions such as electing a new master or committing a new cluster -state. Decisions are made only after a _quorum_ (more than half) of the nodes in -the voting configuration respond. +decisions such as electing a new master or committing a new cluster state. +Decisions are made only after a majority (more than half) of the nodes in the +voting configuration respond. Usually the voting configuration is the same as the set of all the master-eligible nodes that are currently in the cluster. However, there are some @@ -98,3 +98,43 @@ nodes, however, the cluster is still only fully tolerant to the loss of one node, but quorum-based decisions require votes from two of the three voting nodes. In the event of an even split, one half will contain two of the three voting nodes so that half will remain available. + +[float] +==== Setting the initial voting configuration + +When a brand-new cluster starts up for the first time, it must elect its first +master node. To do this election, it needs to know the set of master-eligible +nodes whose votes should count. This initial voting configuration is known as +the _bootstrap configuration_ and is set in the +<>. + +It is important that the bootstrap configuration identifies exactly which nodes +should vote in the first election. It is not sufficient to configure each node +with an expectation of how many nodes there should be in the cluster. 
It is also +important to note that the bootstrap configuration must come from outside the +cluster: there is no safe way for the cluster to determine the bootstrap +configuration correctly on its own. + +If the bootstrap configuration is not set correctly, when you start a brand-new +cluster there is a risk that you will accidentally form two separate clusters +instead of one. This situation can lead to data loss: you might start using both +clusters before you notice that anything has gone wrong and it is impossible to +merge them together later. + +NOTE: To illustrate the problem with configuring each node to expect a certain +cluster size, imagine starting up a three-node cluster in which each node knows +that it is going to be part of a three-node cluster. A majority of three nodes +is two, so normally the first two nodes to discover each other form a cluster +and the third node joins them a short time later. However, imagine that four +nodes were erroneously started instead of three. In this case, there are enough +nodes to form two separate clusters. Of course if each node is started manually +then it's unlikely that too many nodes are started. If you're using an automated +orchestrator, however, it's certainly possible to get into this situation-- +particularly if the orchestrator is not resilient to failures such as network +partitions. + +The initial quorum is only required the very first time a whole cluster starts +up. New nodes joining an established cluster can safely obtain all the +information they need from the elected master. Nodes that have previously been +part of a cluster will have stored to disk all the information that is required +when they restart. 
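Reviewer note on the NOTE added in this hunk: the arithmetic behind "four nodes can form two clusters" can be sketched in a few lines. This is a toy model of size-based majorities only, not how Elasticsearch runs elections, and the function names `majority` and `max_independent_clusters` are hypothetical, chosen just for illustration.

```python
def majority(expected_size: int) -> int:
    """Smallest node count that is more than half of expected_size."""
    return expected_size // 2 + 1


def max_independent_clusters(nodes_started: int, expected_size: int) -> int:
    """Number of disjoint groups that could each reach a size-based majority.

    Assumption (illustrative only): every node knows just the expected
    cluster size, not the identities of the nodes that should vote.
    """
    return nodes_started // majority(expected_size)


if __name__ == "__main__":
    for started in (3, 4):
        clusters = max_independent_clusters(started, expected_size=3)
        print(f"{started} nodes started, expecting 3: up to {clusters} cluster(s)")
```

With three nodes started only one group of two can form, but with four nodes started two disjoint groups of two each satisfy the size-based majority. That is why the bootstrap configuration has to name the voting nodes explicitly rather than give a count.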
From 6f8d374cd761cb266e450b54e28d91709d048470 Mon Sep 17 00:00:00 2001 From: lcawl Date: Thu, 3 Jan 2019 10:15:12 -0800 Subject: [PATCH 5/7] [DOCS] Downplays cluster.auto_shrink_voting_configuration --- .../cluster/voting-exclusions.asciidoc | 30 ++++++++++--------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/docs/reference/cluster/voting-exclusions.asciidoc b/docs/reference/cluster/voting-exclusions.asciidoc index 4393821b4f6bd..0b298bbe1cac6 100644 --- a/docs/reference/cluster/voting-exclusions.asciidoc +++ b/docs/reference/cluster/voting-exclusions.asciidoc @@ -29,19 +29,17 @@ DELETE /_cluster/voting_config_exclusions [float] === Description -If the <> is `true`, and there are more than three master-eligible nodes in the -cluster, and you remove fewer than half of the master-eligible nodes in the -cluster at once, then the <> -automatically shrinks when you remove master-eligible nodes from the cluster. - -If the `cluster.auto_shrink_voting_configuration` setting is `false`, or you -wish to shrink the voting configuration to contain fewer than three nodes, or -you wish to remove half or more of the master-eligible nodes in the cluster at -once, you must use this API to remove departed nodes from the voting -configuration manually. It adds an entry for that node in the voting -configuration exclusions list. The cluster then tries to reconfigure the voting -configuration to remove that node and to prevent it from returning. +By default, if there are more than three master-eligible nodes in the cluster +and you remove fewer than half of the master-eligible nodes in the cluster at +once, the <> automatically +shrinks when you remove master-eligible nodes from the cluster. + +If you want to shrink the voting configuration to contain fewer than three nodes +or to remove half or more of the master-eligible nodes in the cluster at once, +you must use this API to remove departed nodes from the voting configuration +manually. 
It adds an entry for that node in the voting configuration exclusions +list. The cluster then tries to reconfigure the voting configuration to remove +that node and to prevent it from returning. If the API fails, you can safely retry it. Only a successful response guarantees that the node has been removed from the voting configuration and will @@ -58,5 +56,9 @@ default value is `10`. Since voting configuration exclusions are persistent and limited in number, you must clear the voting config exclusions list once the exclusions are no longer required. -For more information, see <>. +There is also a +<>, +which is set to true by default. If it is set to false, you must use this API to +maintain the voting configuration. +For more information, see <>. From 77130e854c3ab2b70541fc00d3d88535801ca82f Mon Sep 17 00:00:00 2001 From: lcawl Date: Thu, 3 Jan 2019 10:37:47 -0800 Subject: [PATCH 6/7] [DOCS] More small edits --- docs/reference/cluster/voting-exclusions.asciidoc | 2 +- docs/reference/modules/discovery/voting.asciidoc | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/reference/cluster/voting-exclusions.asciidoc b/docs/reference/cluster/voting-exclusions.asciidoc index 0b298bbe1cac6..f8b095000e5ef 100644 --- a/docs/reference/cluster/voting-exclusions.asciidoc +++ b/docs/reference/cluster/voting-exclusions.asciidoc @@ -32,7 +32,7 @@ DELETE /_cluster/voting_config_exclusions By default, if there are more than three master-eligible nodes in the cluster and you remove fewer than half of the master-eligible nodes in the cluster at once, the <> automatically -shrinks when you remove master-eligible nodes from the cluster. +shrinks. 
If you want to shrink the voting configuration to contain fewer than three nodes or to remove half or more of the master-eligible nodes in the cluster at once, diff --git a/docs/reference/modules/discovery/voting.asciidoc b/docs/reference/modules/discovery/voting.asciidoc index 84aab16b8ed30..7c6ea0c1cc985 100644 --- a/docs/reference/modules/discovery/voting.asciidoc +++ b/docs/reference/modules/discovery/voting.asciidoc @@ -37,8 +37,8 @@ NOTE: The current voting configuration is not necessarily the same as the set of all available master-eligible nodes in the cluster. Altering the voting configuration involves taking a vote, so it takes some time to adjust the configuration as nodes join or leave the cluster. Also, there are situations -where the most resilient configuration includes unavailable nodes, or does not -include some available nodes, and in these situations the voting configuration +where the most resilient configuration includes unavailable nodes or does not +include some available nodes. In these situations, the voting configuration differs from the set of available master-eligible nodes in the cluster. Larger voting configurations are usually more resilient, so Elasticsearch @@ -56,8 +56,8 @@ will be used. You can control whether the voting configuration automatically shrinks by using the <>. -NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true`, the -recommended and default setting, and there are at least three master-eligible +NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true` (which is +the default and recommended value) and there are at least three master-eligible nodes in the cluster, Elasticsearch remains capable of processing cluster state updates as long as all but one of its master-eligible nodes are healthy. @@ -71,7 +71,7 @@ of resilience. No matter how it is configured, Elasticsearch will not suffer from a "split-brain" inconsistency. 
The `cluster.auto_shrink_voting_configuration` setting affects only its availability in the event of the failure of some of its -nodes, and the administrative tasks that must be performed as nodes join and +nodes and the administrative tasks that must be performed as nodes join and leave the cluster. [float] From 1da744ffa42e4162efe156742015bc097cbac691 Mon Sep 17 00:00:00 2001 From: lcawl Date: Thu, 3 Jan 2019 13:53:34 -0800 Subject: [PATCH 7/7] [DOCS] Fixes code snippet failure --- .../cluster/voting-exclusions.asciidoc | 28 +++++++++++++------ 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/docs/reference/cluster/voting-exclusions.asciidoc b/docs/reference/cluster/voting-exclusions.asciidoc index f8b095000e5ef..fcef8113912c4 100644 --- a/docs/reference/cluster/voting-exclusions.asciidoc +++ b/docs/reference/cluster/voting-exclusions.asciidoc @@ -10,15 +10,9 @@ Adds or removes master-eligible nodes from the [float] === Request -[source,js] --------------------------------------------------- -# Add a node to the voting configuration exclusions list -POST /_cluster/voting_config_exclusions/ +`POST _cluster/voting_config_exclusions/` + -# Remove all exclusions from the list -DELETE /_cluster/voting_config_exclusions --------------------------------------------------- -// CONSOLE +`DELETE _cluster/voting_config_exclusions` [float] === Path parameters @@ -62,3 +56,21 @@ which is set to true by default. If it is set to false, you must use this API to maintain the voting configuration. For more information, see <>. 
+ +[float] +=== Examples + +Add `nodeId1` to the voting configuration exclusions list: +[source,js] +-------------------------------------------------- +POST /_cluster/voting_config_exclusions/nodeId1 +-------------------------------------------------- +// CONSOLE +// TEST[catch:bad_request] + +Remove all exclusions from the list: +[source,js] +-------------------------------------------------- +DELETE /_cluster/voting_config_exclusions +-------------------------------------------------- +// CONSOLE \ No newline at end of file
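
Reviewer note: for readers who script these calls, here is a minimal sketch of how the two requests documented in this patch could be built in Python. It only constructs the HTTP method and path shown in the Request and Examples sections; it does not contact a cluster. The helper names `add_exclusion_request` and `clear_exclusions_request` are hypothetical, and the path-parameter form is taken from this patch as written, so check the API reference for the Elasticsearch version you run.

```python
from typing import Tuple
from urllib.parse import quote

# Endpoint as documented in the patch above.
EXCLUSIONS_PATH = "/_cluster/voting_config_exclusions"


def add_exclusion_request(node_name: str) -> Tuple[str, str]:
    """Return (method, path) to add one node to the exclusions list."""
    if not node_name:
        raise ValueError("node_name must not be empty")
    # Percent-encode the name so it is safe as a single path segment.
    return ("POST", f"{EXCLUSIONS_PATH}/{quote(node_name, safe='')}")


def clear_exclusions_request() -> Tuple[str, str]:
    """Return (method, path) to remove all exclusions from the list."""
    return ("DELETE", EXCLUSIONS_PATH)


if __name__ == "__main__":
    print(*add_exclusion_request("nodeId1"))
    print(*clear_exclusions_request())
```

Because the exclusions list is persistent and limited in size, a script that adds exclusions would normally pair each batch of `POST` calls with a later `DELETE` once the nodes have been removed.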