diff --git a/docs/reference/commands/index.asciidoc b/docs/reference/commands/index.asciidoc index 8f4d178a99296..a13ea58c27d3e 100644 --- a/docs/reference/commands/index.asciidoc +++ b/docs/reference/commands/index.asciidoc @@ -10,6 +10,7 @@ tasks from the command line: * <> * <> * <> +* <> * <> * <> * <> @@ -21,6 +22,7 @@ tasks from the command line: include::certgen.asciidoc[] include::certutil.asciidoc[] include::migrate-tool.asciidoc[] +include::node-tool.asciidoc[] include::saml-metadata.asciidoc[] include::setup-passwords.asciidoc[] include::shard-tool.asciidoc[] diff --git a/docs/reference/commands/node-tool.asciidoc b/docs/reference/commands/node-tool.asciidoc new file mode 100644 index 0000000000000..d7ed054f47599 --- /dev/null +++ b/docs/reference/commands/node-tool.asciidoc @@ -0,0 +1,334 @@ +[[node-tool]] +== elasticsearch-node + +The `elasticsearch-node` command enables you to perform unsafe operations that +risk data loss but which may help to recover some data in a disaster. + +[float] +=== Synopsis + +[source,shell] +-------------------------------------------------- +bin/elasticsearch-node unsafe-bootstrap|detach-cluster + [--ordinal ] [-E ] + [-h, --help] ([-s, --silent] | [-v, --verbose]) +-------------------------------------------------- + +[float] +=== Description + +Sometimes {es} nodes are temporarily stopped, perhaps because of the need to +perform some maintenance activity or perhaps because of a hardware failure. +After you resolve the temporary condition and restart the node, +it will rejoin the cluster and continue normally. Depending on your +configuration, your cluster may be able to remain completely available even +while one or more of its nodes are stopped. + +Sometimes it might not be possible to restart a node after it has stopped. For +example, the node's host may suffer from a hardware problem that cannot be +repaired. If the cluster is still available then you can start up a fresh node +on another host and {es} will bring this node into the cluster in place of the +failed node. + +Each node stores its data in the data directories defined by the +<>. This means that in a disaster you can +also restart a node by moving its data directories to another host, presuming +that those data directories can be recovered from the faulty host. + +{es} <> in order to elect a master and to update the cluster +state. This means that if you have three master-eligible nodes then the cluster +will remain available even if one of them has failed. However if two of the +three master-eligible nodes fail then the cluster will be unavailable until at +least one of them is restarted. + +In very rare circumstances it may not be possible to restart enough nodes to +restore the cluster's availability. If such a disaster occurs, you should +build a new cluster from a recent snapshot and re-import any data that was +ingested since that snapshot was taken. + +However, if the disaster is serious enough then it may not be possible to +recover from a recent snapshot either. Unfortunately in this case there is no +way forward that does not risk data loss, but it may be possible to use the +`elasticsearch-node` tool to construct a new cluster that contains some of the +data from the failed cluster. + +This tool has two modes: + +* `elastisearch-node unsafe-bootstap` can be used if there is at least one + remaining master-eligible node. It forces one of the remaining nodes to form + a brand-new cluster on its own, using its local copy of the cluster metadata. + This is known as _unsafe cluster bootstrapping_. + +* `elastisearch-node detach-cluster` enables you to move nodes from one cluster + to another. This can be used to move nodes into the new cluster created with + the `elastisearch-node unsafe-bootstap` command. If unsafe cluster bootstrapping was not + possible, it also enables you to + move nodes into a brand-new cluster. + +[[node-tool-unsafe-bootstrap]] +[float] +==== Unsafe cluster bootstrapping + +If there is at least one remaining master-eligible node, but it is not possible +to restart a majority of them, then the `elasticsearch-node unsafe-bootstrap` +command will unsafely override the cluster's <> as if performing another +<>. +The target node can then form a new cluster on its own by using +the cluster metadata held locally on the target node. + +[WARNING] +These steps can lead to arbitrary data loss since the target node may not hold the latest cluster +metadata, and this out-of-date metadata may make it impossible to use some or +all of the indices in the cluster. + +Since unsafe bootstrapping forms a new cluster containing a single node, once +you have run it you must use the <> to migrate any other surviving nodes from the failed +cluster into this new cluster. + +When you run the `elasticsearch-node unsafe-bootstrap` tool it will analyse the +state of the node and ask for confirmation before taking any action. Before +asking for confirmation it reports the term and version of the cluster state on +the node on which it runs as follows: + +[source,txt] +---- +Current node cluster state (term, version) pair is (4, 12) +---- + +If you have a choice of nodes on which to run this tool then you should choose +one with a term that is as large as possible. If there is more than one +node with the same term, pick the one with the largest version. +This information identifies the node with the freshest cluster state, which minimizes the +quantity of data that might be lost. For example, if the first node reports +`(4, 12)` and a second node reports `(5, 3)`, then the second node is preferred +since its term is larger. However if the second node reports `(3, 17)` then +the first node is preferred since its term is larger. If the second node +reports `(4, 10)` then it has the same term as the first node, but has a +smaller version, so the first node is preferred. + +[WARNING] +Running this command can lead to arbitrary data loss. Only run this tool if you +understand and accept the possible consequences and have exhausted all other +possibilities for recovery of your cluster. + +The sequence of operations for using this tool are as follows: + +1. Make sure you have really lost access to at least half of the +master-eligible nodes in the cluster, and they cannot be repaired or recovered +by moving their data paths to healthy hardware. +2. Stop **all** remaining nodes. +3. Choose one of the remaining master-eligible nodes to become the new elected +master as described above. +4. On this node, run the `elasticsearch-node unsafe-bootstrap` command as shown +below. Verify that the tool reported `Master node was successfully +bootstrapped`. +5. Start this node and verify that it is elected as the master node. +6. Run the <>, described below, on every other node in the cluster. +7. Start all other nodes and verify that each one joins the cluster. +8. Investigate the data in the cluster to discover if any was lost during this +process. + +When you run the tool it will make sure that the node that is being used to +bootstrap the cluster is not running. It is important that all other +master-eligible nodes are also stopped while this tool is running, but the tool +does not check this. + +The message `Master node was successfully bootstrapped` does not mean that +there has been no data loss, it just means that tool was able to complete its +job. + +[[node-tool-detach-cluster]] +[float] +==== Detaching nodes from their cluster + +It is unsafe for nodes to move between clusters, because different clusters +have completely different cluster metadata. There is no way to safely merge the +metadata from two clusters together. + +To protect against inadvertently joining the wrong cluster, each cluster +creates a unique identifier, known as the _cluster UUID_, when it first starts +up. Every node records the UUID of its cluster and refuses to join a +cluster with a different UUID. + +However, if a node's cluster has permanently failed then it may be desirable to +try and move it into a new cluster. The `elasticsearch-node detach-cluster` +command lets you detach a node from its cluster by resetting its cluster UUID. +It can then join another cluster with a different UUID. + +For example, after unsafe cluster bootstrapping you will need to detach all the +other surviving nodes from their old cluster so they can join the new, +unsafely-bootstrapped cluster. + +Unsafe cluster bootstrapping is only possible if there is at least one +surviving master-eligible node. If there are no remaining master-eligible nodes +then the cluster metadata is completely lost. However, the individual data +nodes also contain a copy of the index metadata corresponding with their +shards. This sometimes allows a new cluster to import these shards as +<>. You can sometimes +recover some indices after the loss of all master-eligible nodes in a cluster +by creating a new cluster and then using the `elasticsearch-node +detach-cluster` command to move any surviving nodes into this new cluster. + +There is a risk of data loss when importing a dangling index because data nodes +may not have the most recent copy of the index metadata and do not have any +information about <>. This +means that a stale shard copy may be selected to be the primary, and some of +the shards may be incompatible with the imported mapping. + +[WARNING] +Execution of this command can lead to arbitrary data loss. Only run this tool +if you understand and accept the possible consequences and have exhausted all +other possibilities for recovery of your cluster. + +The sequence of operations for using this tool are as follows: + +1. Make sure you have really lost access to every one of the master-eligible +nodes in the cluster, and they cannot be repaired or recovered by moving their +data paths to healthy hardware. +2. Start a new cluster and verify that it is healthy. This cluster may comprise +one or more brand-new master-eligible nodes, or may be an unsafely-bootstrapped +cluster formed as described above. +3. Stop **all** remaining data nodes. +4. On each data node, run the `elasticsearch-node detach-cluster` tool as shown +below. Verify that the tool reported `Node was successfully detached from the +cluster`. +5. If necessary, configure each data node to +<>. +6. Start each data node and verify that it has joined the new cluster. +7. Wait for all recoveries to have completed, and investigate the data in the +cluster to discover if any was lost during this process. + +The message `Node was successfully detached from the cluster` does not mean +that there has been no data loss, it just means that tool was able to complete +its job. + +[float] +=== Parameters + +`unsafe-bootstrap`:: Specifies to unsafely bootstrap this node as a new +one-node cluster. + +`detach-cluster`:: Specifies to unsafely detach this node from its cluster so +it can join a different cluster. + +`--ordinal `:: If there is <> then this specifies which node to target. Defaults +to `0`, meaning to use the first node in the data path. + +`-E `:: Configures a setting. + +`-h, --help`:: Returns all of the command parameters. + +`-s, --silent`:: Shows minimal output. + +`-v, --verbose`:: Shows verbose output. + +[float] +=== Examples + +[float] +==== Unsafe cluster bootstrapping + +Suppose your cluster had five master-eligible nodes and you have permanently +lost three of them, leaving two nodes remaining. + +* Run the tool on the first remaining node, but answer `n` at the confirmation + step. + +[source,txt] +---- +node_1$ ./bin/elasticsearch-node unsafe-bootstrap + + WARNING: Elasticsearch MUST be stopped before running this tool. + +Current node cluster state (term, version) pair is (4, 12) + +You should only run this tool if you have permanently lost half or more +of the master-eligible nodes in this cluster, and you cannot restore the +cluster from a snapshot. This tool can cause arbitrary data loss and its +use should be your last resort. If you have multiple surviving master +eligible nodes, you should run this tool on the node with the highest +cluster state (term, version) pair. + +Do you want to proceed? + +Confirm [y/N] n +---- + +* Run the tool on the second remaining node, and again answer `n` at the + confirmation step. + +[source,txt] +---- +node_2$ ./bin/elasticsearch-node unsafe-bootstrap + + WARNING: Elasticsearch MUST be stopped before running this tool. + +Current node cluster state (term, version) pair is (5, 3) + +You should only run this tool if you have permanently lost half or more +of the master-eligible nodes in this cluster, and you cannot restore the +cluster from a snapshot. This tool can cause arbitrary data loss and its +use should be your last resort. If you have multiple surviving master +eligible nodes, you should run this tool on the node with the highest +cluster state (term, version) pair. + +Do you want to proceed? + +Confirm [y/N] n +---- + +* Since the second node has a greater term it has a fresher cluster state, so + it is better to unsafely bootstrap the cluster using this node: + +[source,txt] +---- +node_2$ ./bin/elasticsearch-node unsafe-bootstrap + + WARNING: Elasticsearch MUST be stopped before running this tool. + +Current node cluster state (term, version) pair is (5, 3) + +You should only run this tool if you have permanently lost half or more +of the master-eligible nodes in this cluster, and you cannot restore the +cluster from a snapshot. This tool can cause arbitrary data loss and its +use should be your last resort. If you have multiple surviving master +eligible nodes, you should run this tool on the node with the highest +cluster state (term, version) pair. + +Do you want to proceed? + +Confirm [y/N] y +Master node was successfully bootstrapped +---- + +[float] +==== Detaching nodes from their cluster + +After unsafely bootstrapping a new cluster, run the `elasticsearch-node +detach-cluster` command to detach all remaining nodes from the failed cluster +so they can join the new cluster: + +[source, txt] +---- +node_3$ ./bin/elasticsearch-node detach-cluster + + WARNING: Elasticsearch MUST be stopped before running this tool. + +You should only run this tool if you have permanently lost all of the +master-eligible nodes in this cluster and you cannot restore the cluster +from a snapshot, or you have already unsafely bootstrapped a new cluster +by running `elasticsearch-node unsafe-bootstrap` on a master-eligible +node that belonged to the same cluster as this node. This tool can cause +arbitrary data loss and its use should be your last resort. + +Do you want to proceed? + +Confirm [y/N] y +Node was successfully detached from the cluster +---- + diff --git a/docs/reference/modules/gateway.asciidoc b/docs/reference/modules/gateway.asciidoc index 038a4b24a853f..2b0783c9de0b0 100644 --- a/docs/reference/modules/gateway.asciidoc +++ b/docs/reference/modules/gateway.asciidoc @@ -49,6 +49,7 @@ as long as the following conditions are met: NOTE: These settings only take effect on a full cluster restart. +[[modules-gateway-dangling-indices]] === Dangling indices When a node joins the cluster, any shards stored in its local data