elastic · andrershov · Mar 12, 2019 · Jan 24, 2019 · Jan 24, 2019 · Jan 24, 2019
diff --git a/docs/reference/commands/index.asciidoc b/docs/reference/commands/index.asciidoc
@@ -13,6 +13,7 @@ tasks from the command line:
 * <<saml-metadata>>
 * <<setup-passwords>>
 * <<shard-tool>>
+* <<node-tool>>
 * <<syskeygen>>
 * <<users-command>>
 
@@ -24,5 +25,6 @@ include::migrate-tool.asciidoc[]
 include::saml-metadata.asciidoc[]
 include::setup-passwords.asciidoc[]
 include::shard-tool.asciidoc[]
+include::node-tool.asciidoc[]
 include::syskeygen.asciidoc[]
 include::users-command.asciidoc[]
diff --git a/docs/reference/commands/node-tool.asciidoc b/docs/reference/commands/node-tool.asciidoc
@@ -0,0 +1,197 @@
+[[node-tool]]
+== elasticsearch-node
+
+Sometimes {es} nodes are temporarily stopped, perhaps because of the need to
+perform some maintenance activity or perhaps because of a hardware failure.
+Once the temporary condition has been resolved, you should restart the node and
+it will rejoin the cluster and continue normally. Depending on your
+configuration, your cluster may be able to remain completely available even
+while one or more of its nodes are stopped.
+
+Sometimes it might not be possible to restart a node after it has stopped. For
+example, the node's host may suffer from a hardware problem that cannot be
+repaired. If the cluster is still available, you can start up a fresh node on
+another host and {es} will bring this node into the cluster in place of the
+failed node.
+
+Each node stores its data in the data directories defined by the
+<<path-settings,`path.data` setting>>. This means that in a disaster you can
+also restart a node by moving its data directories to another host, provided
+that those data directories can be recovered from the faulty host. Note that
+you must not restore a data directory from a backup, because doing so will
+lead to data corruption. Backups of an {es} cluster can only be taken using
+<<modules-snapshots>>.
+
+{es} <<modules-discovery-quorums,requires a response from a majority of the
+master-eligible nodes>> in order to elect a master and to update the cluster
+state. This means that if you have three master-eligible nodes then the cluster
+will remain available even if one of them has failed. However, if two of the
+three master-eligible nodes fail, then the cluster will be unavailable until at
+least one of them is restarted.
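+
+The majority rule above can be expressed as a small calculation. The following
+is a minimal Python sketch, assuming for simplicity that the voting
+configuration contains every master-eligible node; in practice {es} may leave
+some master-eligible nodes out of the voting configuration. The helper names
+are hypothetical and are not part of {es}.
+
+[source,python]
+----
+def majority(master_eligible: int) -> int:
+    """Smallest number of nodes that is strictly more than half."""
+    return master_eligible // 2 + 1
+
+
+def has_quorum(master_eligible: int, available: int) -> bool:
+    """True if the available master-eligible nodes form a majority.
+
+    Simplification (assumption): every master-eligible node is assumed to
+    be in the voting configuration.
+    """
+    return available >= majority(master_eligible)
+
+
+# Three master-eligible nodes: losing one is fine, losing two is not.
+print(has_quorum(3, 2))  # True
+print(has_quorum(3, 1))  # False
+----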
+
+In very rare circumstances it may not be possible to restart enough nodes to
+restore the cluster's availability. If such a disaster occurs then you should
+build a new cluster from a recent snapshot, and re-import any data that was
+ingested since that snapshot was taken.
+
+However, if the disaster is serious enough then it may not be possible to
+recover from a recent snapshot either. Unfortunately, in this case there is no
+way forward that does not risk data loss, but it may be possible to use the
+`elasticsearch-node` tool to unsafely bring the cluster back online.
+
+This tool has two modes, depending on whether there are any master-eligible
+nodes remaining or not:
+
+* `elasticsearch-node unsafe-bootstrap` can be used if there is at least one
+  remaining master-eligible node. It allows you to force one of the remaining
+  nodes to become the elected master on its own.
+
+* `elasticsearch-node detach-cluster` can be used if there are no remaining
+  master-eligible nodes. It allows you to detach any remaining data nodes from
+  the old, failed cluster so they can join a new cluster.
+
+[float]
+=== Unsafe cluster bootstrapping
+
+If there is at least one remaining master-eligible node, but it is not possible
+to restart a majority of them, then the `elasticsearch-node unsafe-bootstrap`
+command will unsafely override the cluster's <<modules-discovery-voting,voting
+configuration>> as if performing another
+<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>, allowing
+the target node to become the elected master without needing a response from
+any other nodes. This can lead to arbitrary data loss since the chosen node may
+not hold the latest cluster metadata, and this out-of-date metadata may make it
+impossible to use some or all of the indices in the cluster.
+
+When you run the `elasticsearch-node unsafe-bootstrap` tool it will analyse the
+state of the node and ask for confirmation before taking any action. Before
+asking for confirmation it reports the term and version of the cluster state on
+the node on which it runs as follows:
+
+[source,txt]
+----
+Current node cluster state (term, version) pair is (4, 12)
+----
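+
+If you run the tool on several nodes and keep a copy of what each run prints
+before you answer the confirmation prompt, you can extract the reported pair
+programmatically. This is a minimal Python sketch that assumes the report line
+has exactly the format shown above; `parse_term_version` is a hypothetical
+helper and not part of {es}.
+
+[source,python]
+----
+import re
+from typing import Tuple
+
+# Pattern for the report line shown above. The wording is copied from the
+# example output and is an assumption; it may differ between versions.
+PAIR_PATTERN = re.compile(
+    r"Current node cluster state \(term, version\) pair is \((\d+), (\d+)\)"
+)
+
+
+def parse_term_version(tool_output: str) -> Tuple[int, int]:
+    """Return the (term, version) pair reported by the tool as integers."""
+    match = PAIR_PATTERN.search(tool_output)
+    if match is None:
+        raise ValueError("no (term, version) report found in the output")
+    return int(match.group(1)), int(match.group(2))
+
+
+print(parse_term_version(
+    "Current node cluster state (term, version) pair is (4, 12)"
+))  # (4, 12)
+----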
+
+If you have a choice of nodes on which to run this tool then you should pick
+one with a term that is as large as possible, and if there are multiple nodes
+with the same term then you should pick the one with the largest version. This
+identifies the node with the freshest cluster state, minimising the quantity of
+data that might be lost. For example, if the first node reports `(4, 12)` and a
+second node reports `(5, 3)`, then the second node is preferred since its term
+is larger. However, if the second node reports `(3, 17)`, then the first node is
+preferred since its term is larger. If the second node reports `(4, 10)`, then
+it has the same term as the first node but a smaller version, so the first
+node is preferred.
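+
+In other words, the pairs are compared lexicographically: first by term, and by
+version only when the terms are equal. Python compares tuples in the same way,
+so the choice can be sketched as follows. The node names and the
+`pick_bootstrap_node` helper are hypothetical; this only illustrates the rule
+above and is not part of the tool.
+
+[source,python]
+----
+from typing import Dict, Tuple
+
+
+def pick_bootstrap_node(pairs: Dict[str, Tuple[int, int]]) -> str:
+    """Return the name of the node with the largest (term, version) pair.
+
+    Tuples compare term first and fall back to version only on a tie,
+    which matches the selection rule described above.
+    """
+    if not pairs:
+        raise ValueError("no candidate nodes")
+    return max(pairs, key=lambda name: pairs[name])
+
+
+# The three comparisons from the text above.
+print(pick_bootstrap_node({"node_1": (4, 12), "node_2": (5, 3)}))   # node_2
+print(pick_bootstrap_node({"node_1": (4, 12), "node_2": (3, 17)}))  # node_1
+print(pick_bootstrap_node({"node_1": (4, 12), "node_2": (4, 10)}))  # node_1
+----
+
+If several nodes report the same pair, this sketch returns whichever one `max`
+encounters first.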
+
+[WARNING]
+Execution of this command can lead to arbitrary data loss. Only run this tool
+if you understand and accept the possible consequences and have exhausted all
+other possibilities for recovery of your cluster.
+
+The sequence of operations for using this tool is as follows:
+
+1. Make sure you have really lost access to at least half of the
+master-eligible nodes in the cluster, and they cannot be repaired or recovered
+by moving their data paths to healthy hardware.
+2. Stop **all** remaining master-eligible nodes.
+3. Select one of the remaining master-eligible nodes to become the new elected
+master as described above.
+4. On this node, run the `elasticsearch-node unsafe-bootstrap` command as shown
+below. Verify that the tool reported `Master node was successfully
+bootstrapped`.
+5. Start this node and verify that it is elected as the master node.
+6. Start all other master-eligible nodes and verify that each one joins the
+cluster.
+7. Any running master-ineligible nodes will automatically join the
+newly elected master. Restart any previously stopped nodes and verify that the
+cluster is now fully formed.
+8. Investigate the data in the cluster to discover if any was lost during this
+process.
+
+[WARNING]
+When you run the tool, it will make sure that the node that is being used to
+bootstrap the cluster is not running. It is important that all other
+master-eligible nodes are also stopped while this tool is running, but the tool
+does not check this.
+
+[NOTE]
+The message `Master node was successfully bootstrapped` does not mean that
+there has been no data loss; it just means that the tool was able to complete
+its job.
+
+As an example, suppose your cluster had five master-eligible nodes and you have
+permanently lost three of them, leaving two nodes remaining.
+
+* Run the tool on the first remaining node, but answer `n` at the confirmation
+  step.
+
+[source,txt]
+----
+node_1$ ./bin/elasticsearch-node unsafe-bootstrap
+
+    WARNING: Elasticsearch MUST be stopped before running this tool.
+
+Current node cluster state (term, version) pair is (4, 12)
+
+You should run this tool only if you have permanently lost half
+or more of the master-eligible nodes, and you cannot restore the cluster
+from a snapshot. This tool can result in arbitrary data loss and
+should be the last resort.
+If you have multiple survived master eligible nodes, consider running
+this tool on the node with the highest cluster state (term, version) pair.
+Do you want to proceed?
+
+Confirm [y/N] n
+----
+
+* Run the tool on the second remaining node, and again answer `n` at the
+  confirmation step.
+
+[source,txt]
+----
+node_2$ ./bin/elasticsearch-node unsafe-bootstrap
+
+    WARNING: Elasticsearch MUST be stopped before running this tool.
+
+Current node cluster state (term, version) pair is (5, 3)
+
+You should run this tool only if you have permanently lost half
+or more of the master-eligible nodes, and you cannot restore the cluster
+from a snapshot. This tool can result in arbitrary data loss and
+should be the last resort.
+If you have multiple survived master eligible nodes, consider running
+this tool on the node with the highest cluster state (term, version) pair.
+Do you want to proceed?
+
+Confirm [y/N] n
+----
+
+* Since the second node has a greater term, it has a fresher cluster state, so
+  it is better to unsafely bootstrap the cluster using this node:
+
+[source,txt]
+----
+node_2$ ./bin/elasticsearch-node unsafe-bootstrap
+
+    WARNING: Elasticsearch MUST be stopped before running this tool.
+
+Current node cluster state (term, version) pair is (5, 3)
+
+You should run this tool only if you have permanently lost half
+or more of the master-eligible nodes, and you cannot restore the cluster
+from a snapshot. This tool can result in arbitrary data loss and
+should be the last resort.
+If you have multiple survived master eligible nodes, consider running
+this tool on the node with the highest cluster state (term, version) pair.
+Do you want to proceed?
+
+Confirm [y/N] y
+Master node was successfully bootstrapped
+----
+
+[float]
+=== Detach cluster
+To be described
+
+