|
| 1 | +[[node-tool]] |
| 2 | +== elasticsearch-node |
| 3 | + |
| 4 | +The `elasticsearch-node` command enables you to perform unsafe operations that |
| 5 | +risk data loss but which may help to recover some data in a disaster. |
| 6 | + |
| 7 | +[float] |
| 8 | +=== Synopsis |
| 9 | + |
| 10 | +[source,shell] |
| 11 | +-------------------------------------------------- |
| 12 | +bin/elasticsearch-node unsafe-bootstrap|detach-cluster |
| 13 | + [--ordinal <Integer>] [-E <KeyValuePair>] |
| 14 | + [-h, --help] ([-s, --silent] | [-v, --verbose]) |
| 15 | +-------------------------------------------------- |
| 16 | + |
| 17 | +[float] |
| 18 | +=== Description |
| 19 | + |
| 20 | +Sometimes {es} nodes are temporarily stopped, perhaps because of the need to |
| 21 | +perform some maintenance activity or perhaps because of a hardware failure. |
| 22 | +After you resolve the temporary condition and restart the node, |
| 23 | +it will rejoin the cluster and continue normally. Depending on your |
| 24 | +configuration, your cluster may be able to remain completely available even |
| 25 | +while one or more of its nodes are stopped. |
| 26 | + |
| 27 | +Sometimes it might not be possible to restart a node after it has stopped. For |
| 28 | +example, the node's host may suffer from a hardware problem that cannot be |
| 29 | +repaired. If the cluster is still available then you can start up a fresh node |
| 30 | +on another host and {es} will bring this node into the cluster in place of the |
| 31 | +failed node. |
| 32 | + |
| 33 | +Each node stores its data in the data directories defined by the |
| 34 | +<<path-settings,`path.data` setting>>. This means that in a disaster you can |
| 35 | +also restart a node by moving its data directories to another host, presuming |
| 36 | +that those data directories can be recovered from the faulty host. |
| 37 | + |
| 38 | +{es} <<modules-discovery-quorums,requires a response from a majority of the |
| 39 | +master-eligible nodes>> in order to elect a master and to update the cluster |
| 40 | +state. This means that if you have three master-eligible nodes then the cluster |
| 41 | +will remain available even if one of them has failed. However if two of the |
| 42 | +three master-eligible nodes fail then the cluster will be unavailable until at |
| 43 | +least one of them is restarted. |
| 44 | + |
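The majority arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes that every master-eligible node is in the voting configuration, and the function names `majority` and `can_elect_master` are hypothetical, not part of {es}.

```python
def majority(master_eligible: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(master_eligible: int, failed: int) -> bool:
    """True if the surviving nodes still form a majority.

    Simplifying assumption: the voting configuration contains every
    master-eligible node and has not been adjusted since the failures.
    """
    surviving = master_eligible - failed
    return surviving >= majority(master_eligible)


if __name__ == "__main__":
    # (total master-eligible nodes, failed nodes)
    for total, failed in [(3, 1), (3, 2), (5, 3)]:
        print(total, failed, can_elect_master(total, failed))
```

With three master-eligible nodes, one failure leaves a majority of two, while two failures do not.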
| 45 | +In very rare circumstances it may not be possible to restart enough nodes to |
| 46 | +restore the cluster's availability. If such a disaster occurs, you should |
| 47 | +build a new cluster from a recent snapshot and re-import any data that was |
| 48 | +ingested since that snapshot was taken. |
| 49 | + |
| 50 | +However, if the disaster is serious enough then it may not be possible to |
| 51 | +recover from a recent snapshot either. Unfortunately in this case there is no |
| 52 | +way forward that does not risk data loss, but it may be possible to use the |
| 53 | +`elasticsearch-node` tool to construct a new cluster that contains some of the |
| 54 | +data from the failed cluster. |
| 55 | + |
| 56 | +This tool has two modes: |
| 57 | + |
| 58 | +* `elasticsearch-node unsafe-bootstrap` can be used if there is at least one |
| 59 | + remaining master-eligible node. It forces one of the remaining nodes to form |
| 60 | + a brand-new cluster on its own, using its local copy of the cluster metadata. |
| 61 | + This is known as _unsafe cluster bootstrapping_. |
| 62 | + |
| 63 | +* `elasticsearch-node detach-cluster` enables you to move nodes from one cluster |
| 64 | + to another. This can be used to move nodes into the new cluster created with |
| 65 | + the `elasticsearch-node unsafe-bootstrap` command. If unsafe cluster |
| 66 | + bootstrapping was not possible, it also enables you to move nodes into a |
| 67 | + brand-new cluster. |
| 68 | + |
| 69 | +[[node-tool-unsafe-bootstrap]] |
| 70 | +[float] |
| 71 | +==== Unsafe cluster bootstrapping |
| 72 | + |
| 73 | +If there is at least one remaining master-eligible node, but it is not possible |
| 74 | +to restart a majority of them, then the `elasticsearch-node unsafe-bootstrap` |
| 75 | +command will unsafely override the cluster's <<modules-discovery-voting,voting |
| 76 | +configuration>> as if performing another |
| 77 | +<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>. |
| 78 | +The target node can then form a new cluster on its own by using |
| 79 | +the cluster metadata held locally on the target node. |
| 80 | + |
| 81 | +[WARNING] |
| 82 | +These steps can lead to arbitrary data loss since the target node may not hold the latest cluster |
| 83 | +metadata, and this out-of-date metadata may make it impossible to use some or |
| 84 | +all of the indices in the cluster. |
| 85 | + |
| 86 | +Since unsafe bootstrapping forms a new cluster containing a single node, once |
| 87 | +you have run it you must use the <<node-tool-detach-cluster,`elasticsearch-node |
| 88 | +detach-cluster` tool>> to migrate any other surviving nodes from the failed |
| 89 | +cluster into this new cluster. |
| 90 | + |
| 91 | +When you run the `elasticsearch-node unsafe-bootstrap` tool it will analyse the |
| 92 | +state of the node and ask for confirmation before taking any action. Before |
| 93 | +asking for confirmation it reports the term and version of the cluster state on |
| 94 | +the node on which it runs as follows: |
| 95 | + |
| 96 | +[source,txt] |
| 97 | +---- |
| 98 | +Current node cluster state (term, version) pair is (4, 12) |
| 99 | +---- |
| 100 | + |
| 101 | +If you have a choice of nodes on which to run this tool then you should choose |
| 102 | +one with a term that is as large as possible. If there is more than one |
| 103 | +node with the same term, pick the one with the largest version. |
| 104 | +This information identifies the node with the freshest cluster state, which minimizes the |
| 105 | +quantity of data that might be lost. For example, if the first node reports |
| 106 | +`(4, 12)` and a second node reports `(5, 3)`, then the second node is preferred |
| 107 | +since its term is larger. However if the second node reports `(3, 17)` then |
| 108 | +the first node is preferred since its term is larger. If the second node |
| 109 | +reports `(4, 10)` then it has the same term as the first node, but has a |
| 110 | +smaller version, so the first node is preferred. |
| 111 | + |
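The selection rule above amounts to a lexicographic comparison of `(term, version)` pairs: compare terms first, then versions. A minimal Python sketch follows. The function `freshest_node` and the node names are hypothetical and used only for illustration.

```python
from typing import Dict, Tuple

ClusterState = Tuple[int, int]  # (term, version)


def freshest_node(states: Dict[str, ClusterState]) -> str:
    """Return the name of the node with the largest (term, version) pair.

    Python compares tuples element by element, so the term is compared
    first and the version only breaks ties between equal terms.
    """
    if not states:
        raise ValueError("no candidate nodes given")
    return max(states, key=lambda name: states[name])


if __name__ == "__main__":
    print(freshest_node({"node_1": (4, 12), "node_2": (5, 3)}))
    print(freshest_node({"node_1": (4, 12), "node_2": (3, 17)}))
    print(freshest_node({"node_1": (4, 12), "node_2": (4, 10)}))
```

The three calls correspond to the three comparisons described in the preceding paragraph.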
| 112 | +[WARNING] |
| 113 | +Running this command can lead to arbitrary data loss. Only run this tool if you |
| 114 | +understand and accept the possible consequences and have exhausted all other |
| 115 | +possibilities for recovery of your cluster. |
| 116 | + |
| 117 | +The sequence of operations for using this tool is as follows: |
| 118 | + |
| 119 | +1. Make sure you have really lost access to at least half of the |
| 120 | +master-eligible nodes in the cluster, and they cannot be repaired or recovered |
| 121 | +by moving their data paths to healthy hardware. |
| 122 | +2. Stop **all** remaining nodes. |
| 123 | +3. Choose one of the remaining master-eligible nodes to become the new elected |
| 124 | +master as described above. |
| 125 | +4. On this node, run the `elasticsearch-node unsafe-bootstrap` command as shown |
| 126 | +below. Verify that the tool reported `Master node was successfully |
| 127 | +bootstrapped`. |
| 128 | +5. Start this node and verify that it is elected as the master node. |
| 129 | +6. Run the <<node-tool-detach-cluster,`elasticsearch-node detach-cluster` |
| 130 | +tool>>, described below, on every other node in the cluster. |
| 131 | +7. Start all other nodes and verify that each one joins the cluster. |
| 132 | +8. Investigate the data in the cluster to discover if any was lost during this |
| 133 | +process. |
| 134 | + |
| 135 | +When you run the tool it will make sure that the node that is being used to |
| 136 | +bootstrap the cluster is not running. It is important that all other |
| 137 | +master-eligible nodes are also stopped while this tool is running, but the tool |
| 138 | +does not check this. |
| 139 | + |
| 140 | +The message `Master node was successfully bootstrapped` does not mean that |
| 141 | +there has been no data loss; it just means that the tool was able to complete |
| 142 | +its job. |
| 143 | + |
| 144 | +[[node-tool-detach-cluster]] |
| 145 | +[float] |
| 146 | +==== Detaching nodes from their cluster |
| 147 | + |
| 148 | +It is unsafe for nodes to move between clusters, because different clusters |
| 149 | +have completely different cluster metadata. There is no way to safely merge the |
| 150 | +metadata from two clusters together. |
| 151 | + |
| 152 | +To protect against inadvertently joining the wrong cluster, each cluster |
| 153 | +creates a unique identifier, known as the _cluster UUID_, when it first starts |
| 154 | +up. Every node records the UUID of its cluster and refuses to join a |
| 155 | +cluster with a different UUID. |
| 156 | + |
| 157 | +However, if a node's cluster has permanently failed then it may be desirable to |
| 158 | +try to move it into a new cluster. The `elasticsearch-node detach-cluster` |
| 159 | +command lets you detach a node from its cluster by resetting its cluster UUID. |
| 160 | +It can then join another cluster with a different UUID. |
| 161 | + |
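The cluster UUID check can be pictured with a toy model. The sketch below is not how {es} implements it. The `Node` class, its methods, and the use of `None` to represent a reset UUID are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy model of a node that remembers the UUID of its cluster."""
    name: str
    cluster_uuid: Optional[str] = None  # None: not bound to any cluster (assumption)

    def can_join(self, target_uuid: str) -> bool:
        # A node only joins a cluster whose UUID matches the one it
        # recorded, unless it has no recorded UUID at all.
        return self.cluster_uuid is None or self.cluster_uuid == target_uuid

    def join(self, target_uuid: str) -> None:
        if not self.can_join(target_uuid):
            raise ValueError(f"{self.name} belongs to a different cluster")
        self.cluster_uuid = target_uuid

    def detach(self) -> None:
        # Models the effect of detach-cluster: forget the old cluster UUID.
        self.cluster_uuid = None


if __name__ == "__main__":
    node = Node("node_3", cluster_uuid="old-cluster")
    print(node.can_join("new-cluster"))  # refused before detaching
    node.detach()
    node.join("new-cluster")
    print(node.cluster_uuid)
```

The model leaves out the data-loss risks that make the real operation unsafe.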
| 162 | +For example, after unsafe cluster bootstrapping you will need to detach all the |
| 163 | +other surviving nodes from their old cluster so they can join the new, |
| 164 | +unsafely-bootstrapped cluster. |
| 165 | + |
| 166 | +Unsafe cluster bootstrapping is only possible if there is at least one |
| 167 | +surviving master-eligible node. If there are no remaining master-eligible nodes |
| 168 | +then the cluster metadata is completely lost. However, the individual data |
| 169 | +nodes also contain a copy of the index metadata corresponding to their |
| 170 | +shards. This sometimes allows a new cluster to import these shards as |
| 171 | +<<modules-gateway-dangling-indices,dangling indices>>. You can sometimes |
| 172 | +recover some indices after the loss of all master-eligible nodes in a cluster |
| 173 | +by creating a new cluster and then using the `elasticsearch-node |
| 174 | +detach-cluster` command to move any surviving nodes into this new cluster. |
| 175 | + |
| 176 | +There is a risk of data loss when importing a dangling index because data nodes |
| 177 | +may not have the most recent copy of the index metadata and do not have any |
| 178 | +information about <<docs-replication,which shard copies are in-sync>>. This |
| 179 | +means that a stale shard copy may be selected to be the primary, and some of |
| 180 | +the shards may be incompatible with the imported mapping. |
| 181 | + |
| 182 | +[WARNING] |
| 183 | +Execution of this command can lead to arbitrary data loss. Only run this tool |
| 184 | +if you understand and accept the possible consequences and have exhausted all |
| 185 | +other possibilities for recovery of your cluster. |
| 186 | + |
| 187 | +The sequence of operations for using this tool is as follows: |
| 188 | + |
| 189 | +1. Make sure you have really lost access to every one of the master-eligible |
| 190 | +nodes in the cluster, and they cannot be repaired or recovered by moving their |
| 191 | +data paths to healthy hardware. |
| 192 | +2. Start a new cluster and verify that it is healthy. This cluster may comprise |
| 193 | +one or more brand-new master-eligible nodes, or may be an unsafely-bootstrapped |
| 194 | +cluster formed as described above. |
| 195 | +3. Stop **all** remaining data nodes. |
| 196 | +4. On each data node, run the `elasticsearch-node detach-cluster` tool as shown |
| 197 | +below. Verify that the tool reported `Node was successfully detached from the |
| 198 | +cluster`. |
| 199 | +5. If necessary, configure each data node to |
| 200 | +<<modules-discovery-hosts-providers,discover the new cluster>>. |
| 201 | +6. Start each data node and verify that it has joined the new cluster. |
| 202 | +7. Wait for all recoveries to have completed, and investigate the data in the |
| 203 | +cluster to discover if any was lost during this process. |
| 204 | + |
| 205 | +The message `Node was successfully detached from the cluster` does not mean |
| 206 | +that there has been no data loss; it just means that the tool was able to |
| 207 | +complete its job. |
| 208 | + |
| 209 | +[float] |
| 210 | +=== Parameters |
| 211 | + |
| 212 | +`unsafe-bootstrap`:: Specifies to unsafely bootstrap this node as a new |
| 213 | +one-node cluster. |
| 214 | + |
| 215 | +`detach-cluster`:: Specifies to unsafely detach this node from its cluster so |
| 216 | +it can join a different cluster. |
| 217 | + |
| 218 | +`--ordinal <Integer>`:: If there is <<max-local-storage-nodes,more than one |
| 219 | +node sharing a data path>> then this specifies which node to target. Defaults |
| 220 | +to `0`, meaning to use the first node in the data path. |
| 221 | + |
| 222 | +`-E <KeyValuePair>`:: Configures a setting. |
| 223 | + |
| 224 | +`-h, --help`:: Returns all of the command parameters. |
| 225 | + |
| 226 | +`-s, --silent`:: Shows minimal output. |
| 227 | + |
| 228 | +`-v, --verbose`:: Shows verbose output. |
| 229 | + |
| 230 | +[float] |
| 231 | +=== Examples |
| 232 | + |
| 233 | +[float] |
| 234 | +==== Unsafe cluster bootstrapping |
| 235 | + |
| 236 | +Suppose your cluster had five master-eligible nodes and you have permanently |
| 237 | +lost three of them, leaving two nodes remaining. |
| 238 | + |
| 239 | +* Run the tool on the first remaining node, but answer `n` at the confirmation |
| 240 | + step. |
| 241 | + |
| 242 | +[source,txt] |
| 243 | +---- |
| 244 | +node_1$ ./bin/elasticsearch-node unsafe-bootstrap |
| 245 | + |
| 246 | + WARNING: Elasticsearch MUST be stopped before running this tool. |
| 247 | + |
| 248 | +Current node cluster state (term, version) pair is (4, 12) |
| 249 | + |
| 250 | +You should only run this tool if you have permanently lost half or more |
| 251 | +of the master-eligible nodes in this cluster, and you cannot restore the |
| 252 | +cluster from a snapshot. This tool can cause arbitrary data loss and its |
| 253 | +use should be your last resort. If you have multiple surviving master |
| 254 | +eligible nodes, you should run this tool on the node with the highest |
| 255 | +cluster state (term, version) pair. |
| 256 | + |
| 257 | +Do you want to proceed? |
| 258 | + |
| 259 | +Confirm [y/N] n |
| 260 | +---- |
| 261 | + |
| 262 | +* Run the tool on the second remaining node, and again answer `n` at the |
| 263 | + confirmation step. |
| 264 | + |
| 265 | +[source,txt] |
| 266 | +---- |
| 267 | +node_2$ ./bin/elasticsearch-node unsafe-bootstrap |
| 268 | + |
| 269 | + WARNING: Elasticsearch MUST be stopped before running this tool. |
| 270 | + |
| 271 | +Current node cluster state (term, version) pair is (5, 3) |
| 272 | + |
| 273 | +You should only run this tool if you have permanently lost half or more |
| 274 | +of the master-eligible nodes in this cluster, and you cannot restore the |
| 275 | +cluster from a snapshot. This tool can cause arbitrary data loss and its |
| 276 | +use should be your last resort. If you have multiple surviving master |
| 277 | +eligible nodes, you should run this tool on the node with the highest |
| 278 | +cluster state (term, version) pair. |
| 279 | + |
| 280 | +Do you want to proceed? |
| 281 | + |
| 282 | +Confirm [y/N] n |
| 283 | +---- |
| 284 | + |
| 285 | +* Since the second node has a greater term it has a fresher cluster state, so |
| 286 | + it is better to unsafely bootstrap the cluster using this node: |
| 287 | + |
| 288 | +[source,txt] |
| 289 | +---- |
| 290 | +node_2$ ./bin/elasticsearch-node unsafe-bootstrap |
| 291 | + |
| 292 | + WARNING: Elasticsearch MUST be stopped before running this tool. |
| 293 | + |
| 294 | +Current node cluster state (term, version) pair is (5, 3) |
| 295 | + |
| 296 | +You should only run this tool if you have permanently lost half or more |
| 297 | +of the master-eligible nodes in this cluster, and you cannot restore the |
| 298 | +cluster from a snapshot. This tool can cause arbitrary data loss and its |
| 299 | +use should be your last resort. If you have multiple surviving master |
| 300 | +eligible nodes, you should run this tool on the node with the highest |
| 301 | +cluster state (term, version) pair. |
| 302 | + |
| 303 | +Do you want to proceed? |
| 304 | + |
| 305 | +Confirm [y/N] y |
| 306 | +Master node was successfully bootstrapped |
| 307 | +---- |
| 308 | + |
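If you collect the tool output from several nodes, the `(term, version)` pair can be extracted with a regular expression. The sketch below assumes the message format shown in the transcripts above, which may differ between versions. The function `parse_cluster_state` is a hypothetical helper, not part of the tool.

```python
import re
from typing import Optional, Tuple

# Pattern based on the message shown in the example transcripts (assumption:
# the wording is stable for the version you are running).
_PAIR = re.compile(
    r"Current node cluster state \(term, version\) pair is \((\d+), (\d+)\)"
)


def parse_cluster_state(output: str) -> Optional[Tuple[int, int]]:
    """Return (term, version) found in the tool output, or None if absent."""
    match = _PAIR.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = (
        "    WARNING: Elasticsearch MUST be stopped before running this tool.\n"
        "\n"
        "Current node cluster state (term, version) pair is (5, 3)\n"
    )
    print(parse_cluster_state(sample))
```

Returning `None` when the line is missing lets the caller distinguish absent output from a genuine pair.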
| 309 | +[float] |
| 310 | +==== Detaching nodes from their cluster |
| 311 | + |
| 312 | +After unsafely bootstrapping a new cluster, run the `elasticsearch-node |
| 313 | +detach-cluster` command to detach all remaining nodes from the failed cluster |
| 314 | +so they can join the new cluster: |
| 315 | + |
| 316 | +[source, txt] |
| 317 | +---- |
| 318 | +node_3$ ./bin/elasticsearch-node detach-cluster |
| 319 | + |
| 320 | + WARNING: Elasticsearch MUST be stopped before running this tool. |
| 321 | + |
| 322 | +You should only run this tool if you have permanently lost all of the |
| 323 | +master-eligible nodes in this cluster and you cannot restore the cluster |
| 324 | +from a snapshot, or you have already unsafely bootstrapped a new cluster |
| 325 | +by running `elasticsearch-node unsafe-bootstrap` on a master-eligible |
| 326 | +node that belonged to the same cluster as this node. This tool can cause |
| 327 | +arbitrary data loss and its use should be your last resort. |
| 328 | + |
| 329 | +Do you want to proceed? |
| 330 | + |
| 331 | +Confirm [y/N] y |
| 332 | +Node was successfully detached from the cluster |
| 333 | +---- |
| 334 | + |