
Commit 22ba759

ywelsch and DaveCTurner authored
Move metadata storage to Lucene (#50928)
* Move metadata storage to Lucene (#50907)

Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates only touch a few of these files, but some must write them all. If a node holds a large number of indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding.

This commit uses Lucene as the metadata storage for the cluster state, and is a squashed version of the following PRs that were targeting a feature branch:

* Introduce Lucene-based metadata persistence (#48733)

This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files.

Relates #48701

* Remove per-index metadata without assigned shards (#49234)

Today on master-eligible nodes we maintain per-index metadata files for every index. However, we also keep this metadata in the `LucenePersistedState`, and only use the per-index metadata files for importing dangling indices. There is no point in importing a dangling index without any shard data, so we do not need to maintain these extra files any more. This commit removes per-index metadata files from nodes which do not hold any shards of those indices.

Relates #48701

* Use Lucene exclusively for metadata storage (#50144)

This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds an interoperability layer for upgrades from prior versions.

This commit disables a number of tests related to dangling indices and command-line tools. Those will be addressed in follow-ups.

Relates #48701

* Add command-line tool support for Lucene-based metadata storage (#50179)

Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage.

Relates #48701

* Use single directory for metadata (#50639)

Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed though, and places an additional, unnecessary cognitive burden on users.

Co-Authored-By: David Turner <[email protected]>

* Add async dangling indices support (#50642)

Adds support for writing out dangling indices in an asynchronous way. Also provides an option to avoid writing out dangling indices at all.

Relates #48701

* Fold node metadata into new node storage (#50741)

Moves node metadata to use the new storage mechanism (see #48701) as the authoritative source.

* Write CS asynchronously on data-only nodes (#50782)

Writes cluster states out asynchronously on data-only nodes. The main reason for writing out the cluster state at all is so that the data-only nodes can snap into a cluster, so that they can do a bit of bootstrap validation, and so that the shard recovery tools work. Cluster states that are written asynchronously have their voting configuration adapted to a non-existent configuration so that these nodes cannot mistakenly become master even if their node role is changed back and forth.

Relates #48701

* Remove persistent cluster settings tool (#50694)

Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on-disk cluster state in cases where it contains incompatible settings that prevent the cluster from forming.

Relates #48701

* Make cluster state writer resilient to disk issues (#50805)

Adds handling to make the cluster state writer resilient to disk issues.

Relates to #48701

* Omit writing global metadata if no change (#50901)

Uses the same optimization for the new cluster state storage layer as the old one, writing global metadata only when changed. Avoids writing out the global metadata if none of the persistent fields changed. Speeds up server:integTest by ~10%.

Relates #48701

* DanglingIndicesIT should ensure node removed first (#50896)

These tests occasionally failed because the deletion was submitted before the restarting node was removed from the cluster, causing the deletion not to be fully acked. This commit fixes this by checking the restarting node has been removed from the cluster.

Co-authored-by: David Turner <[email protected]>

* fix tests

Co-authored-by: David Turner <[email protected]>
1 parent b02b073 commit 22ba759
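
The "Omit writing global metadata if no change" item in the commit message describes a skip-if-unchanged optimization. A minimal sketch of that idea follows, assuming the persistent fields can be compared by value. `SkipUnchangedWriter` and its `String` payload are hypothetical stand-ins for illustration, not classes from this commit.

```java
import java.util.Objects;

/** Hypothetical writer that skips persisting global metadata when nothing changed. */
class SkipUnchangedWriter {
    // Stand-in for the persistent fields of the global metadata (assumption).
    private String lastWrittenGlobalMetadata;
    private int writeCount;

    /** Persists the metadata only if it differs from the last written copy; returns true if a write happened. */
    boolean writeIfChanged(String globalMetadata) {
        if (Objects.equals(lastWrittenGlobalMetadata, globalMetadata)) {
            return false; // unchanged: avoid the write entirely
        }
        // A real implementation would write to durable storage here.
        lastWrittenGlobalMetadata = globalMetadata;
        writeCount++;
        return true;
    }

    int writeCount() {
        return writeCount;
    }
}
```

Writing the same value twice in a row results in a single write; only a changed value triggers another one.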

54 files changed: +3440 -976 lines changed

docs/reference/commands/node-tool.asciidoc

+67 -6
@@ -3,9 +3,9 @@

 The `elasticsearch-node` command enables you to perform certain unsafe
 operations on a node that are only possible while it is shut down. This command
-allows you to adjust the <<modules-node,role>> of a node and may be able to
-recover some data after a disaster or start a node even if it is incompatible
-with the data on disk.
+allows you to adjust the <<modules-node,role>> of a node, unsafely edit cluster
+settings and may be able to recover some data after a disaster or start a node
+even if it is incompatible with the data on disk.

 [float]
 === Synopsis
@@ -20,13 +20,17 @@ bin/elasticsearch-node repurpose|unsafe-bootstrap|detach-cluster|override-versio
 [float]
 === Description

-This tool has four modes:
+This tool has five modes:

 * `elasticsearch-node repurpose` can be used to delete unwanted data from a
 node if it used to be a <<data-node,data node>> or a
 <<master-node,master-eligible node>> but has been repurposed not to have one
 or other of these roles.

+* `elasticsearch-node remove-settings` can be used to remove persistent settings
+from the cluster state in case where it contains incompatible settings that
+prevent the cluster from forming.
+
 * `elasticsearch-node unsafe-bootstrap` can be used to perform _unsafe cluster
 bootstrapping_. It forces one of the nodes to form a brand-new cluster on
 its own, using its local copy of the cluster metadata.
@@ -76,6 +80,26 @@ The tool provides a summary of the data to be deleted and asks for confirmation
 before making any changes. You can get detailed information about the affected
 indices and shards by passing the verbose (`-v`) option.

+[float]
+==== Removing persistent cluster settings
+
+There may be situations where a node contains persistent cluster
+settings that prevent the cluster from forming. Since the cluster cannot form,
+it is not possible to remove these settings using the
+<<cluster-update-settings>> API.
+
+The `elasticsearch-node remove-settings` tool allows you to forcefully remove
+those persistent settings from the on-disk cluster state. The tool takes a
+list of settings as parameters that should be removed, and also supports
+wildcard patterns.
+
+The intended use is:
+
+* Stop the node
+* Run `elasticsearch-node remove-settings name-of-setting-to-remove` on the node
+* Repeat for all other master-eligible nodes
+* Start the nodes
+
 [float]
 ==== Recovering data after a disaster

@@ -143,9 +167,9 @@ If there is at least one remaining master-eligible node, but it is not possible
 to restart a majority of them, then the `elasticsearch-node unsafe-bootstrap`
 command will unsafely override the cluster's <<modules-discovery-voting,voting
 configuration>> as if performing another
-<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.
+<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.
 The target node can then form a new cluster on its own by using
-the cluster metadata held locally on the target node.
+the cluster metadata held locally on the target node.

 [WARNING]
 These steps can lead to arbitrary data loss since the target node may not hold the latest cluster
@@ -290,6 +314,9 @@ it can join a different cluster.
 `override-version`:: Overwrites the version number stored in the data path so
 that a node can start despite being incompatible with the on-disk data.

+`remove-settings`:: Forcefully removes the provided persistent cluster settings
+from the on-disk cluster state.
+
 `--ordinal <Integer>`:: If there is <<max-local-storage-nodes,more than one
 node sharing a data path>> then this specifies which node to target. Defaults
 to `0`, meaning to use the first node in the data path.
@@ -350,6 +377,40 @@ Confirm [y/N] y
 Node successfully repurposed to no-master and no-data.
 ----

+[float]
+==== Removing persistent cluster settings
+
+If your nodes contain persistent cluster settings that prevent the cluster
+from forming, i.e., can't be removed using the <<cluster-update-settings>> API,
+you can run the following commands to remove one or more cluster settings.
+
+[source,txt]
+----
+node$ ./bin/elasticsearch-node remove-settings xpack.monitoring.exporters.my_exporter.host
+
+WARNING: Elasticsearch MUST be stopped before running this tool.
+
+The following settings will be removed:
+xpack.monitoring.exporters.my_exporter.host: "10.1.2.3"
+
+You should only run this tool if you have incompatible settings in the
+cluster state that prevent the cluster from forming.
+This tool can cause data loss and its use should be your last resort.
+
+Do you want to proceed?
+
+Confirm [y/N] y
+
+Settings were successfully removed from the cluster state
+----
+
+You can also use wildcards to remove multiple settings, for example using
+
+[source,txt]
+----
+node$ ./bin/elasticsearch-node remove-settings xpack.monitoring.*
+----
+
 [float]
 ==== Unsafe cluster bootstrapping

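
The documentation above says that `remove-settings` accepts wildcard patterns. The following is a minimal sketch of how such matching could work, assuming `*` matches any run of characters. `SettingsRemovalSketch`, `toRegex`, and `keysToRemove` are hypothetical names for illustration, and this is not the matching code used by the tool itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper that selects setting keys matching simple wildcard patterns. */
class SettingsRemovalSketch {

    /** Converts a pattern in which '*' matches any run of characters into a regular expression. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        boolean first = true;
        // A limit of -1 keeps trailing empty strings, so "a*" splits into ["a", ""].
        for (String literal : wildcard.split("\\*", -1)) {
            if (!first) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literal)); // treat dots and other characters literally
            first = false;
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the keys of {@code settings} matched by at least one of the patterns. */
    static List<String> keysToRemove(Map<String, String> settings, List<String> patterns) {
        List<String> matched = new ArrayList<>();
        for (String key : settings.keySet()) {
            for (String pattern : patterns) {
                if (toRegex(pattern).matcher(key).matches()) {
                    matched.add(key);
                    break;
                }
            }
        }
        return matched;
    }
}
```

Under this assumption, the pattern `xpack.monitoring.*` selects `xpack.monitoring.exporters.my_exporter.host` and leaves unrelated keys untouched.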
server/src/main/java/org/elasticsearch/cluster/coordination/CoordinationState.java

+6 -4
@@ -25,6 +25,8 @@
 import org.elasticsearch.cluster.metadata.MetaData;
 import org.elasticsearch.cluster.node.DiscoveryNode;

+import java.io.Closeable;
+import java.io.IOException;
 import java.util.Collection;
 import java.util.Collections;
 import java.util.HashMap;
@@ -444,15 +446,14 @@ public void invariant() {
         assert publishVotes.isEmpty() || electionWon();
     }

-    public void close() {
+    public void close() throws IOException {
         persistedState.close();
     }

     /**
      * Pluggable persistence layer for {@link CoordinationState}.
-     *
      */
-    public interface PersistedState {
+    public interface PersistedState extends Closeable {

         /**
          * Returns the current term
@@ -511,7 +512,8 @@ default void markLastAcceptedStateAsCommitted() {
             }
         }

-        default void close() {}
+        default void close() throws IOException {
+        }
     }

     /**

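
The change above makes `PersistedState` extend `Closeable` while keeping a no-op default `close()`. A minimal, self-contained sketch of that pattern follows. The nested `PersistedState` here is a simplified stand-in with a single accessor, and `DiskBackedState` and `readTermAndClose` are hypothetical names, not code from this commit.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

class CloseableStateSketch {

    /** Simplified stand-in for a pluggable persistence layer (assumption). */
    interface PersistedState extends Closeable {
        long getCurrentTerm();

        // In-memory implementations hold no resources, so the default does nothing.
        @Override
        default void close() throws IOException {
        }
    }

    /** Hypothetical disk-backed implementation that records when it is closed. */
    static class DiskBackedState implements PersistedState {
        boolean closed;

        @Override
        public long getCurrentTerm() {
            return 1L;
        }

        @Override
        public void close() throws IOException {
            closed = true; // a real implementation would release file handles here
        }
    }

    /** Reads the term and closes the state, even if reading throws. */
    static long readTermAndClose(PersistedState state) {
        try (PersistedState s = state) {
            return s.getCurrentTerm();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `close()` now declares `IOException`, callers such as `CoordinationState.close()` and `Coordinator.doClose()` have to declare or handle it too, which is what the signature changes in this commit reflect.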
server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

+2 -1
@@ -75,6 +75,7 @@
 import org.elasticsearch.transport.TransportResponse.Empty;
 import org.elasticsearch.transport.TransportService;

+import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
@@ -732,7 +733,7 @@ protected void doStop() {
     }

     @Override
-    protected void doClose() {
+    protected void doClose() throws IOException {
         final CoordinationState coordinationState = this.coordinationState.get();
         if (coordinationState != null) {
             // This looks like a race that might leak an unclosed CoordinationState if it's created while execution is here, but this method

server/src/main/java/org/elasticsearch/cluster/coordination/DetachClusterCommand.java

+16 -7
@@ -18,11 +18,12 @@
  */
 package org.elasticsearch.cluster.coordination;

+import joptsimple.OptionSet;
 import org.elasticsearch.cli.Terminal;
-import org.elasticsearch.cluster.metadata.Manifest;
+import org.elasticsearch.cluster.ClusterState;
 import org.elasticsearch.cluster.metadata.MetaData;
-import org.elasticsearch.common.collect.Tuple;
 import org.elasticsearch.env.Environment;
+import org.elasticsearch.gateway.PersistedClusterStateService;

 import java.io.IOException;
 import java.nio.file.Path;
@@ -48,14 +49,22 @@ public DetachClusterCommand() {


     @Override
-    protected void processNodePaths(Terminal terminal, Path[] dataPaths, Environment env) throws IOException {
-        final Tuple<Manifest, MetaData> manifestMetaDataTuple = loadMetaData(terminal, dataPaths);
-        final Manifest manifest = manifestMetaDataTuple.v1();
-        final MetaData metaData = manifestMetaDataTuple.v2();
+    protected void processNodePaths(Terminal terminal, Path[] dataPaths, int nodeLockId, OptionSet options, Environment env)
+        throws IOException {
+        final PersistedClusterStateService persistedClusterStateService = createPersistedClusterStateService(dataPaths);
+
+        terminal.println(Terminal.Verbosity.VERBOSE, "Loading cluster state");
+        final ClusterState oldClusterState = loadTermAndClusterState(persistedClusterStateService, env).v2();
+        final ClusterState newClusterState = ClusterState.builder(oldClusterState)
+            .metaData(updateMetaData(oldClusterState.metaData())).build();
+        terminal.println(Terminal.Verbosity.VERBOSE,
+            "[old cluster state = " + oldClusterState + ", new cluster state = " + newClusterState + "]");

         confirm(terminal, CONFIRMATION_MSG);

-        writeNewMetaData(terminal, manifest, updateCurrentTerm(), metaData, updateMetaData(metaData), dataPaths);
+        try (PersistedClusterStateService.Writer writer = persistedClusterStateService.createWriter()) {
+            writer.writeFullStateAndCommit(updateCurrentTerm(), newClusterState);
+        }

         terminal.println(NODE_DETACHED_MSG);
     }

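
The rewritten `processNodePaths` above follows a load, transform, confirm, write-and-commit flow, with the writer scoped by try-with-resources. Here is a rough in-memory sketch of that flow. `State`, `Store`, `Writer`, and `rewrite` are all hypothetical stand-ins and do not mirror the real `PersistedClusterStateService` API beyond the `createWriter` and `writeFullStateAndCommit` method names visible in the diff.

```java
import java.util.function.UnaryOperator;

class DetachFlowSketch {

    /** Hypothetical immutable snapshot: a term plus opaque metadata. */
    static final class State {
        final long term;
        final String metadata;

        State(long term, String metadata) {
            this.term = term;
            this.metadata = metadata;
        }
    }

    /** Hypothetical in-memory store standing in for the on-disk cluster state. */
    static final class Store {
        private State committed;

        Store(State initial) {
            this.committed = initial;
        }

        State load() {
            return committed;
        }

        Writer createWriter() {
            return new Writer(this);
        }
    }

    /** Hypothetical writer; closing it would release resources in a real implementation. */
    static final class Writer implements AutoCloseable {
        private final Store store;

        Writer(Store store) {
            this.store = store;
        }

        void writeFullStateAndCommit(long term, String metadata) {
            store.committed = new State(term, metadata);
        }

        @Override
        public void close() {
            // nothing to release in this in-memory sketch
        }
    }

    /** Loads the old state, derives the new one, and writes it only after confirmation. */
    static void rewrite(Store store, long newTerm, UnaryOperator<String> update, boolean confirmed) {
        State old = store.load();
        String updated = update.apply(old.metadata);
        if (!confirmed) {
            throw new IllegalStateException("aborted by user");
        }
        try (Writer writer = store.createWriter()) {
            writer.writeFullStateAndCommit(newTerm, updated);
        }
    }
}
```

Computing the new state before asking for confirmation lets the tool show the old and new states to the user, and nothing is written if the user declines.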
server/src/main/java/org/elasticsearch/cluster/coordination/ElasticsearchNodeCommand.java

+53 -58
@@ -27,45 +27,82 @@
 import org.elasticsearch.ElasticsearchException;
 import org.elasticsearch.cli.EnvironmentAwareCommand;
 import org.elasticsearch.cli.Terminal;
-import org.elasticsearch.cluster.metadata.Manifest;
-import org.elasticsearch.cluster.metadata.MetaData;
+import org.elasticsearch.cli.UserException;
+import org.elasticsearch.cluster.ClusterModule;
+import org.elasticsearch.cluster.ClusterName;
+import org.elasticsearch.cluster.ClusterState;
 import org.elasticsearch.common.collect.Tuple;
+import org.elasticsearch.common.util.BigArrays;
 import org.elasticsearch.common.xcontent.NamedXContentRegistry;
 import org.elasticsearch.env.Environment;
 import org.elasticsearch.env.NodeEnvironment;
+import org.elasticsearch.env.NodeMetaData;
+import org.elasticsearch.gateway.PersistedClusterStateService;
+import org.elasticsearch.indices.IndicesModule;

 import java.io.IOException;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.util.Arrays;
 import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;

 public abstract class ElasticsearchNodeCommand extends EnvironmentAwareCommand {
     private static final Logger logger = LogManager.getLogger(ElasticsearchNodeCommand.class);
     protected static final String DELIMITER = "------------------------------------------------------------------------\n";
-
     static final String STOP_WARNING_MSG =
         DELIMITER +
         "\n" +
         " WARNING: Elasticsearch MUST be stopped before running this tool." +
         "\n";
     protected static final String FAILED_TO_OBTAIN_NODE_LOCK_MSG = "failed to lock node's directory, is Elasticsearch still running?";
-    static final String NO_NODE_FOLDER_FOUND_MSG = "no node folder is found in data folder(s), node has not been started yet?";
-    static final String NO_MANIFEST_FILE_FOUND_MSG = "no manifest file is found, do you run pre 7.0 Elasticsearch?";
-    protected static final String GLOBAL_GENERATION_MISSING_MSG =
-        "no metadata is referenced from the manifest file, cluster has never been bootstrapped?";
-    static final String NO_GLOBAL_METADATA_MSG = "failed to find global metadata, metadata corrupted?";
-    static final String WRITE_METADATA_EXCEPTION_MSG = "exception occurred when writing new metadata to disk";
     protected static final String ABORTED_BY_USER_MSG = "aborted by user";
     final OptionSpec<Integer> nodeOrdinalOption;
+    static final String NO_NODE_FOLDER_FOUND_MSG = "no node folder is found in data folder(s), node has not been started yet?";
+    static final String NO_NODE_METADATA_FOUND_MSG = "no node meta data is found, node has not been started yet?";
+    protected static final String CS_MISSING_MSG =
+        "cluster state is empty, cluster has never been bootstrapped?";
+
+    protected static final NamedXContentRegistry namedXContentRegistry = new NamedXContentRegistry(
+        Stream.of(ClusterModule.getNamedXWriteables().stream(), IndicesModule.getNamedXContents().stream())
+            .flatMap(Function.identity())
+            .collect(Collectors.toList()));

     public ElasticsearchNodeCommand(String description) {
         super(description);
         nodeOrdinalOption = parser.accepts("ordinal", "Optional node ordinal, 0 if not specified")
             .withRequiredArg().ofType(Integer.class);
     }

-    protected void processNodePathsWithLock(Terminal terminal, OptionSet options, Environment env) throws IOException {
+    public static PersistedClusterStateService createPersistedClusterStateService(Path[] dataPaths) throws IOException {
+        final NodeMetaData nodeMetaData = PersistedClusterStateService.nodeMetaData(dataPaths);
+        if (nodeMetaData == null) {
+            throw new ElasticsearchException(NO_NODE_METADATA_FOUND_MSG);
+        }
+
+        String nodeId = nodeMetaData.nodeId();
+        return new PersistedClusterStateService(dataPaths, nodeId, namedXContentRegistry, BigArrays.NON_RECYCLING_INSTANCE, true);
+    }
+
+    public static ClusterState clusterState(Environment environment, PersistedClusterStateService.OnDiskState onDiskState) {
+        return ClusterState.builder(ClusterName.CLUSTER_NAME_SETTING.get(environment.settings()))
+            .version(onDiskState.lastAcceptedVersion)
+            .metaData(onDiskState.metaData)
+            .build();
+    }
+
+    public static Tuple<Long, ClusterState> loadTermAndClusterState(PersistedClusterStateService psf,
+                                                                    Environment env) throws IOException {
+        final PersistedClusterStateService.OnDiskState bestOnDiskState = psf.loadBestOnDiskState();
+        if (bestOnDiskState.empty()) {
+            throw new ElasticsearchException(CS_MISSING_MSG);
+        }
+        return Tuple.tuple(bestOnDiskState.currentTerm, clusterState(env, bestOnDiskState));
+    }
+
+    protected void processNodePaths(Terminal terminal, OptionSet options, Environment env) throws IOException, UserException {
         terminal.println(Terminal.Verbosity.VERBOSE, "Obtaining lock for node");
         Integer nodeOrdinal = nodeOrdinalOption.value(options);
         if (nodeOrdinal == null) {
@@ -77,32 +114,12 @@ protected void processNodePathsWithLock(Terminal terminal, OptionSet options, En
             if (dataPaths.length == 0) {
                 throw new ElasticsearchException(NO_NODE_FOLDER_FOUND_MSG);
             }
-            processNodePaths(terminal, dataPaths, env);
+            processNodePaths(terminal, dataPaths, nodeOrdinal, options, env);
         } catch (LockObtainFailedException e) {
             throw new ElasticsearchException(FAILED_TO_OBTAIN_NODE_LOCK_MSG, e);
         }
     }

-    protected Tuple<Manifest, MetaData> loadMetaData(Terminal terminal, Path[] dataPaths) throws IOException {
-        terminal.println(Terminal.Verbosity.VERBOSE, "Loading manifest file");
-        final Manifest manifest = Manifest.FORMAT.loadLatestState(logger, NamedXContentRegistry.EMPTY, dataPaths);
-
-        if (manifest == null) {
-            throw new ElasticsearchException(NO_MANIFEST_FILE_FOUND_MSG);
-        }
-        if (manifest.isGlobalGenerationMissing()) {
-            throw new ElasticsearchException(GLOBAL_GENERATION_MISSING_MSG);
-        }
-        terminal.println(Terminal.Verbosity.VERBOSE, "Loading global metadata file");
-        final MetaData metaData = MetaData.FORMAT_PRESERVE_CUSTOMS.loadGeneration(
-            logger, NamedXContentRegistry.EMPTY, manifest.getGlobalGeneration(), dataPaths);
-        if (metaData == null) {
-            throw new ElasticsearchException(NO_GLOBAL_METADATA_MSG + " [generation = " + manifest.getGlobalGeneration() + "]");
-        }
-
-        return Tuple.tuple(manifest, metaData);
-    }
-
     protected void confirm(Terminal terminal, String msg) {
         terminal.println(msg);
         String text = terminal.readText("Confirm [y/N] ");
@@ -112,10 +129,10 @@ protected void confirm(Terminal terminal, String msg) {
     }

     @Override
-    protected final void execute(Terminal terminal, OptionSet options, Environment env) throws Exception {
+    public final void execute(Terminal terminal, OptionSet options, Environment env) throws Exception {
         terminal.println(STOP_WARNING_MSG);
         if (validateBeforeLock(terminal, env)) {
-            processNodePathsWithLock(terminal, options, env);
+            processNodePaths(terminal, options, env);
         }
     }

@@ -134,33 +151,11 @@ protected boolean validateBeforeLock(Terminal terminal, Environment env) {
      * Process the paths. Locks for the paths is held during this method invocation.
      * @param terminal the terminal to use for messages
      * @param dataPaths the paths of the node to process
+     * @param options the command line options
      * @param env the env of the node to process
      */
-    protected abstract void processNodePaths(Terminal terminal, Path[] dataPaths, Environment env) throws IOException;
-
-
-    protected void writeNewMetaData(Terminal terminal, Manifest oldManifest, long newCurrentTerm,
-                                    MetaData oldMetaData, MetaData newMetaData, Path[] dataPaths) {
-        try {
-            terminal.println(Terminal.Verbosity.VERBOSE,
-                "[clusterUUID = " + oldMetaData.clusterUUID() + ", committed = " + oldMetaData.clusterUUIDCommitted() + "] => " +
-                "[clusterUUID = " + newMetaData.clusterUUID() + ", committed = " + newMetaData.clusterUUIDCommitted() + "]");
-            terminal.println(Terminal.Verbosity.VERBOSE, "New coordination metadata is " + newMetaData.coordinationMetaData());
-            terminal.println(Terminal.Verbosity.VERBOSE, "Writing new global metadata to disk");
-            long newGeneration = MetaData.FORMAT.write(newMetaData, dataPaths);
-            Manifest newManifest = new Manifest(newCurrentTerm, oldManifest.getClusterStateVersion(), newGeneration,
-                oldManifest.getIndexGenerations());
-            terminal.println(Terminal.Verbosity.VERBOSE, "New manifest is " + newManifest);
-            terminal.println(Terminal.Verbosity.VERBOSE, "Writing new manifest file to disk");
-            Manifest.FORMAT.writeAndCleanup(newManifest, dataPaths);
-            terminal.println(Terminal.Verbosity.VERBOSE, "Cleaning up old metadata");
-            MetaData.FORMAT.cleanupOldFiles(newGeneration, dataPaths);
-        } catch (Exception e) {
-            terminal.println(Terminal.Verbosity.VERBOSE, "Cleaning up new metadata");
-            MetaData.FORMAT.cleanupOldFiles(oldManifest.getGlobalGeneration(), dataPaths);
-            throw new ElasticsearchException(WRITE_METADATA_EXCEPTION_MSG, e);
-        }
-    }
+    protected abstract void processNodePaths(Terminal terminal, Path[] dataPaths, int nodeLockId, OptionSet options, Environment env)
+        throws IOException, UserException;

     protected NodeEnvironment.NodePath[] toNodePaths(Path[] dataPaths) {
         return Arrays.stream(dataPaths).map(ElasticsearchNodeCommand::createNodePath).toArray(NodeEnvironment.NodePath[]::new);

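
The `namedXContentRegistry` field above concatenates two lists with `Stream.of(...).flatMap(Function.identity())`. The idiom can be tried in isolation as follows. `MergeListsSketch` and `merge` are hypothetical names, and plain strings stand in for the registry entries.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MergeListsSketch {

    /** Concatenates two lists into a new list, preserving order. */
    static <T> List<T> merge(List<T> first, List<T> second) {
        // Stream.of(...) yields a stream of streams; flatMap with the identity
        // function flattens it into a single stream of elements.
        return Stream.of(first.stream(), second.stream())
            .flatMap(Function.identity())
            .collect(Collectors.toList());
    }
}
```

For exactly two inputs, `Stream.concat(first.stream(), second.stream())` gives the same result. The `Stream.of` form extends more naturally if further sources are added later.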