Skip to content

Discontinue archiving broken cluster settings #28253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/reference/migration/migrate_7_0/cluster.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,9 @@ primary shards of the opened index to be allocated.

==== Shard preferences `_primary`, `_primary_first`, `_replica`, and `_replica_first` are removed
These shard preferences are removed in favour of the `_prefer_nodes` and `_only_nodes` preferences.

==== Discontinue archiving broken cluster settings
We will no longer archive unknown or invalid cluster settings (prepending "archived." to a broken setting's name).
Instead, we will fail to recover a cluster state with broken cluster settings.
The solution for users in an upgrade case would be to rollback to the previous version,
address the settings that would be unknown or invalid in the next major version, and then proceed with the upgrade.
Original file line number Diff line number Diff line change
Expand Up @@ -681,6 +681,44 @@ public Settings archiveUnknownOrInvalidSettings(
}
}


/**
* Checks invalid or unknown settings. Any setting that is not recognized or fails validation
* will be processed by consumers.
* An exception will be thrown if any invalid or unknown setting is found.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please mention the exact type that is thrown (IllegalStateException)?

*
* @param settings the {@link Settings} instance to scan for unknown or invalid settings
* @param unknownConsumer callback on unknown settings (consumer receives unknown key and its
* associated value)
* @param invalidConsumer callback on invalid settings (consumer receives invalid key, its
* associated value and an exception)
*/
public void checkUnknownOrInvalidSettings(
final Settings settings,
final Consumer<Map.Entry<String, String>> unknownConsumer,
final BiConsumer<Map.Entry<String, String>, IllegalArgumentException> invalidConsumer) {
List<String> failedKeys = new ArrayList<>();
for (String key : settings.keySet()) {
try {
Setting<?> setting = get(key);
if (setting != null) {
setting.get(settings);
} else {
if (!isPrivateSetting(key)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We usually use the x == false idiom instead of !x.

failedKeys.add(key);
unknownConsumer.accept(new Entry(key, settings));
}
}
} catch (IllegalArgumentException ex) {
failedKeys.add(key);
invalidConsumer.accept(new Entry(key, settings), ex);
}
}
if (failedKeys.size() > 0) {
throw new IllegalStateException("Invalid or unknown settings: " + String.join(", ", failedKeys));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this error will occur during an upgrade which is a high stress scenario anyway and the mitigation is not entirely straightforward, I think it would be good if we could provide specific guidance how to address this problem? Otherwise, people would need to turn to the migration guide for 7.0 and check the changes to find out how to handle this?

}
}

private static final class Entry implements Map.Entry<String, String> {

private final String key;
Expand Down
22 changes: 10 additions & 12 deletions server/src/main/java/org/elasticsearch/gateway/Gateway.java
Original file line number Diff line number Diff line change
Expand Up @@ -137,27 +137,25 @@ public void performStateRecovery(final GatewayStateRecoveredListener listener) t
}
}
final ClusterSettings clusterSettings = clusterService.getClusterSettings();
metaDataBuilder.persistentSettings(
clusterSettings.archiveUnknownOrInvalidSettings(
metaDataBuilder.persistentSettings(),
e -> logUnknownSetting("persistent", e),
(e, ex) -> logInvalidSetting("persistent", e, ex)));
metaDataBuilder.transientSettings(
clusterSettings.archiveUnknownOrInvalidSettings(
metaDataBuilder.transientSettings(),
e -> logUnknownSetting("transient", e),
(e, ex) -> logInvalidSetting("transient", e, ex)));
clusterSettings.checkUnknownOrInvalidSettings(
metaDataBuilder.persistentSettings(),
e -> logUnknownSetting("persistent", e),
(e, ex) -> logInvalidSetting("persistent", e, ex));
clusterSettings.checkUnknownOrInvalidSettings(
metaDataBuilder.transientSettings(),
e -> logUnknownSetting("transient", e),
(e, ex) -> logInvalidSetting("transient", e, ex));
ClusterState.Builder builder = clusterService.newClusterStateBuilder();
builder.metaData(metaDataBuilder);
listener.onSuccess(builder.build());
}

private void logUnknownSetting(String settingType, Map.Entry<String, String> e) {
logger.warn("ignoring unknown {} setting: [{}] with value [{}]; archiving", settingType, e.getKey(), e.getValue());
logger.warn("unknown {} setting: [{}] with value [{}]", settingType, e.getKey(), e.getValue());
}

private void logInvalidSetting(String settingType, Map.Entry<String, String> e, IllegalArgumentException ex) {
logger.warn(() -> new ParameterizedMessage("ignoring invalid {} setting: [{}] with value [{}]; archiving",
logger.warn(() -> new ParameterizedMessage("invalid {} setting: [{}] with value [{}]",
settingType, e.getKey(), e.getValue()), ex);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,8 @@
import org.elasticsearch.test.ESIntegTestCase.ClusterScope;
import org.elasticsearch.test.ESIntegTestCase.Scope;
import org.elasticsearch.test.InternalTestCluster.RestartCallback;
import org.junit.Rule;
import org.junit.rules.ExpectedException;

import java.io.IOException;
import java.util.List;
Expand All @@ -64,6 +66,8 @@
@ClusterScope(scope = Scope.TEST, numDataNodes = 0)
public class GatewayIndexStateIT extends ESIntegTestCase {

@Rule
public ExpectedException expectedException = ExpectedException.none();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying that this is wrong but it seems we don't use it (so far?) in our code base. For consistency I'd probably check for expected exceptions as we did before but maybe we want to use ExpectedException more broadly in the future?

private final Logger logger = Loggers.getLogger(GatewayIndexStateIT.class);

public void testMappingMetaDataParsed() throws Exception {
Expand Down Expand Up @@ -479,7 +483,7 @@ public void testRecoverMissingAnalyzer() throws Exception {
assertThat(ex.getCause().getMessage(), containsString("analyzer [test] not found for field [field1]"));
}

public void testArchiveBrokenClusterSettings() throws Exception {
public void testFailBrokenClusterSettings() throws Exception {
logger.info("--> starting one node");
internalCluster().startNode();
client().prepareIndex("test", "type1", "1").setSource("field1", "value1").setRefreshPolicy(IMMEDIATE).get();
Expand All @@ -502,20 +506,9 @@ public void testArchiveBrokenClusterSettings() throws Exception {
.put(ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey(), "broken").build()).build();
MetaData.FORMAT.write(brokenMeta, nodeEnv.nodeDataPaths());
}
internalCluster().fullRestart();
ensureYellow("test"); // wait for state recovery
state = client().admin().cluster().prepareState().get().getState();
assertEquals("true", state.metaData().persistentSettings().get("archived.this.is.unknown"));
assertEquals("broken", state.metaData().persistentSettings().get("archived."
+ ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey()));

// delete these settings
client().admin().cluster().prepareUpdateSettings().setPersistentSettings(Settings.builder().putNull("archived.*")).get();

state = client().admin().cluster().prepareState().get().getState();
assertNull(state.metaData().persistentSettings().get("archived.this.is.unknown"));
assertNull(state.metaData().persistentSettings().get("archived."
+ ElectMasterService.DISCOVERY_ZEN_MINIMUM_MASTER_NODES_SETTING.getKey()));
assertHitCount(client().prepareSearch().setQuery(matchAllQuery()).get(), 1L);
expectedException.expect(ElasticsearchException.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do a deeper inspection of the underlying cause and maybe even the error message (at least whether it matches a certain pattern)?

internalCluster().fullRestart();
}

}