
ILM: Make all the shrink action steps retryable #70107


Merged
merged 34 commits into elastic:master on Mar 18, 2021

Conversation

andreidan (Contributor) commented Mar 8, 2021

This aims at making the shrink action retryable. Every step is
retryable, but in order to provide an experience where ILM tries
to achieve a successful shrink even when the target node goes
missing permanently or the shrunk index cannot recover, this also
introduces a retryable shrink cycle within the shrink action.

The shrink action will generate a unique index name that'll be the
shrunk index name. The generated index name is stored in the lifecycle
state.

If the shrink action ends up waiting for the source shards to
colocate or for the shrunk index to recover for more than the configured
`LIFECYCLE_STEP_WAIT_TIME_THRESHOLD` setting, it will move back
to clean up the attempted (and failed) shrunk index and will retry
generating a new index name and attempting to shrink the source
to the newly generated index name.

Relates to #48183

andreidan added the :Data Management/ILM+SLM (Index and Snapshot lifecycle management) label Mar 8, 2021
andreidan (Contributor Author):

@elasticmachine update branch

elasticmachine and others added 2 commits March 9, 2021 02:50
We were setting the nextStepKey regardless of whether the underlying step
execution was successful. We were also asserting that we have a
`nextStepKey` even if the step failed (and moved into the `ERROR` step).

I don't think this is correct, as from the `ERROR` step the only place
ILM can move is back to the `failedStep` (if it was retryable).
andreidan marked this pull request as ready for review March 9, 2021 09:06
elasticmachine added the Team:Data Management (Meta label for data/management team) label Mar 9, 2021
elasticmachine (Collaborator):

Pinging @elastic/es-core-features (Team:Core/Features)

andreidan requested a review from dakrone March 9, 2021 09:07
// This setting configures how long ILM should wait, measured from step_time, for a condition to be met. After the threshold wait time has
// elapsed, ILM will likely stop waiting and go to the next step.
// Also see {@link org.elasticsearch.xpack.core.ilm.ClusterStateWaitUntilThresholdStep}
public static final Setting<TimeValue> LIFECYCLE_STEP_WAIT_TIME_THRESHOLD_SETTING =
Contributor Author:

Would we want to document this?

Member:

I think so, even if only in the shrink action documentation.

andreidan (Contributor Author):

@elasticmachine update branch

dakrone (Member) left a comment:

Thanks for working on this Andrei! I left some comments for an initial review.

// wonderful thing)
TimeValue retryThreshold = LifecycleSettings.LIFECYCLE_STEP_WAIT_TIME_THRESHOLD_SETTING.get(idxMeta.getSettings());
LifecycleExecutionState lifecycleState = fromIndexMetadata(idxMeta);
if (waitedMoreThanThresholdLevel(retryThreshold, lifecycleState, Clock.systemUTC())) {
Member:

I'm slightly concerned that using Clock.systemUTC() here means that we'll be subject to drift as the time on the underlying machine changes, but to fix that we'd have to use System.nanoTime() which would be a little strange since it's not necessarily "real" time.

(not really something we should change, I'm just voicing my concern)

andreidan (Contributor Author), Mar 16, 2021:

Oh, you're absolutely right, wall clocks are not trustworthy. I don't think nanoTime would be the answer either, as that API's guarantee is that readings are monotonically increasing, not that they're accurate.

I don't think this is too bad, because if we get a reading where the time drift is hours off we'll just retry the shrink cycle. So nothing truly bad happens, except a wasteful (and presumably eventually successful) shrink operation, with the note that we'll at least delete the abandoned shrunk index.
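
For reference, the check under discussion boils down to a wall-clock comparison along these lines (a sketch only; the method and parameter names here are assumptions, and the real waitedMoreThanThresholdLevel may differ):

import java.time.Clock;

// Sketch of the wall-clock threshold check: compare the time elapsed since the
// recorded step_time against the configured threshold.
static boolean waitedMoreThanThreshold(long stepTimeEpochMillis, long thresholdMillis, Clock clock) {
    long elapsedMillis = clock.millis() - stepTimeEpochMillis;
    return elapsedMillis > thresholdMillis;
}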

@@ -38,7 +38,7 @@

public static final String NAME = "generate-snapshot-name";

private static final Logger logger = LogManager.getLogger(CreateSnapshotStep.class);
private static final Logger logger = LogManager.getLogger(GenerateSnapshotNameStep.class);
Member:

Good catch :)

String generatedIndexName = generateValidIndexName(prefix, index.getName());
ActionRequestValidationException validationException = validateGeneratedIndexName(generatedIndexName, clusterState);
if (validationException != null) {
logger.warn("unable to generate a valid shrink index name as part of policy [{}] for index [{}] due to [{}]",
Member:

Do we want the term "shrink" in here? I was thinking we may want to end up re-using this for other purposes.

Contributor Author:

Ah yes, good catch

Comment on lines 140 to 141
static String generateValidIndexName(String prefix, String indexName) {
return prefix + indexName + "-" + generateValidIndexSuffix(() -> UUIDs.randomBase64UUID().toLowerCase(Locale.ROOT));
Member:

I don't think we want a suffix on the index, it should be

<prefix>-<uuid>-<indexName>

The reason is that if we add any suffixes, then we break origination date parsing for anyone that has index.lifecycle.parse_origination_date set for their indices.

I also think that we don't need a full UUID, which can be quite long! Imagine if we end up using this logic for all our prefixing steps and we'd end up with:

partial-jrx_xoiprjuncgvs5vmfmq-restored-krx_xoiprjuncgvs5vmfmq-shrink-lrx_xoiprjuncgvs5vmfmq-rollup-msy_xoiprjuncgvs5vmfmq-.ds-myindex-2021.03.11-000001

Which is super long! We even run the risk of hitting the maximum index name length! We don't expect this UUID to clash that frequently, perhaps we can use a subset of the UUID, like 4 characters or so, as the UUID? Especially because if we do have a clash, then we'll ERROR and retry again with a different UUID next time.
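
A rough sketch of the shape being suggested (a hypothetical helper, not the merged code; the short 4-character segment follows the suggestion above):

import java.util.Locale;
import java.util.UUID;

// Hypothetical sketch: the prefix (e.g. "shrink-"), then a short random segment, then
// the unmodified source index name, so suffix-based origination-date parsing keeps working.
static String generatePrefixedIndexName(String prefix, String indexName) {
    String shortRandom = UUID.randomUUID().toString().toLowerCase(Locale.ROOT).substring(0, 4);
    return prefix + shortRandom + "-" + indexName;
}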

SHRUNKEN_INDEX_PREFIX);
ShrunkShardsAllocatedStep allocated = new ShrunkShardsAllocatedStep(enoughShardsKey, copyMetadataKey, SHRUNKEN_INDEX_PREFIX);
ClusterStateWaitUntilThresholdStep checkShrinkReadyStep = new ClusterStateWaitUntilThresholdStep(
new CheckShrinkReadyStep(allocationRoutedKey, shrinkKey), cleanupShrinkIndexKey);
Member:

I think this only needs to rewind back to setSingleNodeKey instead of cleanupShrinkIndexKey, right? This is the step before the shrunken index has actually been created.

We only want to rewind and pick a new node if we can't allocate within the time frame, no need to delete anything.

@@ -177,14 +183,20 @@ public boolean isSafeAction() {
CheckNotDataStreamWriteIndexStep checkNotWriteIndexStep = new CheckNotDataStreamWriteIndexStep(checkNotWriteIndex,
waitForNoFollowerStepKey);
WaitForNoFollowersStep waitForNoFollowersStep = new WaitForNoFollowersStep(waitForNoFollowerStepKey, readOnlyKey, client);
UpdateSettingsStep readOnlyStep = new UpdateSettingsStep(readOnlyKey, setSingleNodeKey, client, readOnlySettings);
UpdateSettingsStep readOnlyStep = new UpdateSettingsStep(readOnlyKey, cleanupShrinkIndexKey, client, readOnlySettings);
CleanupShrinkIndexStep cleanupShrinkIndexStep = new CleanupShrinkIndexStep(cleanupShrinkIndexKey, generateShrinkIndexNameKey,
Member:

Can you comment the steps in this method so someone that comes in can follow the execution flow? I think it'd be helpful for others reading the code.

Comment on lines 56 to 62
LifecycleExecutionState lifecycleState = LifecycleExecutionState.fromIndexMetadata(indexMetadata);
String shrunkenIndexName = lifecycleState.getShrinkIndexName();
if (shrunkenIndexName == null) {
// this is for BWC reasons, for policies that are in the middle of executing the shrink action when the update to generated
// names happens
shrunkenIndexName = shrunkIndexPrefix + index.getName();
}
Member:

It looks like we duplicate this in enough places that maybe we want to put it in a static method somewhere, passing in the prefix and the IndexMetadata. What do you think?
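
Such a shared helper might look roughly like this (a sketch; its name and final home are assumptions, and the fallback mirrors the snippet above):

// Resolve the shrunken index name from the lifecycle execution state, falling back to
// the legacy "<prefix><source-index>" name for policies that were mid-shrink before
// generated names existed (BWC).
static String getShrinkIndexName(String shrunkIndexPrefix, IndexMetadata indexMetadata) {
    LifecycleExecutionState lifecycleState = LifecycleExecutionState.fromIndexMetadata(indexMetadata);
    String shrunkenIndexName = lifecycleState.getShrinkIndexName();
    if (shrunkenIndexName == null) {
        shrunkenIndexName = shrunkIndexPrefix + indexMetadata.getIndex().getName();
    }
    return shrunkenIndexName;
}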

Comment on lines 654 to 655
// Sometimes throw in an extraneous unfollow just to check it doesn't break anything
if (randomBoolean()) {
Member:

Why was this removed?

andreidan (Contributor Author), Mar 15, 2021:

Ah, I suspected this could cause some flakiness, but I believe it might've been due to dev issues while I was working through this feature (saving the execution state, namely the shrink index name being present in the execution state when we reach the cold phase). I'll add this back. Thanks!

// elapsed, ILM will likely stop waiting and go to the next step.
// Also see {@link org.elasticsearch.xpack.core.ilm.ClusterStateWaitUntilThresholdStep}
public static final Setting<TimeValue> LIFECYCLE_STEP_WAIT_TIME_THRESHOLD_SETTING =
Setting.positiveTimeSetting(LIFECYCLE_STEP_WAIT_TIME_THRESHOLD, TimeValue.timeValueHours(12), Setting.Property.Dynamic,
Member:

Should we enforce a minimum setting? It would be bad to be spinning forever because someone put 1 second as their wait time.

Contributor Author:

Good point. Shall we set it to one hour minimum? (this will make it impossible to integration test - https://github.com/elastic/elasticsearch/pull/70107/files/db8a13789e40a0b7f64263a0c88a257b21eb206c#diff-00728663bc72dd25e9832acf55fd6c4dd690a34d3c9ef6e5408464d81ff1f785R309 - however, maybe the unit tests we have around the ClusterStateWaitUntilThresholdStep are enough)

Contributor Author:

I managed to integration test the rewind even with a 1h minimum value in 73e363d
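
For reference, a floor can be expressed directly on the setting, roughly like this (a sketch assuming the Setting#timeSetting overload that accepts a minimum value; the exact property flags are illustrative):

// Same setting with a 1h minimum, so a value like "1s" is rejected up front.
public static final Setting<TimeValue> LIFECYCLE_STEP_WAIT_TIME_THRESHOLD_SETTING =
    Setting.timeSetting(LIFECYCLE_STEP_WAIT_TIME_THRESHOLD, TimeValue.timeValueHours(12),
        TimeValue.timeValueHours(1), Setting.Property.Dynamic, Setting.Property.IndexScope);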

andreidan (Contributor Author):

@elasticmachine update branch

andreidan (Contributor Author):

@jrodewig thanks so much for restructuring and rewording the docs

andreidan (Contributor Author):

if manually moving an index that is past a shrink step back to any point before the shrink, the cleanup step actually ends up deleting the index itself

@dakrone ah interesting. So this happened because the alias/identity of the source index was swapped to the shrink index (and the source index was deleted in the process). I pushed 4b091e6 to skip the cleanup index step if the managed index is a shrunk index and the source index does not exist anymore
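
A rough sketch of the guard described above (hypothetical shape; the actual change in 4b091e6 may differ in detail):

// If the managed index is itself a shrunk index (it records a resize source) and that
// source index no longer exists in the cluster, there is nothing to clean up, so the
// cleanup step should be a no-op rather than deleting the managed index.
String resizeSourceName = indexMetadata.getSettings().get(IndexMetadata.INDEX_RESIZE_SOURCE_NAME_KEY);
boolean nothingToCleanUp = resizeSourceName != null
    && clusterState.metadata().hasIndex(resizeSourceName) == false;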

andreidan (Contributor Author):

@elasticmachine update branch

andreidan requested a review from dakrone March 17, 2021 12:29
andreidan (Contributor Author):

@elasticmachine update branch

dakrone (Member) left a comment:

LGTM, thanks for iterating on this Andrei!

andreidan merged commit 9831084 into elastic:master Mar 18, 2021
andreidan added a commit to andreidan/elasticsearch that referenced this pull request Mar 18, 2021
This aims at making the shrink action retryable. Every step is
retryable, but in order to provide an experience where ILM tries
to achieve a successful shrink even when the target node goes
missing permanently or the shrunk index cannot recover, this also
introduces a retryable shrink cycle within the shrink action.

The shrink action will generate a unique index name that'll be the
shrunk index name. The generated index name is stored in the lifecycle
state.

If the shrink action ends up waiting for the source shards to
colocate or for the shrunk index to recover for more than the configured
`LIFECYCLE_STEP_WAIT_TIME_THRESHOLD` setting, it will move back
to clean up the attempted (and failed) shrunk index and will retry
generating a new index name and attempting to shrink the source
to the newly generated index name.

(cherry picked from commit 9831084)
Signed-off-by: Andrei Dan <[email protected]>
andreidan added a commit to andreidan/elasticsearch that referenced this pull request Mar 18, 2021
andreidan added a commit that referenced this pull request Mar 18, 2021
Labels
:Data Management/ILM+SLM (Index and Snapshot lifecycle management), >feature, Team:Data Management (Meta label for data/management team), v7.13.0, v8.0.0-alpha1