-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ILM: Make all the shrink action steps retryable #70107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ILM: Make all the shrink action steps retryable #70107
Conversation
This aims at making the shrink action retryable. Every step is retryably, but in order to provide an experience where ILM tries to achieve a successful shrink even when the target node goes missing permanently or the shrunk index cannot recover, this also introduces a retryable shrink cycle within the shrink action. The shrink action will generate a unique index name that'll be the shrunk index name. The generated index name is stored in the lifecycle state. If the shrink action ends up waiting for the source shards to colocate or for the shrunk index to recover for more than the configured `LIFECYCLE_STEP_WAIT_TIME_THRESHOLD` setting, it will move back to cleanup the attempted (and failed) shrunk index and will retry generating a new index name and attempting to shrink the source to the newly generated index name.
@elasticmachine update branch |
We were setting the nextStepKey regardless if the underlying step execution was successful or not. We were also asserting we have a `nextStepKey` even if the step failed (and moved into the `ERROR` step). I don't think this is correct as from the `ERROR` step the only place ILM can move is back to the `failedStep` (if it was retryable).
Pinging @elastic/es-core-features (Team:Core/Features) |
// This setting configures how much time since step_time should ILM wait for a condition to be met. After the threshold wait time has | ||
// elapsed ILM will likely stop waiting and go to the next step. | ||
// Also see {@link org.elasticsearch.xpack.core.ilm.ClusterStateWaitUntilThresholdStep} | ||
public static final Setting<TimeValue> LIFECYCLE_STEP_WAIT_TIME_THRESHOLD_SETTING = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want to document this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, even if only in the shrink action documentation.
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Andrei! I left some comments for an initial review.
.../core/src/main/java/org/elasticsearch/xpack/core/ilm/ClusterStateWaitUntilThresholdStep.java
Outdated
Show resolved
Hide resolved
// wonderful thing) | ||
TimeValue retryThreshold = LifecycleSettings.LIFECYCLE_STEP_WAIT_TIME_THRESHOLD_SETTING.get(idxMeta.getSettings()); | ||
LifecycleExecutionState lifecycleState = fromIndexMetadata(idxMeta); | ||
if (waitedMoreThanThresholdLevel(retryThreshold, lifecycleState, Clock.systemUTC())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly concerned that using Clock.systemUTC()
here means that we'll be subject to drift as the time on the underlying machine changes, but to fix that we'd have to use System.nanoTime()
which would be a little strange since it's not necessarily "real" time.
(not really something we should change, I'm just voicing my concern)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're absolutely right - wall clocks are not trustworthy. I don't think nanoTime
would be the answer as the reading invariant on that api is to be monotonically increasing, more so than accuracy.
I don't think this is too bad because if we get a reading where the time drift is hours away we'll retry the shrink cycle (so nothing truly bad happens, except a wasteful - and presumbaly eventually successful - shrink operation, with the note that we'll at least delete the abandoned shrunk index)
@@ -38,7 +38,7 @@ | |||
|
|||
public static final String NAME = "generate-snapshot-name"; | |||
|
|||
private static final Logger logger = LogManager.getLogger(CreateSnapshotStep.class); | |||
private static final Logger logger = LogManager.getLogger(GenerateSnapshotNameStep.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch :)
String generatedIndexName = generateValidIndexName(prefix, index.getName()); | ||
ActionRequestValidationException validationException = validateGeneratedIndexName(generatedIndexName, clusterState); | ||
if (validationException != null) { | ||
logger.warn("unable to generate a valid shrink index name as part of policy [{}] for index [{}] due to [{}]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want the term "shrink" in here? I was thinking we may want to end up re-using this for other purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, good catch
static String generateValidIndexName(String prefix, String indexName) { | ||
return prefix + indexName + "-" + generateValidIndexSuffix(() -> UUIDs.randomBase64UUID().toLowerCase(Locale.ROOT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want a suffix on the index, it should be
<prefix>-<uuid>-<indexName>
The reason is that if we add any suffixes, then we break origination date parsing for anyone that has index.lifecycle.parse_origination_date
set for their indices.
I also think that we don't need a full UUID, which can be quite long! Imagine if we end up using this logic for all our prefixing steps and we'd end up with:
partial-jrx_xoiprjuncgvs5vmfmq-restored-krx_xoiprjuncgvs5vmfmq-shrink-lrx_xoiprjuncgvs5vmfmq-rollup-msy_xoiprjuncgvs5vmfmq-.ds-myindex-2021.03.11-000001
Which is super long! We even run the risk of hitting the maximum index name length! We don't expect this UUID to clash that frequently, perhaps we can use a subset of the UUID, like 4 characters or so, as the UUID? Especially because if we do have a clash, then we'll ERROR and retry again with a different UUID next time.
SHRUNKEN_INDEX_PREFIX); | ||
ShrunkShardsAllocatedStep allocated = new ShrunkShardsAllocatedStep(enoughShardsKey, copyMetadataKey, SHRUNKEN_INDEX_PREFIX); | ||
ClusterStateWaitUntilThresholdStep checkShrinkReadyStep = new ClusterStateWaitUntilThresholdStep( | ||
new CheckShrinkReadyStep(allocationRoutedKey, shrinkKey), cleanupShrinkIndexKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this only needs to rewind back to setSingleNodeKey
instead of cleanupShrinkIndexKey
, right? Because this is the pre-step when the shrunken index hasn't actually been created yet.
We only want to rewind and pick a new node if we can't allocate within the time frame, no need to delete anything.
@@ -177,14 +183,20 @@ public boolean isSafeAction() { | |||
CheckNotDataStreamWriteIndexStep checkNotWriteIndexStep = new CheckNotDataStreamWriteIndexStep(checkNotWriteIndex, | |||
waitForNoFollowerStepKey); | |||
WaitForNoFollowersStep waitForNoFollowersStep = new WaitForNoFollowersStep(waitForNoFollowerStepKey, readOnlyKey, client); | |||
UpdateSettingsStep readOnlyStep = new UpdateSettingsStep(readOnlyKey, setSingleNodeKey, client, readOnlySettings); | |||
UpdateSettingsStep readOnlyStep = new UpdateSettingsStep(readOnlyKey, cleanupShrinkIndexKey, client, readOnlySettings); | |||
CleanupShrinkIndexStep cleanupShrinkIndexStep = new CleanupShrinkIndexStep(cleanupShrinkIndexKey, generateShrinkIndexNameKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment the steps in this method so someone that comes in can follow the execution flow? I think it'd be helpful for others reading the code.
LifecycleExecutionState lifecycleState = LifecycleExecutionState.fromIndexMetadata(indexMetadata); | ||
String shrunkenIndexName = lifecycleState.getShrinkIndexName(); | ||
if (shrunkenIndexName == null) { | ||
// this is for BWC reasons for polices that are in the middle of executing the shrink action when the update to generated | ||
// names happens | ||
shrunkenIndexName = shrunkIndexPrefix + index.getName(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we duplicate this enough places that maybe we want to put this in a static method somewhere, passing in the prefix and IndexMetadata, what do you think?
// Sometimes throw in an extraneous unfollow just to check it doesn't break anything | ||
if (randomBoolean()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I was suspecting this could cause some flakiness, but I believe it might've been due to dev issues as I was working through this feature (saving the execution state - namely the shrink index name being present in the execution state when we reach the cold phase). I'll add this back. Thanks!
// elapsed ILM will likely stop waiting and go to the next step. | ||
// Also see {@link org.elasticsearch.xpack.core.ilm.ClusterStateWaitUntilThresholdStep} | ||
public static final Setting<TimeValue> LIFECYCLE_STEP_WAIT_TIME_THRESHOLD_SETTING = | ||
Setting.positiveTimeSetting(LIFECYCLE_STEP_WAIT_TIME_THRESHOLD, TimeValue.timeValueHours(12), Setting.Property.Dynamic, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we enforce a minimum setting? It would be bad to be spinning forever because someone put 1 second as their wait time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Shall we set it to one hour minimum? (this will make it impossible to integration test - https://github.com/elastic/elasticsearch/pull/70107/files/db8a13789e40a0b7f64263a0c88a257b21eb206c#diff-00728663bc72dd25e9832acf55fd6c4dd690a34d3c9ef6e5408464d81ff1f785R309 - however, maybe the unit tests we have around the ClusterStateWaitUntilThresholdStep
are enough)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I managed to integration test the rewind even with a 1h
minimum value in 73e363d
@elasticmachine update branch |
Co-authored-by: Lee Hinman <[email protected]>
This changes the generated index name format to <prefix>-<uuid>-<indexname>, limitting the length of the uuid to a maximum of 4 chars.
@jrodewig thanks so much for restructuring and rewording the docs |
…d the source index does not exist anymore
@dakrone ah interesting. So this happened because the alias/identity of the source index was swapped to the shrink index (and the source index was deleted in the process). I pushed 4b091e6 to skip the cleanup index step if the managed index is a shrunk index and the source index does not exist anymore |
@elasticmachine update branch |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for iterating on this Andrei!
This aims at making the shrink action retryable. Every step is retryable, but in order to provide an experience where ILM tries to achieve a successful shrink even when the target node goes missing permanently or the shrunk index cannot recover, this also introduces a retryable shrink cycle within the shrink action. The shrink action will generate a unique index name that'll be the shrunk index name. The generated index name is stored in the lifecycle state. If the shrink action ends up waiting for the source shards to colocate or for the shrunk index to recover for more than the configured `LIFECYCLE_STEP_WAIT_TIME_THRESHOLD` setting, it will move back to clean up the attempted (and failed) shrunk index and will retry generating a new index name and attempting to shrink the source to the newly generated index name. (cherry picked from commit 9831084) Signed-off-by: Andrei Dan <[email protected]>
This aims at making the shrink action retryable. Every step is retryable, but in order to provide an experience where ILM tries to achieve a successful shrink even when the target node goes missing permanently or the shrunk index cannot recover, this also introduces a retryable shrink cycle within the shrink action. The shrink action will generate a unique index name that'll be the shrunk index name. The generated index name is stored in the lifecycle state. If the shrink action ends up waiting for the source shards to colocate or for the shrunk index to recover for more than the configured `LIFECYCLE_STEP_WAIT_TIME_THRESHOLD` setting, it will move back to clean up the attempted (and failed) shrunk index and will retry generating a new index name and attempting to shrink the source to the newly generated index name. (cherry picked from commit 9831084) Signed-off-by: Andrei Dan <[email protected]>
This aims at making the shrink action retryable. Every step is
retryable, but in order to provide an experience where ILM tries
to achieve a successful shrink even when the target node goes
missing permanently or the shrunk index cannot recover, this also
introduces a retryable shrink cycle within the shrink action.
The shrink action will generate a unique index name that'll be the
shrunk index name. The generated index name is stored in the lifecycle
state.
If the shrink action ends up waiting for the source shards to
collocate or for the shrunk index to recover for more than the configured
LIFECYCLE_STEP_WAIT_TIME_THRESHOLD
setting, it will move backto clean up the attempted (and failed) shrunk index and will retry
generating a new index name and attempting to shrink the source
to the newly generated index name.
Relates to #48183