Add a min_docs parameter for rollover action in ILM #45900
Comments
Pinging @elastic/es-core-features
I've changed the tag on this from ILM to Indices APIs, as most of this change would be to the Rollover API - the only ILM changes needed would be to make sure the parameter is passed through to the Rollover API.
We discussed this in the Core/Features meeting today. Here are the results of that discussion:
As for the situation you've described in the Discuss post linked above:
We have a planned improvement to ILM to allow manually rolling over indices without causing errors in ILM (#44175). Once we've implemented this, when you've determined an index pattern is "complete", you could manually roll over the index, which would allow that index to continue through its lifecycle, then manually delete the new, empty index.
As for the concerns about the same policy being applied to other indices, we recommend creating separate policies if you have indices with different requirements - ILM policies are relatively cheap, so you shouldn't worry about having multiple policies.
During the discussion, a more generally applicable use case came up: ILM-managed indices with
I'm going to close this issue, as we do not intend to implement this feature at the moment. However, if anyone finds this and has this or another use case that this feature would address, please comment here.
Just a slightly different use case for ingest failures:
So this might eventually cause issues in a long-running cluster with a very short
Besides that, it's probably more of a cosmetic issue...
+1 on this one: there are cases where data streams become obsolete (no new data arrives) but should not be deleted (they are kept in the warm/cold/frozen state forever); this leads to an endless series of new, empty indices in the data stream.
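The effect described in these comments can be sketched with a toy model in Python. This is not Elasticsearch code: simulate_rollovers, its parameters, and the simplification of one rollover check per period with max_age always met are assumptions made purely for illustration, and min_docs is the condition proposed in this issue, not an existing option.

```python
def simulate_rollovers(docs_per_period, min_docs=0):
    """Toy model: one rollover check per period, max_age assumed met each time.

    docs_per_period: documents indexed during each period (hypothetical input).
    min_docs: proposed lower bound; the write index only rolls over once it
              holds at least this many documents (0 mimics age-only rollover).
    Returns the document count of every index that was rolled over.
    """
    rolled_over = []
    current_docs = 0
    for docs in docs_per_period:
        current_docs += docs
        # The age condition is treated as met at every check in this model,
        # so only the proposed min_docs bound can hold the rollover back.
        if current_docs >= min_docs:
            rolled_over.append(current_docs)
            current_docs = 0
    return rolled_over


# Two busy periods followed by four periods without any ingestion.
traffic = [120, 80, 0, 0, 0, 0]

age_only = simulate_rollovers(traffic)
with_min_docs = simulate_rollovers(traffic, min_docs=1)

print(age_only)       # [120, 80, 0, 0, 0, 0]
print(with_min_docs)  # [120, 80]
```

With the age condition alone, every idle period leaves another empty index behind; with a lower bound of one document, the write index simply waits until data arrives again.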
Hi,
This feature request is related to this forum thread: https://discuss.elastic.co/t/what-about-a-min-docs-parameter-for-rollover-action-in-index-lifecycle-management/195242
This issue describes in more detail the request and the use cases discussed in the forum.
The rollover action currently allows triggering a new index using three parameters: max_size, max_docs, and max_age.
Use cases
The rollover action is currently very useful for time series indices, but it can also be used for indices that are not time series (called datastore indices below) by avoiding the max_age condition.
In both contexts, there are cases where we can end up with lots of empty or very small indices when using the max_age condition, if the indexing throughput is low or zero for some period of time. This can lead to a very large number of shards in the cluster and potentially puts its stability at risk.
The delete action could be used to mitigate this risk, but there are contexts where index deletion is not directly related to the index lifecycle and depends instead on an external condition (unrelated to the index's age or size). Hence, we should be able to use the rollover action safely without needing to implement a delete action.
Time series indices
In the case of time series indices, we can have periods of very low throughput, or even no throughput at all, so using max_age causes a new index to be created each hour (in the example below) even if no documents have been added in that period, leading to a series of useless, very small or empty indices.
Datastore indices
During a heavy indexing load, we can use the ILM policy to keep the index size within safe limits automatically, by using the max_docs or max_size condition to trigger the rollover. But the last X documents (X being lower than max_docs, with a total size lower than max_size) will stay in the hot phase indefinitely, even if we know the indexing process is finished, so they cannot benefit from the subsequent cleanup steps (such as shrink, forcemerge and so on) in the warm phase of the defined policy.
Then, if we use the max_age condition to prevent the last indexed documents from staying in the hot index, the number of empty indices will keep increasing unnecessarily after the indexing process is finished.
Feature request
To prevent the creation of any empty (or too small) index, we could add a min_docs condition to be used in conjunction with the other conditions, as follows:
Example:
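A minimal sketch of the intended semantics, written in Python. The policy body follows the usual shape of an ILM rollover action, but min_docs is the hypothetical condition proposed here, and the values are arbitrary. IndexStats and should_roll_over are illustrative names, not Elasticsearch APIs, and ages and sizes are simplified to plain numbers.

```python
from dataclasses import dataclass

# Hypothetical policy body: max_age and max_docs exist today,
# min_docs is the condition proposed in this issue.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_age": "1h",
                        "max_docs": 1_000_000,
                        "min_docs": 1,
                    }
                }
            }
        }
    }
}


@dataclass
class IndexStats:
    """Simplified view of the current write index (illustrative only)."""
    docs: int
    age_seconds: int
    size_bytes: int


def should_roll_over(stats, max_age_seconds=None, max_docs=None,
                     max_size_bytes=None, min_docs=0):
    """Roll over when any max_* condition is met AND min_docs is satisfied."""
    max_conditions_met = [
        max_age_seconds is not None and stats.age_seconds >= max_age_seconds,
        max_docs is not None and stats.docs >= max_docs,
        max_size_bytes is not None and stats.size_bytes >= max_size_bytes,
    ]
    return any(max_conditions_met) and stats.docs >= min_docs


# A two-hour-old index that never received a document.
empty = IndexStats(docs=0, age_seconds=7200, size_bytes=0)

print(should_roll_over(empty, max_age_seconds=3600))              # True
print(should_roll_over(empty, max_age_seconds=3600, min_docs=1))  # False
```

In this reading, min_docs never triggers a rollover by itself; it only vetoes a rollover that the other conditions would otherwise trigger.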
Constraints on the new condition
min_docs < max_docs
indices.lifecycle.poll_interval), this condition should be considered in conjunction with the others and cannot be used alone.
Using min_docs as the only condition should be invalid and throw an exception.
API definition
To reflect this requirement, we could change the way the conditions are declared in several ways, in addition to the example above:
To my mind, this form makes it more explicit how the triggers are met, but it has the drawback of changing the API.
To keep compatibility, we could move to something like:
This uses an explicit and_min_docs condition that cannot be met on its own, without at least one other condition.
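As a rough illustration of this last variant, the Python sketch below validates a rollover conditions block that uses the proposed and_min_docs key. validate_conditions is a hypothetical helper, not part of Elasticsearch; it only encodes the two constraints stated in this issue: the new condition cannot be used alone, and it must be lower than max_docs.

```python
# The three existing rollover conditions named in this issue.
MAX_CONDITIONS = {"max_age", "max_docs", "max_size"}


def validate_conditions(conditions):
    """Raise ValueError if the proposed and_min_docs constraints are violated."""
    min_docs = conditions.get("and_min_docs")
    if min_docs is None:
        return  # nothing new to validate
    # Constraint 1: and_min_docs needs at least one other condition.
    if not MAX_CONDITIONS.intersection(conditions):
        raise ValueError("and_min_docs requires at least one other condition")
    # Constraint 2: and_min_docs must be strictly lower than max_docs.
    max_docs = conditions.get("max_docs")
    if max_docs is not None and min_docs >= max_docs:
        raise ValueError("and_min_docs must be lower than max_docs")


# Accepted: the new condition accompanies an existing one.
validate_conditions({"max_age": "1h", "and_min_docs": 1})

# Rejected: used alone, or not lower than max_docs.
for invalid in ({"and_min_docs": 1}, {"max_docs": 10, "and_min_docs": 10}):
    try:
        validate_conditions(invalid)
    except ValueError as err:
        print(f"rejected {invalid}: {err}")
```

Keeping the existing keys untouched and adding a single prefixed key is what preserves compatibility in this variant: a conditions block without and_min_docs passes through unchanged.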