index rollover running with "NORMAL" priority #50778
Comments
Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)
I do not think this should be an Note that we split up the historically-expensive
We have seen an issue in larger clusters where it often takes longer than the 30s default to process a rollover, which in currently-released versions of ES will cause ILM to stop for an index until a user intervenes. This is definitely a problem as you say, because it can lead to indices growing very large. However, we're already taking steps to address this problem in another way. In #50388 we've made Rollover a single cluster state update (rather than several in sequence), which enables us to implement automatic retries (see #48183) for rollover, and should help alleviate this problem without having to adjust the priority of the task.
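To illustrate why a single cluster state update makes retries straightforward, here is a minimal Python sketch. It is not Elasticsearch code: the dictionary-shaped state and the names `rollover`, `submit_with_retries`, and `submit` are all hypothetical. The point is only that when the whole rollover is one all-or-nothing update, a timed-out attempt leaves nothing half-applied, so it can simply be resubmitted.

```python
import copy


def rollover(state, alias, new_index):
    """Return a NEW state in which new_index exists and the alias points to it.

    Both changes happen in one step, so there is no intermediate state in
    which the index exists but the alias has not been switched.
    """
    new_state = copy.deepcopy(state)
    new_state["indices"].append(new_index)
    new_state["aliases"][alias] = new_index
    return new_state


def submit_with_retries(state, update, submit, max_attempts=3):
    """Submit one all-or-nothing update, retrying when the attempt times out.

    `submit` is a hypothetical stand-in for handing the update to the master;
    it either returns the new state or raises TimeoutError. Because `update`
    never mutates `state`, a failed attempt needs no cleanup before the retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(state, update)
        except TimeoutError:
            if attempt == max_attempts:
                raise
```

With the earlier multi-step form (create index, then switch alias), a timeout between the steps would leave partial state behind, which is what made blind retries unsafe in this toy model.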
Thanks @DaveCTurner and @gwbrown for your response.
As you mentioned, reroute is an "expensive" NORMAL task. Based on insertion order, it can still delay the rollover task, which will time out after the default 30 seconds of waiting in the queue. With "NORMAL" priority, rollover competes not only with URGENT and HIGH tasks but with other "NORMAL" priority tasks as well. A rollover that is not performed in time could cause a single index to keep growing under a high ingestion rate, with further side effects in the cluster. The individual create-index and alias-switch tasks run with "URGENT" priority, so why should rollover have a lower priority than that (considering all are customer-initiated actions)? I would like to understand the criteria by which priority is decided for the various tasks.
Yes, thanks for the fix that makes rollover a single cluster state update. This is really helpful.
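The queueing concern above can be made concrete with a small Python sketch. This is a toy model under stated assumptions, not the Elasticsearch master service: tasks are ordered by priority and then by insertion order, one task runs at a time, and a task that has waited longer than its timeout is dropped. The `Priority` values, the `Task` fields, `run_queue`, and the 45-second reroute cost are all illustrative; only the 30-second default timeout comes from this thread.

```python
import heapq
from dataclasses import dataclass, field
from enum import IntEnum
from itertools import count


class Priority(IntEnum):
    # Lower value runs first; names mirror the levels mentioned in this thread.
    URGENT = 0
    HIGH = 1
    NORMAL = 2


@dataclass(order=True)
class Task:
    priority: Priority
    seq: int  # insertion order, used as the tie-break within a priority
    name: str = field(compare=False)
    cost_s: float = field(compare=False)
    timeout_s: float = field(compare=False, default=30.0)


def run_queue(submissions):
    """Run tasks that were all submitted at t=0, one at a time.

    `submissions` is a list of (name, priority, cost_in_seconds) tuples.
    Returns {name: "ran" | "timed out"}.
    """
    seq = count()
    heap = [Task(prio, next(seq), name, cost) for name, prio, cost in submissions]
    heapq.heapify(heap)
    clock, outcome = 0.0, {}
    while heap:
        task = heapq.heappop(heap)
        if clock > task.timeout_s:
            # The task waited in the queue past its timeout and never ran.
            outcome[task.name] = "timed out"
            continue
        outcome[task.name] = "ran"
        clock += task.cost_s
    return outcome
```

In this model, submitting `("reroute", Priority.NORMAL, 45.0)` ahead of `("rollover", Priority.NORMAL, 1.0)` leaves the rollover timed out, whereas giving the rollover `Priority.URGENT` lets both run. That is the behaviour the question is about; it says nothing on whether raising the priority is the right fix.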
We discussed this today and decided that we'd rather not increase the priority of this task. Instead, we have other ways to address it:
As David said, we should solve the underlying issue rather than increasing the priority, as once more things move to With that I'm going to close this issue. Thanks!
Elasticsearch version (bin/elasticsearch --version): ES master branch
Description of the problem including expected versus actual behavior:
The index rollover cluster state update runs with "NORMAL" priority after this PR, which made rollover execute in one cluster state update (thanks for fixing it). Before this change, the two steps, 1) create index and 2) alias switch, both ran with "URGENT" priority. With "NORMAL" priority the rollover task can be delayed, which could again cause a single index to grow huge if the master is busy with other higher-priority tasks such as shard-started, shard-failed, or snapshot state updates.
Fix: Update the priority for rollover task to "URGENT"
#50388