Force merge should optionally honour index.merge.policy.max_merged_segment #61764
Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)
I fear that 20% of storage capacity is the price to pay in order to have read-write indexes. Lower values would translate into high write amplification. To give an extreme example, if you were to set it to 0%, every update would trigger the rewrite of a 5GB segment. That said, it does feel reasonable to me to make
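The write-amplification concern above can be made concrete with a rough back-of-envelope model (my own simplification for illustration, not Lucene's actual merge accounting): each rewrite of a max-size segment reclaims roughly `deletes_pct` of its bytes, so the bytes rewritten per byte of deletes reclaimed scale like `1/deletes_pct`, blowing up as the floor approaches 0%.

```python
# Rough model, not Lucene code: amplification ~ 1/deletes_pct, since each
# segment rewrite reclaims only a deletes_pct fraction of the bytes it copies.
def write_amplification(deletes_pct):
    if deletes_pct <= 0:
        # The 0% extreme from the comment above: every update would force
        # a full rewrite of a 5GB segment.
        raise ValueError("0% implies rewriting a full segment per update")
    return 1.0 / deletes_pct

print(f"{write_amplification(0.20):.0f}x at the current 20% floor")
print(f"{write_amplification(0.05):.0f}x if the floor were lowered to 5%")
```

Under this model, halving the floor doubles the merge I/O spent reclaiming the same amount of space, which is the trade-off being debated in this thread.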
Thanks for taking the time to reply 🙂
For us it would be nice if more leeway was given to decide for ourselves what degree of write amplification / IO penalty we're willing to accept. We spend enough money on ES that we also spend a lot of time optimising performance and usage, and decreasing the space lost to deleted docs could represent a fair amount of money - even if it's only a few percent. I wouldn't argue for no lower limit - your point about the behaviour at 0% is taken 😄 - but some lowering is still attractive (unless it's the case that everything always goes to hell at 19%, in which case sure, 20% would seem like a good limit to keep 😆 ) I'd also note that if the concern is that users will make unwise choices: there's already an ES precedent there, in the current force merge behaviour. That's been handled with a warning in the docs for some time now, so it wouldn't be precedent-setting to allow a lower deletion percentage limit and warn users about the possible implications of going below 20%. Having said all that, my understanding is that these limits are actually in Lucene, so any change would have to be made there?
That's encouraging to hear - I'll keep my eye out for any updates on that behaviour 😃
We just discussed this issue and agreed to move forward with honoring the maximum segment size as long as the maximum number of segments is not specified. This means that calling
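The agreed behaviour can be sketched with a toy merge planner (a simplified model written for illustration here; `plan_force_merge` and its greedy grouping are hypothetical, not the actual `EsTieredMergePolicy` logic): when no explicit segment count is requested, merges are capped at `index.merge.policy.max_merged_segment` (5GB by default); when `max_num_segments` is given, the cap is ignored as before.

```python
# Toy model of the agreed behaviour, not Elasticsearch code.
MAX_MERGED_SEGMENT_GB = 5.0  # default index.merge.policy.max_merged_segment

def plan_force_merge(segment_sizes_gb, max_num_segments=None):
    """Group segments into merges.

    With an explicit max_num_segments (classic force merge), everything is
    merged together and the size cap is not honoured. Otherwise, merges are
    greedily packed so each stays under MAX_MERGED_SEGMENT_GB; a segment
    already over the cap is left as a singleton.
    """
    if max_num_segments is not None:
        return [list(segment_sizes_gb)]  # cap deliberately ignored
    merges, current, total = [], [], 0.0
    for size in sorted(segment_sizes_gb):
        if current and total + size > MAX_MERGED_SEGMENT_GB:
            merges.append(current)
            current, total = [], 0.0
        current.append(size)
        total += size
    if current:
        merges.append(current)
    return merges

print(plan_force_merge([0.5, 1.0, 2.0, 3.0, 4.5]))          # capped groups
print(plan_force_merge([0.5, 1.0, 2.0], max_num_segments=1))  # one big merge
```

The point of the sketch is only the branching: the cap applies exactly when the caller did not ask for a specific segment count, which is what makes the behaviour safe for always-hot, read-write indices.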
Pinging @elastic/es-distributed (Team:Distributed)
…rge (#77478) This commit changes the ES merge policy to apply the maximum segment size on force merges that only expunge deletes (forceMergeDeletes). This option is useful for read-write use cases that want to reclaim deleted docs more aggressively than `index.merge.policy.deletes_pct_allowed`. Closes #61764 Relates #77270
…rge (#77478) (#77692) This commit changes the ES merge policy to apply the maximum segment size on force merges that only expunge deletes (forceMergeDeletes). This option is useful for read-write use cases that want to reclaim deleted docs more aggressively than `index.merge.policy.deletes_pct_allowed`. Closes #61764 Relates #77270
Thank you! 👏
We would like to be able to reduce the number of deleted documents in our indexes, as we are spending a lot of money on disk space for deleted documents that are not reclaimed.
In our use case all of our indexes are permanently "hot" (to use ILM terminology). We have large volumes of continuous updates and deletes (sometimes almost all docs in an index will need to be updated multiple times), and we end up with lots of deleted documents. We have set `index.merge.policy.deletes_pct_allowed`, which works, but it won't go below 20% (I believe this is a Lucene limit). When you have a 25TB index with 20% deleted docs, that's 5TB of essentially wasted disk space that we can't reclaim without reindexing. We don't want to have to reindex repeatedly; it requires extra effort from our Ops team. Given that all of our indexes are sitting at 20% deleted documents, you can see how this is a significant monetary expense for us.

We'd like some way to get the deleted document count in an index down to 0 (or at least a lot lower than 20%), but we don't want to use force merge because of the current catch that you can end up with segments greater than 5GB. Force merge seems optimised for the use case of one-off merges down to a single segment, but because our indexes are always hot we don't want to merge down to one segment. We'd prefer to be able to set `index.merge.policy.deletes_pct_allowed` lower, but would settle for being able to expunge deletes occasionally and otherwise let normal merging go about its business.

My proposal is that force merge (or just the expunge deletes function) be modified so that it optionally respects the max segment size limit, which is implemented in Lucene. Currently `EsTieredMergePolicy` overrides the normal Lucene `TieredMergePolicy` such that max segment size is not respected, and there doesn't seem to be a way around that for users.
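The cost figures in the description above reduce to simple arithmetic (using only the numbers given in the issue text):

```python
# Figures from the issue: a 25TB index held at Lucene's 20% deletes floor.
index_size_tb = 25
deleted_fraction = 0.20  # minimum enforceable deletes_pct_allowed
wasted_tb = index_size_tb * deleted_fraction
print(f"~{wasted_tb:.1f} TB of disk held by deleted docs per 25TB index")
```

Multiplied across every hot index in a cluster, that unreclaimable 20% is the monetary expense the proposal is trying to reduce.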
There's a related issue here, but I wanted to propose a specific change (maybe it's not the right change!) and also explain how this affects us.