Force merge should optionally honour index.merge.policy.max_merged_segment #61764

Closed
craigglennie opened this issue Sep 1, 2020 · 6 comments · Fixed by #77478
Labels
- :Distributed Indexing/Engine - Anything around managing Lucene and the Translog in an open shard.
- >enhancement
- Team:Distributed (Obsolete) - Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments


craigglennie commented Sep 1, 2020

We would like to be able to reduce the number of deleted documents in our indexes, as we are spending a lot of money on disk space held by deleted documents that is never reclaimed.

In our use case all of our indexes are permanently "hot" (to use ILM terminology). We have large volumes of continuous updates and deletes (sometimes almost all docs in an index need to be updated multiple times), so we end up with lots of deleted documents. We have set `index.merge.policy.deletes_pct_allowed`, which works, but it cannot be set below 20% (I believe this is a Lucene limit). When you have a 25TB index with 20% deleted docs, that's 5TB of essentially wasted disk space that we can't reclaim without reindexing, and we don't want to reindex repeatedly because it requires extra effort from our Ops team. Given that all of our indexes sit at 20% deleted documents, you can see how significant the monetary expense is for us.
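
For concreteness, a minimal sketch of how the setting is applied (the host and index name are placeholders; per the Lucene limit mentioned above, values below 20 are rejected):

```sh
# Sketch: dynamically update the merge policy's allowed percentage of
# deleted docs. Values below 20 are rejected (the Lucene limit above).
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index.merge.policy.deletes_pct_allowed": 20 }'
```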

We'd like some way to get the deleted document count in an index down to 0 (or at least well below 20%), but we don't want to use force merge because of the current catch that you can end up with segments larger than 5GB. Force merge seems optimised for one-off merges down to a single segment, and because our indexes are always hot we don't want to merge down to one segment. We'd prefer to be able to set `deletes_pct_allowed` lower, but would settle for being able to expunge deletes occasionally and otherwise let normal merging go about its business.

My proposal is that force merge (or just the expunge-deletes function) be modified so that it optionally respects the max segment size limit that Lucene already implements. Currently EsTieredMergePolicy overrides Lucene's normal TieredMergePolicy so that the max segment size is not respected on forced merges, and there doesn't seem to be a way for users to work around that.
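
For reference, the cap in question is the existing per-index setting (host and index name are placeholders; 5gb is the documented default):

```sh
# Sketch: max_merged_segment caps segments produced by regular background
# merges (default 5gb), but forced merges currently ignore it.
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index.merge.policy.max_merged_segment": "5gb" }'
```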

There's a related issue here, but I wanted to propose a specific change (maybe it's not the right change!) and also explain how this affects us.

@craigglennie added the >enhancement and needs:triage labels Sep 1, 2020
@jloleysens added the :Data Management/ILM+SLM label Sep 1, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@elasticmachine added the Team:Data Management label Sep 1, 2020
@andreidan added the team-discuss label and removed the needs:triage label Sep 1, 2020
jpountz (Contributor) commented Sep 24, 2020

I fear that 20% of storage capacity is the price to pay in order to have read-write indexes. Lower values would translate into high write amplification. To give an extreme example, if you were to set it to 0%, every update would trigger the rewrite of a 5GB segment.
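
To put a rough number on it (an illustrative calculation; the ~1KB document size is an assumption, not a measurement):

$$\text{write amplification} \approx \frac{5\,\text{GB}}{1\,\text{KB}} = 5 \times 10^{6}$$

i.e. on the order of five million bytes written per byte logically updated.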

That said, it does feel reasonable to me to make `_forcemerge` honor the maximum segment size as long as `max_num_segments` is not set.

craigglennie (Author) commented Sep 25, 2020

Thanks for taking the time to reply 🙂

> I fear that 20% of storage capacity is the price to pay in order to have read-write indexes. Lower values would translate into high write amplification. To give an extreme example, if you were to set it to 0%, every update would trigger the rewrite of a 5GB segment.

For us it would be nice to have more leeway to decide for ourselves what degree of write amplification / IO penalty we're willing to accept. We spend enough money on ES that we also spend a lot of time optimising performance and usage, and reducing the space lost to deleted docs could represent a fair amount of money - even if it's only a few percent. I wouldn't argue for no lower limit - your point about the behaviour at 0% is taken 😄 - but some lowering is still attractive (unless it's the case that everything always goes to hell at 19%, in which case sure, 20% would seem like a good limit to keep 😆)

I'd also note that if the concern is that users will make unwise choices, there's already an ES precedent: the current force merge behaviour. That's been handled with a warning in the docs for some time now, so it wouldn't be precedent-setting to allow a lower deleted-docs percentage and warn users about the possible implications of going below 20%.

Having said all that, my understanding is that these limits are actually in Lucene, so any change would have to be made there?

> That said, it does feel reasonable to me to make `_forcemerge` honor the maximum segment size as long as `max_num_segments` is not set.

That's encouraging to hear - I'll keep my eye out for any updates on that behaviour 😃

jpountz (Contributor) commented Oct 5, 2020

We just discussed this issue and agreed to move forward with honoring the maximum segment size as long as the maximum number of segments is not specified. This means that calling `_forcemerge?only_expunge_deletes=true` would then honor the maximum segment size.
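
For reference, that is this call (host and index name are placeholders):

```sh
# Sketch: force merge that only expunges deletes; with this change it
# would honor index.merge.policy.max_merged_segment rather than
# producing arbitrarily large segments.
curl -X POST "localhost:9200/my-index/_forcemerge?only_expunge_deletes=true"
```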

@jpountz removed the discuss label Oct 5, 2020
@dakrone added the :Distributed Indexing/Engine label and removed the :Data Management/ILM+SLM label Dec 9, 2020
@elasticmachine added the Team:Distributed (Obsolete) label and removed the Team:Data Management label Dec 9, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

jimczi added a commit that referenced this issue Sep 14, 2021
…rge (#77478)

This commit changes the ES merge policy to apply the maximum segment size
on force merges that only expunge deletes (forceMergeDeletes).
This option is useful for read-write use cases that want to reclaim deleted docs
more aggressively than `index.merge.policy.deletes_pct_allowed` allows.

Closes #61764
Relates #77270
jimczi added a commit that referenced this issue Sep 14, 2021
…rge (#77478) (#77692)

@craigglennie (Author)

Thank you! 👏
