Throttling incoming indexing when Lucene merges fall behind #6066
It looks like Simon's prototype pauses the indexing thread if too many merges are in flight. I'm not 100% clear on the code path that gets here. Will that pause indexing or pause refreshing or both? It'd be neat to slow down just the refreshing and let indexing be slowed down by the refresh backlog logic. Or am I crazy?
@nik9000 internally the IndexWriter has several thread states (8 by default) that we index into. If we limit indexing to a single thread we only use one of those states, which makes sure we max out the RAM buffer and write the least amount of segments. This means we 1. reduce the number of segments to merge and 2. make sure flushes only happen if really needed. I think we can't slow down refreshes, otherwise folks will see odd results since they don't get new documents. You also want refreshes to publish merged segments, to further reduce the number of segments. We will do the right thing and provide backpressure on indexing, not on refresh. Hope that makes sense?
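To make the backpressure idea concrete, here is a minimal, hypothetical sketch (this is not Simon's actual patch, and all class/method names are made up for illustration): indexing operations pass through a throttle that is a no-op in the normal case, but once the merge scheduler reports that merges have fallen behind, every indexing thread is forced through a single shared lock, while refresh is left untouched.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: serialize indexing threads while merges are behind. */
class IndexingThrottle {
    private final Lock throttleLock = new ReentrantLock();
    private volatile boolean active = false;

    /** Called by the merge scheduler when in-flight merges exceed the limit. */
    void activate()   { active = true; }

    /** Called once merges have caught up again. */
    void deactivate() { active = false; }

    /** Index/delete operations are wrapped in this; refresh is not. */
    <T> T runIndexingOp(Callable<T> op) throws Exception {
        if (!active) {
            return op.call();      // normal case: full indexing concurrency
        }
        throttleLock.lock();       // throttled: one indexing thread at a time
        try {
            return op.call();
        } finally {
            throttleLock.unlock();
        }
    }
}
```

Serializing indexing onto one thread also means the IndexWriter fills a single large in-memory segment per flush instead of eight smaller ones, which is the "fewer segments to merge" effect described above.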
I'd honestly forgotten about flushes. It's what I get for only playing on the other side. Anyway, I'm happy so long as back pressure is provided on indexing.
I tested the current throttling branch with the refresh=-1 case, and we have problems because the "abandoned" thread states will never flush until a full flush ... the workaround is that you must use a refresh to get them flushed.
I'm inclined to simply document that index throttling won't kick in if you use SerialMergeScheduler. SMS only allows one merge to run at a time, so apps that are doing heavy bulk indexing really should not be using it. |
OK I reviewed these changes with Simon. We decided we don't need to add a separate "kill switch" for this because you can just set max_merge_count higher to avoid throttling. But we also decided not to document this new setting on the index-modules-merges docs: it's a very advanced setting, and playing with it could easily mess up merges. |
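For reference, here is a hedged example of what raising that limit might look like with the ES 1.x-era Java client; the index name and values are placeholders, and depending on the version you may instead need to set these at index creation time or in elasticsearch.yml rather than dynamically.

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

class MergeLimitExample {
    // Placeholder index name; raise max_merge_count so the soft throttle kicks in later.
    static void raiseMergeLimits(Client client) {
        client.admin().indices().prepareUpdateSettings("my_index")
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.merge.scheduler.max_merge_count", 8)
                        .put("index.merge.scheduler.max_thread_count", 3))
                .execute().actionGet();
    }
}
```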
This commit upgrades to the latest Lucene 4.8.1 release, including the following bugfixes:
* An IndexThrottle now kicks in when merges start falling behind, limiting index threads to 1 until merges catch up. Closes #6066
* RateLimiter now kicks in at the configured rate, where previously the limiter was limiting at ~8 MB/sec almost all the time. Closes #6018
Guys, I don't think this works as expected. I'm getting:
5 times a second right at the beginning of bulk indexing. I'm disabling throttling and refresh interval; I start with
@l15k4 your merges are not keeping up. See https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html#scheduling
@clintongormley but they are not keeping up right at the moment of starting indexing into a small (1M records) optimized index... increasing the merge thread pool doesn't help... I have 4
Imho I need to scale it up just because of segment merging, but there will be plenty of unused resources ... I've been trying to solve this issue for months now...
@l15k4 did you disable store IO throttling (defaults to 20 MB/sec, which is too low for heavy indexing cases)? Where are you storing the shards (what IO devices), EBS or local instance storage? Also try the ideas here: https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing
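For context, the store throttle being referred to is the ES 1.x-era `indices.store.throttle.*` settings. A hedged example of raising or disabling it via the Java client (values are placeholders):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

class StoreThrottleExample {
    // Raise the merge-IO throttle cluster-wide (or disable it while bulk loading).
    static void relaxStoreThrottle(Client client) {
        client.admin().cluster().prepareUpdateSettings()
                .setTransientSettings(ImmutableSettings.settingsBuilder()
                        .put("indices.store.throttle.max_bytes_per_sec", "100mb")
                        // or switch it off entirely while bulk loading:
                        // .put("indices.store.throttle.type", "none")
                )
                .execute().actionGet();
    }
}
```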
@mikemccand I set it to 30, 40, 80, 100 MB/s ... it had no effect. I also tried to set
We use EBS (General Purpose (SSD)) on
It seems that if you are doing bulk indexing and have all fields
It always looks this way: it throttles for some period of time, like 15-20 minutes, and then it stops http://i.imgur.com/UyDTlHi.png
I also tried to shrink
The best bulk indexing performance I can get on a machine with 4 hyper threads and EBS (750 Mbps) with all fields being doc_values is by increasing
I think that after doc_values people don't have much of a choice; they'll need physically attached SSDs...
Hmm, enabling doc values is typically a minor indexing performance hit in my experience, e.g. see the nightly benchmarks at https://benchmarks.elastic.co (annotation R on the first chart). Do you have provisioned IOPS for your EBS mounts? Are you sure you're not running into that limit? Can you try the local instance SSD, just for comparison? Your EBS is backed by SSD as well, so this would let us remove EBS from the equation. (You'd need to switch to an i2.4xlarge instance for this test.)
General Purpose unfortunately; the price of Provisioned IOPS SSDs surprised us. If you want to go beyond 160 MiB/s to 320 MiB/s it costs double what the volume itself costs. I guess it wouldn't throttle with a Provisioned IOPS SSD at 9000 IOPS to reach those 320 MiB/s ... but these machines cost a fortune :-)
Or just use the local instance attached SSDs on the i2.* instance types ...
Lucene has low-level protection that blocks incoming segment-producing threads (indexing threads, NRT reopen threads, commit, etc.) when there are too many merges running.
But this is too harsh for Elasticsearch, so it's entirely disabled. This means merges can fall far behind under heavy indexing, resulting in too many segments in the index, which causes all sorts of problems (slow version lookups, too much RAM, etc.).
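At the Lucene level this hard protection corresponds to ConcurrentMergeScheduler's max merge count: once more merges than that are pending, the thread that triggered the merge (typically a flushing/indexing thread) is stalled until merges drain. A small illustrative setup in plain Lucene 4.x, with arbitrary values, just to show which knobs are involved:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MergeBackpressureDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
                new StandardAnalyzer(Version.LUCENE_48));

        ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
        cms.setMaxMergeCount(5);   // more queued merges than this stalls the producing thread
        cms.setMaxThreadCount(3);  // concurrent merge threads
        iwc.setMergeScheduler(cms);

        IndexWriter writer = new IndexWriter(dir, iwc);
        // ... heavy indexing here would hit the hard stall if merges cannot keep up ...
        writer.close();
    }
}
```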
So we need to do something "softer"; Simon has a good starting patch, which I tested and confirmed (after https://issues.apache.org/jira/browse/LUCENE-5644 is fixed), in at least one use case, prevents too many segments from accumulating in the index:
Before Simon's + Lucene's fix: http://people.apache.org/~mikemccand/lucenebench/base.html
Same test with the fix: http://people.apache.org/~mikemccand/lucenebench/throttled.html
Segment counts stay essentially flat.
Here's Simon's prototype patch: s1monw@2de96f9