Slow bulk updates #23792

Closed
redserpent7 opened this issue Mar 29, 2017 · 16 comments

Comments

@redserpent7

redserpent7 commented Mar 29, 2017

Hi,

Recently I encountered an issue after migrating my cluster from 1.7 to 5.2. I have experienced extremely slow bulk updates, to the extent that it took a week to update 1M documents, whereas on 1.7 the same updates took minutes.

I reported this on Elastic Discuss but did not get any satisfactory answer as to why I am experiencing this kind of slowness.

As you can see from the discussion thread, I have tried many things: setting the number of replicas to 0, setting the refresh interval to -1, increasing the CPU and memory of my nodes, increasing the cluster from 5 to 9 nodes, and, since the cluster runs on AWS EC2, changing the disk type to provisioned IOPS with 10,000 IOPS. I also tried changing the bulk size from 100 to 500 to 1000 to 10000, and every increase made matters worse.
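For reference, the settings changes described above (replicas to 0, refresh interval to -1) would look roughly like this; a minimal sketch assuming a node at localhost:9200 and an index named my_index, both of which are placeholders:

```python
import requests

ES = "http://localhost:9200"   # assumed cluster address
INDEX = "my_index"             # assumed index name

# Drop replicas and disable periodic refresh while bulk loading,
# as described above.
requests.put(
    f"{ES}/{INDEX}/_settings",
    json={"index": {"number_of_replicas": 0, "refresh_interval": "-1"}},
).raise_for_status()

# Restore the defaults once the bulk job is done.
requests.put(
    f"{ES}/{INDEX}/_settings",
    json={"index": {"number_of_replicas": 1, "refresh_interval": "1s"}},
).raise_for_status()
```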

None of the above made any difference in the bulk update speed. Bulk inserts took only milliseconds, as did update-by-query and search.

I was then forced to change the code to pull each document, update the necessary fields on the application server, and re-index the updated document into ES. This sped up the process significantly: I managed to update 1M documents in 10 minutes, and I could push this rate even further, since I did not see any change in CPU or memory usage during those 10 minutes.
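As an illustration of the get/modify/re-index workaround described above, a rough sketch might look like the following; the index name, type, document IDs, and field names are all placeholders, not taken from this thread:

```python
import json
import requests

ES = "http://localhost:9200"      # assumed cluster address
INDEX, TYPE = "my_index", "doc"   # assumed index and mapping type (types still exist in 5.x)

def reindex_with_changes(doc_ids, changes):
    """Fetch each document, apply the field changes client-side,
    then push everything back in one bulk *index* request."""
    lines = []
    for doc_id in doc_ids:
        resp = requests.get(f"{ES}/{INDEX}/{TYPE}/{doc_id}")
        resp.raise_for_status()
        source = resp.json()["_source"]
        source.update(changes)   # apply the new field values client-side
        lines.append(json.dumps({"index": {"_index": INDEX, "_type": TYPE, "_id": doc_id}}))
        lines.append(json.dumps(source))
    body = "\n".join(lines) + "\n"   # the bulk API expects newline-delimited JSON
    resp = requests.post(f"{ES}/_bulk", data=body,
                         headers={"Content-Type": "application/x-ndjson"})
    resp.raise_for_status()

reindex_with_changes(["1", "2", "3"], {"status": "processed"})
```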

All this leads me to believe that there are major flaws in the way ES handles bulk updates, since the slowness makes no sense and the get/modify/re-index round trip should, in theory, take longer than using the update API.

[es_monitor: attached monitoring screenshot showing cluster indexing and search rates]

Looking at the attached image, I do not see any significant change in the indexing and search rates since I stopped using bulk updates on Mar 26th, as indicated by the red line.

I hope you can review the way bulk updates are currently handled; even though I managed to avert this disaster, it should not behave this way at all and should be much faster than what I was experiencing.

Currently my ES cluster is on version 5.2
JVM: Java SE 1.8.0_121
OS: Ubuntu 14.04 64-bit
Installed Plugins: analysis-icu, analysis-kuromoji, analysis-smartcn, discovery-ec2, repository-s3, x-pack

@jasontedor
Member

Let's keep the discussion on the forum, as cross-posting just fractures discussions and makes them harder to follow. We can reopen this if there is a verified bug.

@redserpent7
Author

@jasontedor If you'd like. However, I strongly believe this is a major bug in ES.

@dizzzyroma

I have the same problem. I have added a comment to the forum thread.

@dmarkhas

@jasontedor there are multiple reports of this issue (the original thread mentioned here, as well as this one and this one, which actually reproduces the behavior on Elastic Cloud), and there isn't much traction on the forum discussions, which are being closed due to inactivity.

Can you advise how we can get this looked at? Is there any additional information we (users who are running into this issue) can collect and provide?

@redserpent7
Author

@jasontedor it's kind of alarming that no one from Elastic considers this an issue. We are not talking about millisecond differences between ES 5 and older versions; we are talking minutes here, and in my case it was hours.

@jasontedor
Member

jasontedor commented Apr 21, 2017

I think what you're all seeing is due to the fact that, starting in 5.0.0, a refresh is forced if a get is performed after a document has been updated but before that update has been made visible to search. During an update request, a get is issued as part of executing the update operation, so this obviously has an impact when performing frequent updates to the same document. This is documented in the migration docs. I would encourage you to batch update operations to the same document on the client side.
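To illustrate the client-side batching suggestion, here is a minimal sketch that coalesces multiple partial updates to the same document ID into a single bulk update request; the cluster address, index, type, and field names are all placeholders, not anything from this thread:

```python
import json
from collections import defaultdict

import requests

ES = "http://localhost:9200"      # assumed cluster address
INDEX, TYPE = "my_index", "doc"   # assumed index and mapping type

def send_coalesced_updates(updates):
    """`updates` is a list of (doc_id, partial_doc) pairs.
    Merge all partial docs per ID so each document is updated
    at most once per bulk request."""
    merged = defaultdict(dict)
    for doc_id, partial in updates:
        merged[doc_id].update(partial)   # later values win

    lines = []
    for doc_id, partial in merged.items():
        lines.append(json.dumps({"update": {"_index": INDEX, "_type": TYPE, "_id": doc_id}}))
        lines.append(json.dumps({"doc": partial}))
    resp = requests.post(f"{ES}/_bulk", data="\n".join(lines) + "\n",
                         headers={"Content-Type": "application/x-ndjson"})
    resp.raise_for_status()

send_coalesced_updates([("42", {"views": 10}), ("42", {"views": 11}), ("7", {"title": "x"})])
```

The point is simply that each document ID then triggers at most one internal get (and therefore at most one forced refresh) per bulk request.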

@redserpent7
Author

@jasontedor unless I am missing something, I do not see how this applies to my case in particular, where only a single update is issued per document. And even then, a refresh is relatively fast, only a couple of seconds on my cluster, which does not explain why it would take a full hour to update 100 documents in bulk but only a few milliseconds to pull a document to another server, update some field values, and re-insert it, and that includes network transfer and latency time.

@redserpent7
Author

@jasontedor and BTW, my current workaround for the bulk update delays involves pulling the document using the get API, which, according to the documentation you provided, will issue a refresh on the cluster the same way bulk updates do. So I do not see why bulk updates would be delayed for several seconds while get/insert takes no time at all.

@jasontedor
Member

I see three distinct reports here (please forgive me if I'm missing any):

In the last two, it is clear that the issue is exactly what I mentioned: forced refresh. The users are updating the same document ID, and in your thread one of the same users (@dizzzyroma) provided a hot threads output that clearly shows refresh is the cause. I consider those two resolved.

For your issue, it appears that might indeed not be the case. You say that you're not updating the same ID, and your monitoring charts do not show the number of segments increasing rapidly. I asked you for hot threads or profiler output to further triage. Without that, it will be difficult for us to assess what is going on.

@dmarkhas

@jasontedor , based on the migration docs, this behavior for the GET API can be disabled by passing realtime=false. Is it possible to implement this for the update API?

@clintongormley
Contributor

@jasontedor , based on the migration docs, this behavior for the GET API can be disabled by passing realtime=false. Is it possible to implement this for the update API?

No, because updates need to be sure they have the latest version of the doc to avoid losing changes.
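For reference, the realtime flag mentioned above belongs to the get API; a minimal sketch with placeholder index, type, and ID:

```python
import requests

ES = "http://localhost:9200"   # assumed cluster address

# realtime=false tells the get API to serve the last refreshed version
# instead of forcing a refresh to see the very latest write.
resp = requests.get(f"{ES}/my_index/doc/42", params={"realtime": "false"})
resp.raise_for_status()
print(resp.json().get("_source"))
```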

@jrots

jrots commented Sep 18, 2017

Semi-related, but I'm seeing very slow bulk API update requests on 6.0... I opened a topic here:
https://discuss.elastic.co/t/slow-bulk-api-requests-es-6-0-beta2/100859
but I'm really getting ridiculously slow update speeds.

ES is constantly busy with "warmer" requests when I look at hot_threads; this is a blocker for using it in production for me.
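For context, the hot_threads output mentioned here comes from the nodes hot threads API; a minimal sketch, assuming a node at localhost:9200:

```python
import requests

ES = "http://localhost:9200"   # assumed cluster address

# Dump what each node's hottest threads are busy with; refresh/warmer
# activity shows up here when it dominates.
print(requests.get(f"{ES}/_nodes/hot_threads", params={"threads": 3}).text)
```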

@ctrix

ctrix commented Mar 24, 2018

Hi all,

I'm running into the same issue with ES 6.2.3.

I'm using the update API to insert documents into an index that I would like to rotate weekly.
The cluster is made up of 8 blades.

The data flows in from a Kafka queue containing daily deduplicated data, from which I craft the documents to be sent to ES; a sketch of the kind of bulk upsert batch I mean follows below.
During the first day, when the documents are mostly unique, I get brilliant performance, around 10K updates per second, using batches of 5000 documents and a few threads; I could probably push my hardware to bigger numbers, but I don't need to.

After 24h, when some documents with the same IDs come in, performance drops miserably to 500 documents per second or even less; the CPU skyrockets and iowait eats most of my resources. I tried changing the batch size and the number of threads, but found no solution.
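For context, here is a rough sketch of the kind of bulk upsert batch described above, using doc_as_upsert so each update either creates the document or merges fields into an existing one; the index name, type, IDs, and fields are placeholders:

```python
import json
import requests

ES = "http://localhost:9200"          # assumed cluster address
INDEX, TYPE = "events-w13", "doc"     # assumed weekly index name and type

def bulk_upsert(docs):
    """`docs` maps document IDs to their field values (e.g. built from the Kafka feed)."""
    lines = []
    for doc_id, fields in docs.items():
        lines.append(json.dumps({"update": {"_index": INDEX, "_type": TYPE, "_id": doc_id}}))
        # doc_as_upsert: create the doc if missing, otherwise merge the fields in.
        lines.append(json.dumps({"doc": fields, "doc_as_upsert": True}))
    resp = requests.post(f"{ES}/_bulk", data="\n".join(lines) + "\n",
                         headers={"Content-Type": "application/x-ndjson"})
    resp.raise_for_status()

bulk_upsert({"evt-1": {"count": 3}, "evt-2": {"count": 1}})
```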

I want to underline that you don't need duplicate documents in the same bulk request to trigger this ugly performance issue. You just need to update a few existing documents per request to make ES unusable for this workload.

I hope that someone will look into this issue sooner or later ...

@bleskes
Contributor

bleskes commented Mar 25, 2018

@ctrix there are potentially some things you can do to sidestep the issue (at least it sounds like that from your description). If you open up a thread on discuss.elastic.co we can try to figure it out with you there.

@ctrix

ctrix commented Mar 25, 2018

@bleskes thanks for the reply.
For the record, I've posted my problem here, where I hope to get a few helpful follow-ups.

@s1monw
Contributor

s1monw commented Mar 28, 2018

FYI #29264 might resolve this issue
