Serialize Monitoring Bulk Request Compressed #56410
Conversation
Even with the changes from elastic#48854 we're still seeing significant (as in tens to hundreds of MB) buffer usage for bulk exports in some cases, which destabilizes master nodes. Since we need to know the serialized length of the bulk body, we can't do the serialization in a streaming manner (it's also not easily doable with the HTTP client API we're using anyway). => let's at least serialize on heap in compressed form and decompress as we're streaming to the HTTP connection. For small requests this adds negligible overhead, but for large requests it reduces the size of the payload field by about an order of magnitude (empirically determined), which is a massive reduction when considering O(100MB) bulk requests.
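As a rough illustration of the approach described above (not the actual PR code; the class and method names are made up for the example), serializing the NDJSON bulk body straight into a gzip-compressed on-heap buffer could look like this:

```java
// Hypothetical sketch: serialize the bulk body into an on-heap buffer in
// compressed form so the uncompressed payload never has to live on heap in full.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class CompressedBulkBuffer {

    /** Serializes the given NDJSON bulk lines into a gzip-compressed byte array. */
    static byte[] serializeCompressed(List<String> bulkLines) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(new GZIPOutputStream(buffer), StandardCharsets.UTF_8)) {
            for (String line : bulkLines) {
                writer.write(line);
                writer.write('\n');
            }
        }
        // Only the compressed bytes are retained; for large monitoring bulk bodies
        // this is roughly an order of magnitude smaller than the raw payload.
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = serializeCompressed(List.of(
            "{\"index\":{\"_index\":\".monitoring-es\"}}",
            "{\"cluster_uuid\":\"abc\",\"timestamp\":\"2020-05-08T00:00:00Z\"}"
        ));
        System.out.println("compressed length: " + compressed.length);
    }
}
```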
Pinging @elastic/es-core-features (:Core/Features/Monitoring)
@original-brownbear the changes here look fine technically, however, I wonder about the practicality. Although bulk and flush are separate methods, very little actually happens between the two. This particular workflow does not buffer bulk payloads and flush based on time or size (like other parts of the code base), so here we will compress only to decompress a few milliseconds later. I agree with the goal of reducing the memory usage of the bulk exporters on master, but I am not sure this buys us much without the ability to send the compressed bits. I wonder if it is feasible to gzip compress the payload for use with the HTTP entity, such that we don't need to decompress prior to calling out?
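For reference, here is a minimal sketch of the alternative this comment raises: handing the already-compressed bytes to the HTTP entity with `Content-Encoding: gzip`, so nothing has to be decompressed before the request goes out. It assumes Apache HttpCore 4.x types and that the receiving node accepts gzip request bodies; the class name is illustrative, not from the Elasticsearch codebase:

```java
// Hypothetical sketch: send the already-compressed bytes as the HTTP entity with
// Content-Encoding: gzip, so no decompression is needed on the sending side.
import org.apache.http.HttpEntity;
import org.apache.http.entity.ByteArrayEntity;
import org.apache.http.entity.ContentType;

public class GzipEntityFactory {

    /** Wraps gzip-compressed NDJSON bytes so the server receives them compressed. */
    static HttpEntity gzipEntity(byte[] compressedBody) {
        ByteArrayEntity entity = new ByteArrayEntity(
            compressedBody, ContentType.create("application/x-ndjson"));
        // The receiving node must support Content-Encoding: gzip for this to work.
        entity.setContentEncoding("gzip");
        return entity;
    }
}
```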
That still saves us a boatload of memory I think. We're not decompressing to buffers but decompressing to a stream that is then consumed step by step by the HTTP client. So at no point will we need the full bulk request on heap uncompressed, and peak memory use is way reduced, isn't it? Also, even if it did get buffered in full in the client (but I think the Apache HTTP client is way smarter than that ... looks that way from the code as well) we'd still save an order of magnitude on half the request size :) (looking at heap dumps of clusters in trouble I can't see any large buffers other than the ones we create, so I'm pretty confident the client streams properly)
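A minimal sketch of what "decompressing to a stream consumed step by step" can look like with the Apache HTTP client (again, illustrative names, not the PR's actual code): the compressed on-heap bytes are exposed through a `GZIPInputStream` wrapped in an `InputStreamEntity`, so the client pulls and decompresses chunks on demand and the full uncompressed payload never materializes on heap:

```java
// Hypothetical sketch: build an entity that decompresses the stored gzip bytes
// lazily as the HTTP client reads them, instead of expanding them up front.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.InputStreamEntity;

public class StreamingDecompressEntity {

    /** Builds an entity that decompresses the stored gzip bytes as the client reads them. */
    static HttpEntity streamingEntity(byte[] compressedBody, long uncompressedLength) throws IOException {
        GZIPInputStream decompressing = new GZIPInputStream(new ByteArrayInputStream(compressedBody));
        // The uncompressed length, tracked during serialization, lets the client set
        // a Content-Length header without buffering the whole decompressed body first.
        return new InputStreamEntity(decompressing, uncompressedLength,
            ContentType.create("application/x-ndjson"));
    }
}
```

Note that `InputStreamEntity` is not repeatable, so a retry would need to rebuild the entity from the retained compressed bytes.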
LGTM - I trust your assessment of the practicality, and no issues with the implementation.
Thanks Jake!