Save Memory on Large Repository Metadata Blob Writes #74313
Conversation
```java
    boolean atomic,
    CheckedConsumer<OutputStream, IOException> writer
) throws IOException {
    // TODO: this is just a stop-gap solution until we have an encrypted output stream wrapper
```
Given how work on the encrypted repo has stalled yet again at this point, I don't think it's worth investing a lot of time here to code up an output stream implementation as well. We can do that (and it shouldn't be too hard) when the time comes; a rough sketch of the idea follows below.
Maybe we should create an issue to keep track of this?
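No such wrapper exists in the codebase yet; purely as an illustration, here is a minimal hypothetical sketch of how the repository-provided blob `OutputStream` could be decorated for encryption using the JDK's `CipherOutputStream`. All names here are invented for the example, and the real encrypted-repository design may differ (for one, it would need to persist the randomly generated GCM IV alongside the blob).

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch only: decorate the repository-provided stream so the
// streaming writer callback stays unaware of encryption.
final class EncryptedStreamSketch {

    static OutputStream wrap(OutputStream rawBlobStream, SecretKey key) throws IOException {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key); // provider generates a fresh IV
            // Bytes written through the returned stream are encrypted as they
            // pass through, preserving the bounded-memory streaming property.
            return new CipherOutputStream(rawBlobStream, cipher);
        } catch (java.security.GeneralSecurityException e) {
            throw new IOException("failed to set up encryption", e);
        }
    }
}
```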
```diff
@@ -2248,13 +2245,11 @@ public void onFailure(Exception e) {
             }
             final String indexBlob = INDEX_FILE_PREFIX + Long.toString(newGen);
             logger.debug("Repository [{}] writing new index generational blob [{}]", metadata.name(), indexBlob);
-            try (ReleasableBytesStreamOutput out = new ReleasableBytesStreamOutput(bigArrays)) {
+            writeAtomic(blobContainer(), indexBlob, out -> {
```
As we do streaming reads of `RepositoryData` already, adding streaming writes here removes the `2G` size limit on it (though the most important part of this is that we save ourselves the trouble of materializing the bytes on heap for large repos, where this has often put a lot of stress on the master's heap).
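To make the before/after concrete, here is a self-contained sketch of the two write patterns. This is deliberately not Elasticsearch's actual code: `BlobWriter` and both method names are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Invented stand-in for a serializer such as RepositoryData's toXContent logic.
interface BlobWriter {
    void write(OutputStream out) throws IOException;
}

final class StreamingWriteSketch {

    // Old pattern: serialize fully into memory first. Heap cost is O(blob size)
    // and byte[] indexing caps the blob at 2G.
    static byte[] materializeThenWrite(BlobWriter writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.write(buffer);
        return buffer.toByteArray(); // the entire blob lives on heap here
    }

    // New pattern: the repository opens its own (buffered, chunk-uploading)
    // stream and the serializer writes straight into it, so memory use is
    // bounded by the outbound buffer rather than the blob size.
    static void streamWrite(BlobWriter writer, OutputStream repositoryStream) throws IOException {
        writer.write(repositoryStream);
    }
}
```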
That's great
Extracted the chunked output stream logic from #74313 and added tests for it to make it easier to review.
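As an illustration of the idea behind a chunked output stream, here is a heavily simplified, hypothetical sketch (all names invented; the class extracted in the actual follow-up PR is more involved): buffer writes until a chunk-size threshold is reached, then hand the completed chunk off, e.g. as one part of a multipart upload, so heap usage stays bounded by the chunk size rather than the blob size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch: accumulate bytes into a buffer and emit a chunk every
// time the buffer reaches the configured threshold.
abstract class SimpleChunkedOutputStream extends OutputStream {

    private final int chunkSize;
    private ByteArrayOutputStream currentChunk = new ByteArrayOutputStream();

    SimpleChunkedOutputStream(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Called once per completed chunk; implementations upload it somewhere
    // (e.g. one part of an S3 multipart upload).
    protected abstract void flushChunk(byte[] chunk, boolean lastChunk) throws IOException;

    @Override
    public void write(int b) throws IOException {
        currentChunk.write(b);
        if (currentChunk.size() >= chunkSize) {
            flushChunk(currentChunk.toByteArray(), false);
            currentChunk = new ByteArrayOutputStream();
        }
    }

    @Override
    public void close() throws IOException {
        flushChunk(currentChunk.toByteArray(), true); // final, possibly short chunk
    }
}
```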
LGTM, thanks Armin!
LGTM 🤞🏻
Thanks Francisco and Tanguy!
This PR adds a new API for doing streaming serialization writes to a repository, enabling repository metadata of arbitrary size to be written at bounded memory. The existing write APIs require knowledge of the eventual blob size beforehand, which forced us to materialize the serialized blob in memory before writing, costing a lot of memory for very large `RepositoryData` (and limiting us to a `2G` maximum blob size). With this PR the requirement to fully materialize the serialized metadata goes away, and the memory overhead becomes bounded by the outbound buffer size of the repository implementation. As we move to larger repositories this makes master node stability a lot more predictable, since writing out `RepositoryData` no longer takes as much memory (the same applies to shard-level metadata), enables aggregating multiple metadata blobs into a single larger blob without massive overhead, and removes the 2G size limit on `RepositoryData`.

Backport of #74313 and #74620.
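For reference, a sketch of the shape of such a consumer-based write API, reconstructed from the diff quoted at the top of this conversation. The interface name and the `failIfAlreadyExists` parameter are assumptions for illustration; `CheckedConsumer` stands in for Elasticsearch's throwing variant of `java.util.function.Consumer` and is inlined so the sketch compiles on its own.

```java
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch of the new streaming write API: the caller no longer
// supplies bytes of a known size, only a callback that serializes directly
// into the OutputStream the repository provides.
interface StreamingBlobContainer {

    // Inlined stand-in for Elasticsearch's CheckedConsumer.
    @FunctionalInterface
    interface CheckedConsumer<T, E extends Exception> {
        void accept(T t) throws E;
    }

    void writeBlob(
        String blobName,
        boolean failIfAlreadyExists, // parameter name assumed for the sketch
        boolean atomic,
        CheckedConsumer<OutputStream, IOException> writer
    ) throws IOException;
}
```

The trade-off of this shape is that the repository implementation no longer knows the final blob size up front, which is exactly what the chunked output stream sketched above addresses for multipart-upload-style backends.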
See elastic#53119 for more context about why those tests are muted on JDK8. They started failing more often recently, now that elastic#74313 and elastic#74620 have been merged, as reported in elastic#74739.
@original-brownbear, do you know if this PR should change in any way our statement in the docs saying that …
Just for completeness' sake here: @eedugon, as discussed on another channel, this is waiting for a docs update only. The …