Save Memory on Large Repository Metadata Blob Writes #74313
Conversation
```java
    boolean atomic,
    CheckedConsumer<OutputStream, IOException> writer
) throws IOException {
    // TODO: this is just a stop-gap solution until we have an encrypted output stream wrapper
```
Given how work on the encrypted repo has stalled yet again at this point, I don't think it's worth investing a lot of time here to code up an output stream implementation as well. We can do that (and it shouldn't be too hard) when the time comes; a rough sketch of the idea follows below.
Maybe we should create an issue to keep track of this?
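No such wrapper exists in the codebase yet; purely as an illustration, here is a minimal hypothetical sketch of how the repository-provided blob `OutputStream` could be decorated for encryption using the JDK's `CipherOutputStream`. All names here are invented for the example, and the real encrypted-repository design may differ (for one, it would need to persist the randomly generated GCM IV alongside the blob).

```java
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch only: decorate the repository-provided stream so the
// streaming writer callback stays unaware of encryption.
final class EncryptedStreamSketch {

    static OutputStream wrap(OutputStream rawBlobStream, SecretKey key) throws IOException {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key); // provider generates a fresh IV
            // Bytes written through the returned stream are encrypted as they
            // pass through, preserving the bounded-memory streaming property.
            return new CipherOutputStream(rawBlobStream, cipher);
        } catch (java.security.GeneralSecurityException e) {
            throw new IOException("failed to set up encryption", e);
        }
    }
}
```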
```diff
@@ -2248,13 +2245,11 @@ public void onFailure(Exception e) {
             }
             final String indexBlob = INDEX_FILE_PREFIX + Long.toString(newGen);
             logger.debug("Repository [{}] writing new index generational blob [{}]", metadata.name(), indexBlob);
-            try (ReleasableBytesStreamOutput out = new ReleasableBytesStreamOutput(bigArrays)) {
+            writeAtomic(blobContainer(), indexBlob, out -> {
```
As we do streaming reads of `RepositoryData` already, adding streaming writes here removes the `2G` size limit on it (though the most important part of this is that we save ourselves the trouble of materializing the bytes on heap for large repos, where this has often put a lot of stress on the master's heap).
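To make the before/after concrete, here is a self-contained sketch of the two write patterns. This is deliberately not Elasticsearch's actual code: `BlobWriter` and both method names are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Invented stand-in for a serializer such as RepositoryData's toXContent logic.
interface BlobWriter {
    void write(OutputStream out) throws IOException;
}

final class StreamingWriteSketch {

    // Old pattern: serialize fully into memory first. Heap cost is O(blob size)
    // and byte[] indexing caps the blob at 2G.
    static byte[] materializeThenWrite(BlobWriter writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.write(buffer);
        return buffer.toByteArray(); // the entire blob lives on heap here
    }

    // New pattern: the repository opens its own (buffered, chunk-uploading)
    // stream and the serializer writes straight into it, so memory use is
    // bounded by the outbound buffer rather than the blob size.
    static void streamWrite(BlobWriter writer, OutputStream repositoryStream) throws IOException {
        writer.write(repositoryStream);
    }
}
```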
That's great
Extracted the chunked output stream logic from #74313 and added tests for it to make it easier to review.
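As an illustration of the idea behind a chunked output stream, here is a heavily simplified, hypothetical sketch (all names invented; the class extracted in the actual follow-up PR is more involved): buffer writes until a chunk-size threshold is reached, then hand the completed chunk off, e.g. as one part of a multipart upload, so heap usage stays bounded by the chunk size rather than the blob size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch: accumulate bytes into a buffer and emit a chunk every
// time the buffer reaches the configured threshold.
abstract class SimpleChunkedOutputStream extends OutputStream {

    private final int chunkSize;
    private ByteArrayOutputStream currentChunk = new ByteArrayOutputStream();

    SimpleChunkedOutputStream(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Called once per completed chunk; implementations upload it somewhere
    // (e.g. one part of an S3 multipart upload).
    protected abstract void flushChunk(byte[] chunk, boolean lastChunk) throws IOException;

    @Override
    public void write(int b) throws IOException {
        currentChunk.write(b);
        if (currentChunk.size() >= chunkSize) {
            flushChunk(currentChunk.toByteArray(), false);
            currentChunk = new ByteArrayOutputStream();
        }
    }

    @Override
    public void close() throws IOException {
        flushChunk(currentChunk.toByteArray(), true); // final, possibly short chunk
    }
}
```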
LGTM, thanks Armin!
LGTM 🤞🏻
Thanks Francisco and Tanguy!
This PR adds a new API for doing streaming serialization writes to a repository, enabling repository metadata of arbitrary size to be written at bounded memory. The existing write APIs require knowledge of the eventual blob size beforehand, which forced us to materialize the serialized blob in memory before writing, costing a lot of memory for very large `RepositoryData` (and limiting us to a `2G` maximum blob size). With this PR the requirement to fully materialize the serialized metadata goes away, and the memory overhead becomes bounded by the outbound buffer size of the repository implementation. As we move to larger repositories this makes master node stability a lot more predictable, since writing out `RepositoryData` no longer takes as much memory (the same applies to shard-level metadata), enables aggregating multiple metadata blobs into a single larger blob without massive overhead, and removes the 2G size limit on `RepositoryData`.

Backport of #74313 and #74620.
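For reference, a sketch of the shape of such a consumer-based write API, reconstructed from the diff quoted at the top of this conversation. The interface name and the `failIfAlreadyExists` parameter are assumptions for illustration; `CheckedConsumer` stands in for Elasticsearch's throwing variant of `java.util.function.Consumer` and is inlined so the sketch compiles on its own.

```java
import java.io.IOException;
import java.io.OutputStream;

// Simplified sketch of the new streaming write API: the caller no longer
// supplies bytes of a known size, only a callback that serializes directly
// into the OutputStream the repository provides.
interface StreamingBlobContainer {

    // Inlined stand-in for Elasticsearch's CheckedConsumer.
    @FunctionalInterface
    interface CheckedConsumer<T, E extends Exception> {
        void accept(T t) throws E;
    }

    void writeBlob(
        String blobName,
        boolean failIfAlreadyExists, // parameter name assumed for the sketch
        boolean atomic,
        CheckedConsumer<OutputStream, IOException> writer
    ) throws IOException;
}
```

The trade-off of this shape is that the repository implementation no longer knows the final blob size up front, which is exactly what the chunked output stream sketched above addresses for multipart-upload-style backends.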
See elastic#53119 for more context about why those tests are muted on JDK8. They started failing more often recently, now that elastic#74313 and elastic#74620 have been merged, as reported in elastic#74739.
@original-brownbear, do you know if this PR should change in any way our statement in the docs saying that …
Just for completeness' sake here: @eedugon, as discussed on another channel, this is waiting for a docs update only. The …