
Deserialize BlobStore Metadata Files in a Streaming Manner #73149


Conversation

original-brownbear (Member)

We were reading the full file contents up-front here because of the complexity of verifying the footer otherwise. This commit moves the logic for reading metadata blobs (which can become quite sizable in some cases, and there are plans for larger aggregate meta blobs as well) to a streaming approach by doing the footer verification manually, since Lucene's utility methods don't allow for verification on top of a stream.

A possible follow-up to this would be to fix the write side the same way and get rid of the need to fully buffer blobs before writing there as well.
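In essence, the streaming read hands bytes to the consumer while always withholding the trailing 16 bytes that could be the footer, checksums each consumed buffer window as it refills, and only inspects the retained footer once the wrapped stream is exhausted. Below is a minimal standalone sketch of that idea, not the actual ChecksumBlobStoreFormat code; the class name, the 16-byte footer length and the big-endian trailing checksum layout are assumptions here, loosely mirroring Lucene's codec footer.

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class FooterVerifyingInputStream extends InputStream {

    private static final int FOOTER_LENGTH = 16; // assumed footer size, mirroring CodecUtil.footerLength()

    private final InputStream in;
    private final CRC32 crc32 = new CRC32();
    private final byte[] buffer = new byte[8192];
    private int bufferPos;
    private int bufferCount;

    FooterVerifyingInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        if (availableBodyBytes() <= 0) {
            return -1; // only the (possible) footer is left, never hand it to the consumer
        }
        return buffer[bufferPos++] & 0xFF;
    }

    // Returns how many bytes ahead of the potential footer are available, refilling the buffer if needed.
    private int availableBodyBytes() throws IOException {
        if (bufferCount == 0) {
            // first call: fill the whole buffer
            bufferCount = readFully(buffer, 0, buffer.length);
        } else if (bufferPos == bufferCount - FOOTER_LENGTH) {
            // everything but the trailing 16 bytes was consumed: checksum the consumed window,
            // slide the footer candidate to the front and read more from the wrapped stream
            crc32.update(buffer, 0, bufferPos);
            System.arraycopy(buffer, bufferPos, buffer, 0, FOOTER_LENGTH);
            bufferCount = FOOTER_LENGTH + readFully(buffer, FOOTER_LENGTH, buffer.length - FOOTER_LENGTH);
            bufferPos = 0;
        }
        return bufferCount - bufferPos - FOOTER_LENGTH;
    }

    // Drains any remaining body bytes, then compares the computed CRC-32 against the big-endian
    // checksum assumed to sit in the last 8 bytes of the footer.
    void verifyFooter() throws IOException {
        while (availableBodyBytes() > 0) {
            bufferPos = bufferCount - FOOTER_LENGTH; // skip unread body bytes; the refill above checksums them
        }
        // only the 16 footer-candidate bytes remain at the front of the buffer now
        crc32.update(buffer, bufferPos, 8); // assumed: the stored checksum also covers the footer's first 8 bytes
        final long expected = ByteBuffer.wrap(buffer, bufferPos + 8, 8).getLong();
        if (expected != crc32.getValue()) {
            throw new IOException("footer checksum mismatch: expected " + expected + " but got " + crc32.getValue());
        }
    }

    private int readFully(byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            final int read = in.read(b, off + total, len - total);
            if (read < 0) {
                break;
            }
            total += read;
        }
        return total;
    }
}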

elasticmachine added the Team:Distributed (Obsolete) label (meta label for the distributed team, obsolete; replaced by Distributed Indexing/Coordination) on May 17, 2021
elasticmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

fail("Should have failed due to corruption");
} catch (ElasticsearchCorruptionException ex) {
assertThat(ex.getMessage(), containsString("test-path"));
original-brownbear (Member Author)

No need to include the path here and complicate the code IMO; we already wrap and/or log the path of whatever failed to deserialize upstream anyway, which is how we get insight into where other IOExceptions happened.

original-brownbear (Member Author)

Jenkins run elasticsearch-ci/part-2

DaveCTurner (Contributor) left a comment

Seems nicer indeed. I left some ideas mainly on comments, assertions and tests.

@@ -98,10 +97,10 @@ public void testBlobStoreOperations() throws IOException {
MockBigArrays.NON_RECYCLING_INSTANCE);

// Assert that all checksum blobs can be read
-        assertEquals(checksumSMILE.read(blobContainer, "check-smile", xContentRegistry(), MockBigArrays.NON_RECYCLING_INSTANCE).getText(),
+        assertEquals(checksumSMILE.read(blobContainer, "check-smile", xContentRegistry()).getText(),
DaveCTurner (Contributor)

I think these tests only cover very small blobs so we're not really exercising the corners of the refilling logic. Let's have some blobs that are up to a few times larger than the buffer size too.

We also apparently only use a few different read sizes and only call the one-byte read() for reading the header.

IMO DeserializeMetaBlobInputStream could reasonably be a top-level class with some more focussed tests.

original-brownbear (Member Author)

> I think these tests only cover very small blobs so we're not really exercising the corners of the refilling logic. Let's have some blobs that are up to a few times larger than the buffer size too.

++ I adjusted the test to use some larger blobs now.

> We also apparently only use a few different read sizes and only call the one-byte read() for reading the header.

Right, as far as I can tell it's just single-byte reads while reading the header, 4k reads for the decompressing stream's buffer, and exactly 8000 bytes at a time from the Jackson SMILE parser at the moment. Testing with random sizes of up to 3x8k should run into all mathematically possible corner cases now, I think. At least I was able to run 100k+ iterations of the new tests without failure.
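For illustration, the kind of randomized sizing described above could look like the sketch below, exercised against the illustrative FooterVerifyingInputStream from earlier (so it is self-contained and not the real BlobStoreFormatTests code): blob bodies of random length up to 3x the 8k buffer, read back byte by byte so the buffer-refill path is crossed at varying offsets.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.CRC32;

public class RandomBlobSizeCheck {

    public static void main(String[] args) throws IOException {
        final Random random = new Random();
        for (int iteration = 0; iteration < 1000; iteration++) {
            // blob bodies of random length, up to roughly 3x the 8k buffer size
            final int bodyLength = 1 + random.nextInt(3 * 8192);
            final byte[] body = new byte[bodyLength];
            random.nextBytes(body);

            // append the assumed 16-byte footer: 8 header bytes plus a big-endian CRC-32
            // over everything written before the checksum itself
            final CRC32 crc = new CRC32();
            final byte[] footerHead = new byte[8];
            crc.update(body, 0, body.length);
            crc.update(footerHead, 0, footerHead.length);
            final ByteArrayOutputStream blob = new ByteArrayOutputStream();
            blob.write(body);
            blob.write(footerHead);
            blob.write(ByteBuffer.allocate(8).putLong(crc.getValue()).array());

            // read the body back byte by byte to cross the refill boundary at different offsets
            final FooterVerifyingInputStream in =
                new FooterVerifyingInputStream(new ByteArrayInputStream(blob.toByteArray()));
            final ByteArrayOutputStream readBack = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                readBack.write(b);
            }
            in.verifyFooter(); // must pass for an uncorrupted blob
            if (Arrays.equals(body, readBack.toByteArray()) == false) {
                throw new AssertionError("round-trip mismatch for body of length " + bodyLength);
            }
        }
    }
}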

DaveCTurner (Contributor), May 18, 2021

I'm still concerned about coverage, for instance we apparently never hit the interesting cases in read():

diff --git a/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java b/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
index 7d43b3d715e..02b8aba2fd0 100644
--- a/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
+++ b/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
@@ -154,9 +154,11 @@ public final class ChecksumBlobStoreFormat<T extends ToXContent> {
         @Override
         public int read() throws IOException {
             if (buffered() <= 0) {
+                assert bufferCount == 0;
                 fill();
             }
             if (buffered() <= 0) {
+                assert false;
                 return -1;
             }
             return buffer[bufferPos++];

Fine for now but the scope for future bugs worries me.

I think I'd be semi-happy if we consolidated the logic that tries to make sure some bytes are available, something like this (with a suitable renaming too):

diff --git a/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java b/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
index 7d43b3d715e..8b3738ab141 100644
--- a/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
+++ b/server/src/main/java/org/elasticsearch/repositories/blobstore/ChecksumBlobStoreFormat.java
@@ -153,10 +153,7 @@ public final class ChecksumBlobStoreFormat<T extends ToXContent> {

         @Override
         public int read() throws IOException {
-            if (buffered() <= 0) {
-                fill();
-            }
-            if (buffered() <= 0) {
+            if (getAvailable() <= 0) {
                 return -1;
             }
             return buffer[bufferPos++];
@@ -187,10 +184,7 @@ public final class ChecksumBlobStoreFormat<T extends ToXContent> {
         }

         private int doRead(byte[] b, int off, int len) throws IOException {
-            if (buffered() <= 0) {
-                fill();
-            }
-            final int available = buffered();
+            final int available = getAvailable();
             if (available < 0) {
                 return -1;
             }
@@ -237,24 +231,26 @@ public final class ChecksumBlobStoreFormat<T extends ToXContent> {
             return CompressorFactory.COMPRESSOR.isCompressed(new BytesArray(buffer, bufferPos, bufferCount - bufferPos));
         }

-        private int buffered() {
-            // bytes in the buffer minus 16 bytes that could be the footer
-            return bufferCount - bufferPos - CodecUtil.footerLength();
-        }
-
-        private void fill() throws IOException {
+        /**
+         * @return the number of bytes available in the buffer, possibly refilling the buffer if needed
+         */
+        private int getAvailable() throws IOException {
+            final int footerLen = CodecUtil.footerLength();
             if (bufferCount == 0) {
+                // first read, fill the buffer
+                assert bufferPos == 0;
                 bufferCount = Streams.readFully(in, buffer, 0, buffer.length);
-            } else {
+            } else if (bufferPos == bufferCount - footerLen) {
                 // crc and discard all but the last 16 bytes in the buffer that might be the footer bytes
-                final int footerLen = CodecUtil.footerLength();
                 assert bufferCount >= footerLen;
-                assert bufferPos == bufferCount - footerLen;
                 crc32.update(buffer, 0, bufferPos);
                 System.arraycopy(buffer, bufferPos, buffer, 0, footerLen);
                 bufferCount = footerLen + Streams.readFully(in, buffer, footerLen, buffer.length - footerLen);
                 bufferPos = 0;
             }
+
+            // bytes in the buffer minus 16 bytes that could be the footer
+            return bufferCount - bufferPos - footerLen;
         }
     }

original-brownbear (Member Author), May 18, 2021

++ applied that change, hope semi-happy is ok in the short run :)

original-brownbear (Member Author)

Thanks @DaveCTurner :) All suggestions applied, with the exception of extracting the stream to a top-level class. It seemed that, for now, just testing a wider range of blob sizes gives us the same coverage with less noise.

tlrx (Member) left a comment

LGTM

        boolean nextBytesCompressed() {
            // we already have bytes buffered here because we verify the blob's header (far less than the 8k buffer size) before calling
            // this method
            return CompressorFactory.COMPRESSOR.isCompressed(new BytesArray(buffer, bufferPos, bufferCount - bufferPos));
tlrx (Member)

Maybe assert that bufferPos > 0?
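For illustration, the suggested assertion would slot into the quoted method roughly like this (a sketch, not necessarily what was committed):

        boolean nextBytesCompressed() {
            // we already have bytes buffered here because we verify the blob's header (far less than
            // the 8k buffer size) before calling this method
            assert bufferPos > 0 : "header should have been consumed before checking for compression";
            return CompressorFactory.COMPRESSOR.isCompressed(new BytesArray(buffer, bufferPos, bufferCount - bufferPos));
        }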

DaveCTurner (Contributor) left a comment

I think we have a problem in BlobStoreFormatTests#testBlobCorruption, since we now start reading the blob before checking it's valid so we get all sorts of other exceptions too.

original-brownbear (Member Author)

> I think we have a problem in BlobStoreFormatTests#testBlobCorruption, since we now start reading the blob before checking it's valid so we get all sorts of other exceptions too.

Urgh, nice find :) I partly fixed this by expanding the try-catch scope to include the header check and partly by just expecting more exceptions in the test. I didn't want to blanket catch-and-rethrow the ones I only added in the test, because there's no "proof" of corruption with those and they may be the result of random bugs in our parsing (e.g. when we moved to no longer allowing duplicate fields and would have started throwing all kinds of exceptions).

DaveCTurner (Contributor) left a comment

👍 LGTM thanks for the extra iterations

DaveCTurner (Contributor) left a comment

Hmm, actually no, I've had a change of heart. I think throwing these random exceptions on corruption will cause pain, since they're indistinguishable from actual corruption. Can we make it so that we read to the end in case of an exception and report whether the checksum was broken in any case?

original-brownbear (Member Author)

@DaveCTurner fair point :)

I pushed 9c7fb6e to always verify the footer.
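One way such "always verify the footer" logic can be structured, sketched here against the illustrative FooterVerifyingInputStream from earlier rather than the actual commit (StreamingBlobReader and Parser are hypothetical names): if deserialization throws, drain the rest of the blob and check the checksum anyway, and only surface the parse failure when the footer turns out to be intact.

import java.io.IOException;
import java.io.InputStream;

final class StreamingBlobReader {

    interface Parser<T> {
        T parse(InputStream in) throws IOException;
    }

    static <T> T readAndVerify(FooterVerifyingInputStream in, Parser<T> parser) throws IOException {
        final T result;
        try {
            result = parser.parse(in);
        } catch (IOException | RuntimeException parseFailure) {
            try {
                in.verifyFooter(); // drains to the end of the blob and checks the checksum anyway
            } catch (IOException corruption) {
                corruption.addSuppressed(parseFailure);
                throw corruption; // surface the corruption rather than the downstream parse error
            }
            throw parseFailure; // footer was intact, so the parse failure is the real problem
        }
        in.verifyFooter(); // happy path: parsed cleanly, but the footer must still check out
        return result;
    }
}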

DaveCTurner (Contributor) left a comment

Thanks, yes that's better IMO. LGTM now.

original-brownbear (Member Author)

Thanks David & Tanguy!

original-brownbear merged commit 6dd2a2a into elastic:master on May 18, 2021
original-brownbear deleted the efficient-reading-blob-metadata branch on May 18, 2021 12:33
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 13, 2021
…3149)

original-brownbear added a commit that referenced this pull request Jun 14, 2021
…74050)

original-brownbear restored the efficient-reading-blob-metadata branch on April 18, 2023 20:50
Labels
:Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs), >non-issue, Team:Distributed (Obsolete), v7.14.0, v8.0.0-alpha1