Optimize sequential reads in SearchableSnapshotIndexInput #51230
Conversation
Today `SearchableSnapshotIndexInput` translates each `readBytesInternal` call to one or more calls to `readBlob` on the underlying repository. We make a lot of small `readBytesInternal` calls since they are used to fill a small in-memory buffer. Calls to `readBlob` are expensive: blob storage providers like AWS S3 charge money per API call. A common usage pattern is to take a brand-new `IndexInput`, seek to a particular location, and then sequentially read a substantial amount of data and stream it to disk. This commit optimizes the implementation for that specific usage pattern. Rather than calling `readBlob` each time the internal buffer needs filling we instead request a (potentially much larger) range of the blob and consume the response bit-by-bit as needed by a sequentially-reading client.
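For illustration, a minimal sketch of the idea, with a hypothetical `readBlob`-style ranged-read interface standing in for the real repository API (this is not the actual implementation):

```java
import java.io.IOException;
import java.io.InputStream;

class SequentialReadSketch {
    private static final long SEQUENTIAL_READ_SIZE = 1L << 20; // read ahead up to 1 MiB per API call (value assumed)

    // stands in for the repository's ranged-read API
    interface HypotheticalBlobContainer {
        InputStream readBlob(String name, long position, long length) throws IOException;
    }

    private InputStream streamForSequentialReads; // null when not currently reading sequentially
    private long streamPos; // next blob position the open stream will serve
    private long streamEnd; // position just past the end of the requested range

    // Serves one buffer fill. One ranged readBlob call covers many subsequent fills, so a
    // sequential reader triggers far fewer billable API calls than one call per fill.
    int readBytes(HypotheticalBlobContainer container, String name, long blobLength,
                  long pos, byte[] b, int offset, int length) throws IOException {
        if (streamForSequentialReads == null || streamPos != pos) {
            // not continuing a sequential read: discard any open range and request a larger one at pos
            if (streamForSequentialReads != null) {
                streamForSequentialReads.close();
            }
            streamEnd = Math.min(blobLength, pos + Math.max(length, SEQUENTIAL_READ_SIZE));
            streamForSequentialReads = container.readBlob(name, pos, streamEnd - pos);
            streamPos = pos;
        }
        // consume the open range bit by bit; a real implementation must also loop over short
        // reads and handle end-of-stream, which this sketch glosses over
        int read = streamForSequentialReads.read(b, offset, (int) Math.min(length, streamEnd - streamPos));
        streamPos += read;
        if (streamPos == streamEnd) {
            streamForSequentialReads.close(); // range exhausted; the next read opens a new one
            streamForSequentialReads = null;
        }
        return read;
    }
}
```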
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
I've left some comments, but I'm not sure I understand the concurrency model.
```java
assert streamForSequentialReads.isFullyRead() == false;
int read = streamForSequentialReads.inputStream.read(b, offset, length);
assert read <= length : read + " vs " + length;
streamForSequentialReads.pos += read;
```
should we put this logic into `StreamForSequentialReads`? Perhaps that class could enforce that only sequential reads are possible from the stream (and offer a method such as `isSequentialReadPossible`), with the logic in this class just trying to call those methods.
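For illustration, a sketch of the encapsulation being suggested, using names from the comment and the quoted hunk above (not the actual committed class):

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

class StreamForSequentialReads implements Closeable {
    private final InputStream inputStream;
    private long pos;          // next blob position this stream will serve
    private final long maxPos; // position just past the end of the requested range

    StreamForSequentialReads(InputStream inputStream, long pos, long streamLength) {
        this.inputStream = inputStream;
        this.pos = pos;
        this.maxPos = pos + streamLength;
    }

    // only a read at exactly the current position counts as sequential
    boolean isSequentialReadPossible(long pos) {
        return this.pos == pos && isFullyRead() == false;
    }

    // the sequential-only invariant now lives here rather than in the caller
    int read(byte[] b, int offset, int length) throws IOException {
        assert isFullyRead() == false;
        int read = inputStream.read(b, offset, length);
        assert read <= length : read + " vs " + length;
        pos += read;
        return read;
    }

    boolean isFullyRead() {
        return pos >= maxPos;
    }

    @Override
    public void close() throws IOException {
        inputStream.close();
    }
}
```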
```java
// read part of a blob directly; the code above falls through to this case where there is no optimization possible
```
maybe put everything above this into a `readOptimized()` method that returns a boolean (denoting whether it read or not). This will avoid having so many explicit returns in the above code (and the deliberate fall-through logic).
Sounds good.
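For illustration, a sketch of the shape being agreed on here; `readOptimized` is the name from the comment, while the other helper names are placeholders rather than the real API:

```java
// A sketch of the proposed control flow: each optimized path returns true from
// readOptimized, leaving a single fall-through point for the direct read.
private void readInternal(byte[] b, int offset, int length) throws IOException {
    if (readOptimized(b, offset, length) == false) {
        // no optimization possible: fall through to reading part of the blob directly
        readDirectly(b, offset, length);
    }
}

/** @return whether the read was satisfied by one of the sequential-read paths */
private boolean readOptimized(byte[] b, int offset, int length) throws IOException {
    if (canReadFromExistingSequentialStream()) {
        readFromExistingSequentialStream(b, offset, length);
        return true;
    }
    if (shouldStartNewSequentialRead()) {
        readFromNewSequentialStream(b, offset, length);
        return true;
    }
    return false; // replaces the deliberate inline fall-through logic
}
```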
```java
    return true;
} else {
    // streamLength <= length so this single read will consume the entire stream, so there is no need to keep hold of it, so we can
    // tell the caller to read the data directly
```
Should we not use this existing open stream as much as possible? We might not be able to read the full bytes from this stream, but perhaps we can use it to read everything up to `streamLength`, and subsequently request a new stream for the rest? This might avoid redownloading data in cases where the buffer size is not a proper divisor of `sequentialReadSize`.
At this point we don't have an existing open stream; we're trying to create a new one. If we can satisfy part of a read from the existing stream then we do so (see the comment containing the string "the current stream didn't contain enough data for this read, so we must read more").
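For illustration, a sketch of the partial-read logic being described; the method shape, `remaining()`, and the close helper are assumptions, not the committed code:

```java
// Drain whatever the open stream can still supply, then let the caller fetch the rest
// through a new stream. Uses StreamForSequentialReads as sketched earlier, plus a
// hypothetical remaining() and the closeStreamForSequentialReads() helper sketched below.
private int readFromExistingStreamIfPossible(long pos, byte[] b, int offset, int length) throws IOException {
    if (streamForSequentialReads == null || streamForSequentialReads.isSequentialReadPossible(pos) == false) {
        return 0; // nothing to reuse; the caller must obtain the data another way
    }
    // the current stream may not contain enough data for this read, so take
    // min(length, remaining) from it and report how much was actually served
    int toRead = Math.toIntExact(Math.min(length, streamForSequentialReads.remaining()));
    int read = streamForSequentialReads.read(b, offset, toRead);
    if (streamForSequentialReads.isFullyRead()) {
        closeStreamForSequentialReads();
    }
    return read;
}
```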
Failure of elasticsearch-ci/2 looks like #51347; checkstyle fix is incoming.
LGTM - I left only minor comments. Thanks for the additional tests and the many comments that help to review this 👍
this("SearchableSnapshotIndexInput(" + fileInfo.physicalName() + ")", blobContainer, fileInfo, 0L, 0L, fileInfo.length()); | ||
// optimisation for the case where we perform a single seek, then read a large block of data sequentially, then close the input | ||
@Nullable // if not currently reading sequentially | ||
private StreamForSequentialReads streamForSequentialReads; |
Maybe update the class javadoc to explain how/why we use this?
sure, done in f18251a
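For illustration, the kind of class-level javadoc being asked for might read something like this (wording assumed; the actual text landed in f18251a):

```java
/**
 * A {@link BufferedIndexInput} that reads its data from a blob held in a snapshot repository.
 *
 * (Assumed wording, not the text committed in f18251a.) A common usage pattern is a single
 * seek followed by a long sequential read, so rather than issuing one readBlob call per
 * buffer fill this class may request a much larger range of the blob up front and satisfy
 * subsequent sequential reads from that one open stream, saving per-call charges on blob
 * stores such as AWS S3.
 */
```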
```java
assert streamForSequentialReads.isFullyRead() == false;
sequentialReadSize = NO_SEQUENTIAL_READ_OPTIMIZATION;
IOUtils.close(streamForSequentialReads);
streamForSequentialReads = null;
```
We're closing + nullifying the `streamForSequentialReads` many times, maybe it deserves its own `closeSequentialStream()` method?
Good point, this is no longer a one-liner. Done in c9cf7bc.
```java
 * @return the number of bytes read; if a new stream wasn't opened then nothing was read so the caller should perform the read directly.
 */
private int readFromNewSequentialStream(int part, long pos, byte[] b, int offset, int length)
    throws IOException {
```
The method signature can fit on a single line
It didn't always ;) Fixed in 51d2af5
```java
if (position != offset + pos) {
    position = offset + pos;
    IOUtils.close(streamForSequentialReads);
    streamForSequentialReads = null;
```
Maybe nullify in a finally block, just in case
Oh good point, done in c9cf7bc.
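For illustration, a sketch combining the two suggestions above (a `closeSequentialStream()`-style helper plus finally-block nullification); the name and body here are assumed, not the code from c9cf7bc:

```java
private void closeStreamForSequentialReads() throws IOException {
    try {
        IOUtils.close(streamForSequentialReads); // the org.elasticsearch IOUtils used in the hunks above
    } finally {
        streamForSequentialReads = null; // cleared even if close() throws
    }
}
```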
```diff
 @Override
 public BufferedIndexInput clone() {
-    return new SearchableSnapshotIndexInput("clone(" + this + ")", blobContainer, fileInfo, position, offset, length);
+    return new SearchableSnapshotIndexInput("clone(" + this + ")", blobContainer, fileInfo, position, offset, length,
+        NO_SEQUENTIAL_READ_OPTIMIZATION, getBufferSize());
 }
```
Maybe add a small word on why we can't use the optimized reads for clones?
Comments added in 2fee6b8.
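For illustration, a sketch of the kind of comment being asked for (wording assumed, not the text from 2fee6b8): clones and slices are typically used for random access rather than long sequential reads, so they opt out of the optimization.

```java
@Override
public BufferedIndexInput clone() {
    // clones are mostly used for random access, so read-ahead would usually be wasted on them
    return new SearchableSnapshotIndexInput("clone(" + this + ")", blobContainer, fileInfo, position, offset, length,
        NO_SEQUENTIAL_READ_OPTIMIZATION, getBufferSize());
}
```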
Thanks @tlrx