Optimize sequential reads in SearchableSnapshotIndexInput #51230

Conversation

DaveCTurner
Contributor

Today `SearchableSnapshotIndexInput` translates each `readBytesInternal` call
to one or more calls to `readBlob` on the underlying repository. We make a lot
of small `readBytesInternal` calls since they are used to fill a small
in-memory buffer. Calls to `readBlob` are expensive: blob storage providers
like AWS S3 charge money per API call.

A common usage pattern is to take a brand-new `IndexInput`, seek to a
particular location, and then sequentially read a substantial amount of data
and stream it to disk.

This commit optimizes the implementation for that specific usage pattern.
Rather than calling `readBlob` each time the internal buffer needs filling we
instead request a (potentially much larger) range of the blob and consume the
response bit-by-bit as needed by a sequentially-reading client.
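
A minimal sketch of the idea, with hypothetical names and a heavily simplified API (the real implementation is the `SearchableSnapshotIndexInput` change in this PR, which additionally reuses a partially-read stream where possible):

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

/**
 * Sketch: instead of issuing one ranged readBlob call per buffer refill,
 * open a single larger ranged stream and drain it across sequential refills.
 */
class SequentialBlobReader implements Closeable {

    /** Stand-in for the repository's ranged-read API. */
    interface BlobContainer {
        InputStream readBlob(long position, long length) throws IOException;
    }

    private final BlobContainer blobContainer;
    private final long blobLength;
    private final long sequentialReadSize; // how far to read ahead per request

    private InputStream stream;   // currently open ranged stream, if any
    private long streamRemaining; // bytes left unread in the open stream
    private long position;        // next read position within the blob

    SequentialBlobReader(BlobContainer blobContainer, long blobLength, long sequentialReadSize) {
        this.blobContainer = blobContainer;
        this.blobLength = blobLength;
        this.sequentialReadSize = sequentialReadSize;
    }

    /** Fills b[offset..offset+length) from the blob, reusing the open stream when possible. */
    void readBytes(byte[] b, int offset, int length) throws IOException {
        if (stream == null || streamRemaining < length) {
            // The current stream (if any) cannot satisfy this read, so open a new one
            // covering a potentially much larger range; later sequential refills are
            // then served from it without further API calls. (The real code also
            // drains a partially-read stream first; omitted here for brevity.)
            close();
            long rangeLength = Math.min(Math.max(sequentialReadSize, length), blobLength - position);
            stream = blobContainer.readBlob(position, rangeLength);
            streamRemaining = rangeLength;
        }
        int read = 0;
        while (read < length) {
            int n = stream.read(b, offset + read, length - read);
            if (n == -1) {
                throw new IOException("unexpected end of stream");
            }
            read += n;
        }
        position += length;
        streamRemaining -= length;
        if (streamRemaining == 0) {
            close(); // fully consumed: release the underlying connection promptly
        }
    }

    @Override
    public void close() throws IOException {
        if (stream != null) {
            stream.close();
            stream = null;
        }
    }
}
```

For example, with an 8 KiB buffer and a 16 MiB read-ahead, a 160 MiB sequential read drops from roughly 20,000 `readBlob` calls to ten.
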
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jan 20, 2020
@DaveCTurner DaveCTurner requested review from tlrx and ywelsch January 20, 2020 16:37
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

Contributor

@ywelsch ywelsch left a comment

I've left some comments, but I'm not sure I understand the concurrency model.

assert streamForSequentialReads.isFullyRead() == false;
int read = streamForSequentialReads.inputStream.read(b, offset, length);
assert read <= length : read + " vs " + length;
streamForSequentialReads.pos += read;
Contributor

Should we put this logic into StreamForSequentialReads? Perhaps that class could enforce that only sequential reads are possible from the stream (and offer a method such as isSequentialReadPossible), with the logic in this class just calling those methods.

Contributor Author

Moved this around in 1d69293 and 184c9b1.
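
For illustration, this is roughly the shape such an encapsulation could take, modelled on the quoted snippet (a hypothetical sketch; the actual classes in 1d69293 and 184c9b1 may differ):

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

/** Sketch: the wrapper owns the position bookkeeping and enforces sequential access. */
class StreamForSequentialReads implements Closeable {
    private final InputStream inputStream;
    private final long maxPos; // position just past the last byte this stream covers
    private long pos;          // blob position of the next byte this stream yields

    StreamForSequentialReads(InputStream inputStream, long pos, long length) {
        this.inputStream = inputStream;
        this.pos = pos;
        this.maxPos = pos + length;
    }

    /** A read is sequential iff it starts exactly where this stream left off. */
    boolean isSequentialReadPossible(long wantedPos) {
        return pos == wantedPos && isFullyRead() == false;
    }

    int read(byte[] b, int offset, int length) throws IOException {
        assert isFullyRead() == false;
        final int read = inputStream.read(b, offset, length);
        if (read > 0) {
            pos += read; // bookkeeping lives here rather than at every call site
        }
        return read;
    }

    boolean isFullyRead() {
        return pos >= maxPos;
    }

    @Override
    public void close() throws IOException {
        inputStream.close();
    }
}
```
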

}
}

// read part of a blob directly; the code above falls through to this case where there is no optimization possible
Contributor

Maybe put everything above this into a readOptimized() method that returns a boolean (denoting whether it read or not). This would avoid having so many explicit returns in the code above (and the deliberate fall-through logic).

Contributor Author

Sounds good.

Contributor Author

See 1d69293 and 184c9b1.
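
As a rough sketch of the suggested control flow (hypothetical names; the actual refactoring is in the commits above):

```java
import java.io.IOException;

/** Sketch: one optimized path that reports success, one explicit fallback. */
abstract class ReadPathSketch {

    final void readInternal(byte[] b, int offset, int length) throws IOException {
        if (readOptimized(b, offset, length) == false) {
            // no optimization possible: fall through to a plain ranged read
            readDirectly(b, offset, length);
        }
    }

    /** @return true iff the sequential-read optimization fully satisfied the read */
    abstract boolean readOptimized(byte[] b, int offset, int length) throws IOException;

    /** One ranged readBlob call per invocation, keeping no state between calls. */
    abstract void readDirectly(byte[] b, int offset, int length) throws IOException;
}
```
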

return true;
} else {
// streamLength <= length so this single read will consume the entire stream, so there is no need to keep hold of it, so we can
// tell the caller to read the data directly
Contributor

Should we not use this existing open stream as much as possible? We might not be able to read all the requested bytes from this stream, but perhaps we can use it to read everything up to streamLength and subsequently request a new stream for the rest? This might avoid re-downloading data in cases where the buffer size is not a proper divisor of sequentialReadSize.

Contributor Author

At this point we don't have an existing open stream; we're trying to create a new one. If we can satisfy part of a read from the existing stream then we do so (see the comment containing the string "the current stream didn't contain enough data for this read, so we must read more").
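
A fragmentary sketch of the split described in this reply (readFully and openNewStreamForSequentialReads are hypothetical helpers, not the PR's actual method names):

```java
// Take what the current stream still holds, then open a fresh stream for the rest.
int fromCurrent = (int) Math.min(streamRemaining, (long) length);
readFully(b, offset, fromCurrent); // drain and close the existing stream first
if (fromCurrent < length) {
    // "the current stream didn't contain enough data for this read, so we must read more"
    openNewStreamForSequentialReads(position + fromCurrent);
    readFully(b, offset + fromCurrent, length - fromCurrent);
}
```
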

@DaveCTurner DaveCTurner requested a review from ywelsch January 23, 2020 12:56
@DaveCTurner
Contributor Author

Failure of elasticsearch-ci/2 looks like #51347; checkstyle fix is incoming.

Member

@tlrx tlrx left a comment

LGTM - I left only minor comments. Thanks for the additional tests and the many comments that help in reviewing this 👍

this("SearchableSnapshotIndexInput(" + fileInfo.physicalName() + ")", blobContainer, fileInfo, 0L, 0L, fileInfo.length());
// optimisation for the case where we perform a single seek, then read a large block of data sequentially, then close the input
@Nullable // if not currently reading sequentially
private StreamForSequentialReads streamForSequentialReads;
Member

Maybe update the class javadoc to explain how/why we use this?

Contributor Author

Sure, done in f18251a.

assert streamForSequentialReads.isFullyRead() == false;
sequentialReadSize = NO_SEQUENTIAL_READ_OPTIMIZATION;
IOUtils.close(streamForSequentialReads);
streamForSequentialReads = null;
Member

We're closing and nullifying streamForSequentialReads in many places; maybe it deserves its own closeSequentialStream() method?

Contributor Author

Good point, this is no longer a one-liner. Done in c9cf7bc.
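
A minimal sketch of what such a helper could look like inside SearchableSnapshotIndexInput (hypothetical; it also covers the try/finally point raised in the next thread):

```java
// Sketch: close and nullify the stream in exactly one place.
private void closeStreamForSequentialReads() throws IOException {
    try {
        IOUtils.close(streamForSequentialReads); // null-safe close
    } finally {
        streamForSequentialReads = null; // nullify even if close() throws
    }
}
```
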

* @return the number of bytes read; if a new stream wasn't opened then nothing was read so the caller should perform the read directly.
*/
private int readFromNewSequentialStream(int part, long pos, byte[] b, int offset, int length)
throws IOException {
Member

The method signature can fit on a single line

Contributor Author

It didn't always ;) Fixed in 51d2af5

if (position != offset + pos) {
position = offset + pos;
IOUtils.close(streamForSequentialReads);
streamForSequentialReads = null;
Member

Maybe nullify in a finally block, just in case

Contributor Author

Oh good point, done in c9cf7bc.

}

@Override
public BufferedIndexInput clone() {
- return new SearchableSnapshotIndexInput("clone(" + this + ")", blobContainer, fileInfo, position, offset, length);
+ return new SearchableSnapshotIndexInput("clone(" + this + ")", blobContainer, fileInfo, position, offset, length,
+     NO_SEQUENTIAL_READ_OPTIMIZATION, getBufferSize());
Member

Maybe add a small note on why we can't use the optimized reads for clones?

Contributor Author

Comments added in 2fee6b8.

@DaveCTurner DaveCTurner merged commit 30b5553 into elastic:feature/searchable-snapshots Feb 4, 2020
@DaveCTurner
Contributor Author

Thanks @tlrx

@DaveCTurner DaveCTurner deleted the 2020-01-20-searchable-snapshot-readahead branch February 4, 2020 17:09