Add Lucene directory and index input implementations that expose shard snapshot #49651

Merged

Conversation

@tlrx tlrx (Member) commented Nov 27, 2019

Note: this pull request targets the feature/searchable-snapshots branch

This pull request adds a SearchableSnapshotDirectory implementation that exposes the snapshot of a shard as a Lucene Directory.

This directory only supports read operations and does not allow any file modification. It allows (a minimal sketch follows the list):

  • to list all the files stored in the shard snapshot using Directory#listAll()
  • to return the byte length of a file in the snapshot using Directory#fileLength(String)
  • to open an IndexInput for reading an existing file of the snapshot using Directory#openInput(String, IOContext)
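
For illustration, a minimal sketch of this read-only shape might look like the following (all names here are hypothetical stand-ins, not the actual implementation; FileInfo abbreviates the per-file snapshot metadata described below):

import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.util.Collection;
import java.util.Map;
import org.apache.lucene.store.BaseDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.NoLockFactory;

public class SnapshotDirectorySketch extends BaseDirectory {

    // Stand-in for BlobStoreIndexShardSnapshot.FileInfo: only the length is needed here.
    public interface FileInfo { long length(); }

    private final Map<String, FileInfo> snapshotFiles; // Lucene file name -> snapshot metadata

    public SnapshotDirectorySketch(Map<String, FileInfo> snapshotFiles) {
        super(NoLockFactory.INSTANCE); // nothing ever writes, so no real locking is needed
        this.snapshotFiles = snapshotFiles;
    }

    @Override
    public String[] listAll() {
        ensureOpen();
        return snapshotFiles.keySet().stream().sorted().toArray(String[]::new);
    }

    @Override
    public long fileLength(String name) throws IOException {
        ensureOpen();
        final FileInfo fileInfo = snapshotFiles.get(name);
        if (fileInfo == null) {
            throw new NoSuchFileException(name);
        }
        return fileInfo.length();
    }

    @Override
    public IndexInput openInput(String name, IOContext context) throws IOException {
        ensureOpen();
        fileLength(name); // existence check; the real code returns a SearchableSnapshotIndexInput (sketched below)
        throw new UnsupportedOperationException("see the index input sketch further below");
    }

    // Every mutating operation is rejected: a shard snapshot is immutable.
    @Override public IndexOutput createOutput(String name, IOContext context) { throw readOnly(); }
    @Override public IndexOutput createTempOutput(String prefix, String suffix, IOContext context) { throw readOnly(); }
    @Override public void deleteFile(String name) { throw readOnly(); }
    @Override public void rename(String source, String dest) { throw readOnly(); }
    @Override public void sync(Collection<String> names) { throw readOnly(); }
    @Override public void syncMetaData() { throw readOnly(); }

    @Override
    public void close() {
        isOpen = false;
    }

    private static UnsupportedOperationException readOnly() {
        return new UnsupportedOperationException("snapshot directories are read-only");
    }
}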

In order to work, the directory requires the list of the shard snapshot files and a way to read a specific range of bytes from a blob. The list of shard snapshot files must be provided as a BlobStoreIndexShardSnapshot object when the directory is created. This object contains the list of the shard files stored in the snapshot and can be used to map each Lucene file to its corresponding blob(s) in the repository (there can be more than one blob per file, as large Lucene files are split into parts during snapshotting).

Blobs are directly read from the snapshot using a BlobContainer.

SearchableSnapshotDirectory provides SearchableSnapshotIndexInput to read a file from the snapshot. This index input implementation maintains an internal buffer (it extends BufferedIndexInput) and tracks the current read position in the file. Each time more bytes are requested to fill the internal buffer, SearchableSnapshotIndexInput maps the current position to the appropriate blob name and to the position within that blob from which to read. It also propagates knowledge of the current position to any clone or slice.
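
A sketch of that position-to-blob mapping (again with hypothetical names: BlobRangeReader stands in for the BlobContainer ranged read, and the ".partN" naming and fixed part size are illustrative assumptions):

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.lucene.store.BufferedIndexInput;
import org.apache.lucene.store.IndexInput;

// Stand-in for BlobContainer#readBlob(name, position, length).
interface BlobRangeReader {
    InputStream read(String blobName, long position, int length) throws IOException;
}

final class SnapshotIndexInputSketch extends BufferedIndexInput {

    private final BlobRangeReader reader;
    private final String fileName;   // logical Lucene file name
    private final long fileLength;   // length of this input (the slice length for slices)
    private final long partSize;     // large files are split into blobs of at most this size
    private final long offset;       // non-zero for slices

    SnapshotIndexInputSketch(BlobRangeReader reader, String fileName,
                             long fileLength, long partSize, long offset) {
        super("SnapshotIndexInputSketch(" + fileName + ")");
        this.reader = reader;
        this.fileName = fileName;
        this.fileLength = fileLength;
        this.partSize = partSize;
        this.offset = offset;
    }

    @Override
    protected void readInternal(byte[] b, int off, int len) throws IOException {
        long pos = offset + getFilePointer(); // absolute position in the original file
        while (len > 0) {
            // Map the absolute position onto a blob part and an offset inside that part.
            final long part = pos / partSize;
            final long posInPart = pos % partSize;
            final int toRead = (int) Math.min(len, partSize - posInPart);
            final String blobName = fileName + ".part" + part; // hypothetical naming scheme
            try (InputStream in = reader.read(blobName, posInPart, toRead)) {
                int read = 0;
                while (read < toRead) {
                    final int n = in.read(b, off + read, toRead - read);
                    if (n < 0) {
                        throw new EOFException("unexpected end of blob [" + blobName + "]");
                    }
                    read += n;
                }
            }
            pos += toRead;
            off += toRead;
            len -= toRead;
        }
    }

    @Override
    protected void seekInternal(long pos) {
        // Nothing to do: the position is re-derived from getFilePointer() on the next read.
    }

    @Override
    public IndexInput slice(String sliceDescription, long sliceOffset, long sliceLength) {
        // A slice is another input over the same blobs with a shifted origin; clones
        // (inherited from BufferedIndexInput) also carry the current position along.
        return new SnapshotIndexInputSketch(reader, fileName, sliceLength, partSize, offset + sliceOffset);
    }

    @Override
    public long length() {
        return fileLength;
    }

    @Override
    public void close() {
        // Nothing to release: a stream is opened and closed for each buffer refill.
    }
}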

This pull request also adds tests for the SearchableSnapshotDirectory: they create a random directory, index documents into it, snapshot the files, and create a SearchableSnapshotDirectory from this snapshot. They then run the same operations against both the normal directory and the searchable snapshot directory and compare the results.
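
In compressed form, the comparison step of such a test might look like this (a sketch using JUnit assertions; handling of files that exist in the source directory but are not part of the snapshot is elided):

import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;

import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Sketch: verify the snapshot-backed directory matches the original one.
static void assertDirectoriesMatch(Directory dir, Directory snapshotDir) throws IOException {
    final String[] files = dir.listAll(); // listAll() is required to return sorted names
    assertArrayEquals(files, snapshotDir.listAll());
    for (String file : files) {
        assertEquals(dir.fileLength(file), snapshotDir.fileLength(file));
        try (IndexInput expected = dir.openInput(file, IOContext.READONCE);
             IndexInput actual = snapshotDir.openInput(file, IOContext.READONCE)) {
            for (long i = 0L, n = expected.length(); i < n; i++) {
                assertEquals(expected.readByte(), actual.readByte()); // byte-by-byte, slow but exhaustive
            }
        }
    }
}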

@tlrx tlrx added the WIP and :Distributed Indexing/Distributed labels Nov 27, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Distributed)

@tlrx tlrx (Member Author) commented Nov 27, 2019

@elasticmachine test this please

@tlrx tlrx force-pushed the searchable-snapshots-directory branch from 7a99aaa to e2438cb on December 11, 2019 12:59
@tlrx tlrx (Member Author) commented Dec 11, 2019

@DaveCTurner I adjusted this PR a bit so that the plugin provides a DirectoryFactory and the IndexInput now reads ranges of bytes directly using a BlobContainer. There are some // NORELEASE tags here and there, but the whole logic works as expected and should unblock your work.

@tlrx tlrx requested a review from DaveCTurner December 11, 2019 13:03
@tlrx tlrx removed the WIP label Dec 11, 2019
@DaveCTurner DaveCTurner (Contributor) left a comment

Good stuff. I left a few smaller comments but the overall structure is great.

@tlrx tlrx (Member Author) commented Dec 12, 2019

@DaveCTurner Thanks for your feedback. I think I addressed all your comments. I don't expect the bwc tests to pass until master is merged into the feature branch again (JDK13).

Can you please have another look?

@tlrx tlrx requested a review from DaveCTurner December 12, 2019 16:13
@DaveCTurner DaveCTurner (Contributor) left a comment

Thanks for the quick turnaround. Marked most comments as resolved, added one new comment, and expanded one existing comment.

break;
default:
fail();
}
Contributor

I think we are missing coverage of reads that aren't purely forwards and contiguous, i.e. an interleaving of seeks and reads (particularly given the three different read methods). For instance, reading from more than one place in the file, possibly jumping back-and-forth across blobs, seems like an interesting workload for a cache. To be clear I don't think there's a bug here, just that if I was going to add a bug here in future then that's where I think I'd put it 😁
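
One way to exercise that pattern (a sketch only, reusing the hypothetical dir/snapshotDir/file fixtures from the earlier test sketch; ThreadLocalRandom is java.util.concurrent.ThreadLocalRandom):

// Sketch: interleave random seeks and reads, jumping back and forth across
// blob-part boundaries, and compare against the plain directory.
try (IndexInput expected = dir.openInput(file, IOContext.DEFAULT);
     IndexInput actual = snapshotDir.openInput(file, IOContext.DEFAULT)) {
    final long length = expected.length();
    for (int i = 0; i < 100 && length > 0; i++) {
        final long pos = ThreadLocalRandom.current().nextLong(length);
        final int len = (int) Math.min(length - pos, ThreadLocalRandom.current().nextInt(1, 1024));
        final byte[] a = new byte[len];
        final byte[] b = new byte[len];
        expected.seek(pos);
        expected.readBytes(a, 0, len);
        actual.seek(pos); // may seek backwards and across blob parts
        actual.readBytes(b, 0, len);
        assertArrayEquals(a, b);
    }
}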

@original-brownbear original-brownbear (Contributor) left a comment

Just some drive-by-comments :)

@@ -163,6 +163,14 @@ public InputStream readBlob(String name) throws IOException {
}
}

@Override
public InputStream readBlob(String blobName, long position, int length) throws IOException {
final InputStream inputStream = readBlob(blobName);
Contributor

I think you could just use this as the default implementation in BlobContainer instead of throwing. I think all our streams support skip? :)
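
For reference, the skip-based default being suggested might look something like this (a sketch, not the code that was merged; note that InputStream#skip may skip fewer bytes than requested, hence the loop):

// Hypothetical default method on BlobContainer: open the whole blob and
// skip forward to the requested position.
default InputStream readBlob(String blobName, long position, int length) throws IOException {
    final InputStream stream = readBlob(blobName);
    long remaining = position;
    while (remaining > 0L) {
        final long skipped = stream.skip(remaining);
        if (skipped <= 0L) {
            stream.close();
            throw new EOFException("blob [" + blobName + "] is shorter than position [" + position + "]");
        }
        remaining -= skipped;
    }
    return stream; // a real version would also bound the stream to at most `length` bytes
}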

Member Author

This is hacky and only relatively harmless for the FS repository, so I preferred to implement it in the FS blob container and leave the other implementations as unsupported until the method is correctly implemented for each of them (using ranged byte downloads).

* @throws NoSuchFileException if the blob does not exist
* @throws IOException if the blob can not be read.
*/
default InputStream readBlob(final String blobName, final long position, final int length) throws IOException {
Contributor

Why would we want length on this API? Wouldn't it be better to just have IndexInput keep a reference to an open stream and only open a new stream if we seek backwards instead of opening a new stream of bounded length repeatedly?

Member Author

It could be done as you suggest for the FS repository (and maybe HDFS too), but for other repositories we need to give an indication of the number of bytes we want to download because, unlike the RestoreService, we don't want to read whole blobs but only a chunk of them. Most SDKs require all requested bytes to be consumed (or will consume them under the hood for you), and we don't want to open a stream that reads a complete blob if we only use the first 28 bytes to read a header.

Contributor

Most SDKs require all requested bytes to be consumed (or will consume them under the hood for you), and we don't want to open a stream that reads a complete blob if we only use the first 28 bytes to read a header.

Fair point :) You have abort as a method on the S3 input stream though, not sure about GCS here.

@Override
protected void readInternal(byte[] b, int offset, int length) throws IOException {
ensureOpen();
if (fileInfo.numberOfParts() == 1L) {
Contributor

Why not reuse the logic from the restore codebase, which already has a sliced input stream, instead of building the same thing again here? (Also see my comment about keeping a reference to an open stream until we seek backwards.)

Member Author

That's a good suggestion, we should be able to use SlicedInputStream combined with the length parameter. I'll take a look :)
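
For context, the restore-path pattern referred to here is roughly the following (a sketch based on how SlicedInputStream is used by the restore code, assuming its openSlice(long) hook as of this branch; combining it with the new position/length parameter is the open question):

// SlicedInputStream stitches the blob parts of one Lucene file back into a
// single stream, opening each part lazily as the previous one is exhausted.
final InputStream stream = new SlicedInputStream(fileInfo.numberOfParts()) {
    @Override
    protected InputStream openSlice(long slice) throws IOException {
        return blobContainer.readBlob(fileInfo.partName(slice));
    }
};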

@DaveCTurner DaveCTurner (Contributor) left a comment

LGTM

break;
default:
fail();
}
Contributor

a2de0d2 looks good, thanks, although we'd better not be using the XKCD random number generator...

@tlrx tlrx (Member Author) commented Dec 13, 2019

@elasticmachine update branch

@tlrx tlrx merged commit 0940bcd into elastic:feature/searchable-snapshots Dec 13, 2019
@tlrx tlrx deleted the searchable-snapshots-directory branch December 13, 2019 11:23
@tlrx tlrx (Member Author) commented Dec 13, 2019

Thanks @DaveCTurner and @original-brownbear !

@DaveCTurner DaveCTurner mentioned this pull request Jan 14, 2020
@SherazT SherazT commented Feb 19, 2020

@tlrx Hey! I'm trying to parse snapshot .dat files and print the individual documents from them. I posted a question here (https://discuss.elastic.co/t/how-to-parse-snapshot-dat-file/218888) and I'm wondering if this feature does what I think it does. I have snap-{uuid}.dat files and the further snap files relating to the shard, so will this feature help me print out the entries contained in the shard? Are these the files you refer to, holding the documents within each shard that make up the snapshot? Would love your help on this.

@tlrx tlrx (Member Author) commented Feb 19, 2020

@SherazT We replied on Discuss, it's better to keep the discussion there.
