Add Lucene directory and index input implementations that expose shard snapshot #49651

Merged

Conversation

@tlrx tlrx (Member) commented Nov 27, 2019

Note: this pull request targets the feature/searchable-snapshots branch

This pull request adds a SearchableSnapshotDirectory implementation that exposes the snapshot of a shard as a Lucene Directory.

This directory only supports read operations and does not allow any file modification. It allows (a minimal sketch follows the list):

  • to list all the files stored in the shard snapshot using Directory#listAll()
  • to return the byte length of a file in the snapshot using Directory#fileLength(String)
  • to open an IndexInput for reading an existing file of the snapshot using Directory#openInput(String, IOContext)
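
For illustration, a minimal sketch of this read-only shape might look like the following (all names here are hypothetical stand-ins, not the actual implementation; FileInfo abbreviates the per-file snapshot metadata described below):

import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.util.Collection;
import java.util.Map;
import org.apache.lucene.store.BaseDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.NoLockFactory;

public class SnapshotDirectorySketch extends BaseDirectory {

    // Stand-in for BlobStoreIndexShardSnapshot.FileInfo: only the length is needed here.
    public interface FileInfo { long length(); }

    private final Map<String, FileInfo> snapshotFiles; // Lucene file name -> snapshot metadata

    public SnapshotDirectorySketch(Map<String, FileInfo> snapshotFiles) {
        super(NoLockFactory.INSTANCE); // nothing ever writes, so no real locking is needed
        this.snapshotFiles = snapshotFiles;
    }

    @Override
    public String[] listAll() {
        ensureOpen();
        return snapshotFiles.keySet().stream().sorted().toArray(String[]::new);
    }

    @Override
    public long fileLength(String name) throws IOException {
        ensureOpen();
        final FileInfo fileInfo = snapshotFiles.get(name);
        if (fileInfo == null) {
            throw new NoSuchFileException(name);
        }
        return fileInfo.length();
    }

    @Override
    public IndexInput openInput(String name, IOContext context) throws IOException {
        ensureOpen();
        fileLength(name); // existence check; the real code returns a SearchableSnapshotIndexInput (sketched below)
        throw new UnsupportedOperationException("see the index input sketch further below");
    }

    // Every mutating operation is rejected: a shard snapshot is immutable.
    @Override public IndexOutput createOutput(String name, IOContext context) { throw readOnly(); }
    @Override public IndexOutput createTempOutput(String prefix, String suffix, IOContext context) { throw readOnly(); }
    @Override public void deleteFile(String name) { throw readOnly(); }
    @Override public void rename(String source, String dest) { throw readOnly(); }
    @Override public void sync(Collection<String> names) { throw readOnly(); }
    @Override public void syncMetaData() { throw readOnly(); }

    @Override
    public void close() {
        isOpen = false;
    }

    private static UnsupportedOperationException readOnly() {
        return new UnsupportedOperationException("snapshot directories are read-only");
    }
}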

In order to work, the directory requires the list of the shard snapshot files and a way to read a specific range of bytes from a blob. The list of shard snapshot files must be provided as a BlobStoreIndexShardSnapshot object when the directory is created. This object contains the list of the shard files stored in the snapshot and can be used to map each Lucene file to its corresponding blob(s) in the repository (there can be more than one blob per file, as large Lucene files are split into parts during snapshotting).

Blobs are directly read from the snapshot using a BlobContainer.

SearchableSnapshotDirectory provides SearchableSnapshotIndexInput to read a file from the snapshot. This index input implementation maintains an internal buffer (it extends BufferedIndexInput) and tracks the current read position in the file. Each time more bytes are requested to fill the internal buffer, SearchableSnapshotIndexInput maps the current position to the appropriate blob name and to the position within that blob from which to read. It also propagates knowledge of the current position to any clone or slice.
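
A sketch of that position-to-blob mapping (again with hypothetical names: BlobRangeReader stands in for the BlobContainer ranged read, and the ".partN" naming and fixed part size are illustrative assumptions):

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import org.apache.lucene.store.BufferedIndexInput;
import org.apache.lucene.store.IndexInput;

// Stand-in for BlobContainer#readBlob(name, position, length).
interface BlobRangeReader {
    InputStream read(String blobName, long position, int length) throws IOException;
}

final class SnapshotIndexInputSketch extends BufferedIndexInput {

    private final BlobRangeReader reader;
    private final String fileName;   // logical Lucene file name
    private final long fileLength;   // length of this input (the slice length for slices)
    private final long partSize;     // large files are split into blobs of at most this size
    private final long offset;       // non-zero for slices

    SnapshotIndexInputSketch(BlobRangeReader reader, String fileName,
                             long fileLength, long partSize, long offset) {
        super("SnapshotIndexInputSketch(" + fileName + ")");
        this.reader = reader;
        this.fileName = fileName;
        this.fileLength = fileLength;
        this.partSize = partSize;
        this.offset = offset;
    }

    @Override
    protected void readInternal(byte[] b, int off, int len) throws IOException {
        long pos = offset + getFilePointer(); // absolute position in the original file
        while (len > 0) {
            // Map the absolute position onto a blob part and an offset inside that part.
            final long part = pos / partSize;
            final long posInPart = pos % partSize;
            final int toRead = (int) Math.min(len, partSize - posInPart);
            final String blobName = fileName + ".part" + part; // hypothetical naming scheme
            try (InputStream in = reader.read(blobName, posInPart, toRead)) {
                int read = 0;
                while (read < toRead) {
                    final int n = in.read(b, off + read, toRead - read);
                    if (n < 0) {
                        throw new EOFException("unexpected end of blob [" + blobName + "]");
                    }
                    read += n;
                }
            }
            pos += toRead;
            off += toRead;
            len -= toRead;
        }
    }

    @Override
    protected void seekInternal(long pos) {
        // Nothing to do: the position is re-derived from getFilePointer() on the next read.
    }

    @Override
    public IndexInput slice(String sliceDescription, long sliceOffset, long sliceLength) {
        // A slice is another input over the same blobs with a shifted origin; clones
        // (inherited from BufferedIndexInput) also carry the current position along.
        return new SnapshotIndexInputSketch(reader, fileName, sliceLength, partSize, offset + sliceOffset);
    }

    @Override
    public long length() {
        return fileLength;
    }

    @Override
    public void close() {
        // Nothing to release: a stream is opened and closed for each buffer refill.
    }
}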

This pull request also adds tests for the SearchableSnapshotDirectory: they create a random directory, index documents into it, snapshot the files, and create a SearchableSnapshotDirectory from this snapshot. They then run the same operations against both the normal directory and the searchable snapshot directory and compare the results.
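
In compressed form, the comparison step of such a test might look like this (a sketch using JUnit assertions; handling of files that exist in the source directory but are not part of the snapshot is elided):

import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;

import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Sketch: verify the snapshot-backed directory matches the original one.
static void assertDirectoriesMatch(Directory dir, Directory snapshotDir) throws IOException {
    final String[] files = dir.listAll(); // listAll() is required to return sorted names
    assertArrayEquals(files, snapshotDir.listAll());
    for (String file : files) {
        assertEquals(dir.fileLength(file), snapshotDir.fileLength(file));
        try (IndexInput expected = dir.openInput(file, IOContext.READONCE);
             IndexInput actual = snapshotDir.openInput(file, IOContext.READONCE)) {
            for (long i = 0L, n = expected.length(); i < n; i++) {
                assertEquals(expected.readByte(), actual.readByte()); // byte-by-byte, slow but exhaustive
            }
        }
    }
}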

@tlrx tlrx added the WIP and :Distributed Indexing/Distributed labels Nov 27, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Distributed)

@tlrx tlrx (Member Author) commented Nov 27, 2019

@elasticmachine test this please

@tlrx tlrx force-pushed the searchable-snapshots-directory branch from 7a99aaa to e2438cb on December 11, 2019 12:59
@tlrx tlrx (Member Author) commented Dec 11, 2019

@DaveCTurner I adjusted this PR a bit so that the plugin provides a DirectoryFactory and the IndexInput now reads ranges of bytes directly using a BlobContainer. There are some // NORELEASE tags here and there, but the whole logic works as expected and should unblock your work.

@tlrx tlrx requested a review from DaveCTurner December 11, 2019 13:03
@tlrx tlrx removed the WIP label Dec 11, 2019
@DaveCTurner DaveCTurner (Contributor) left a comment

Good stuff. I left a few smaller comments but the overall structure is great.

@tlrx tlrx (Member Author) commented Dec 12, 2019

@DaveCTurner Thanks for your feedback. I think I addressed all your comments. I don't expect the bwc tests to pass until master is merged into the feature branch again (JDK13).

Can you please have another look?

@tlrx tlrx requested a review from DaveCTurner December 12, 2019 16:13
@DaveCTurner DaveCTurner (Contributor) left a comment

Thanks for the quick turnaround. Marked most comments as resolved, added one new comment, and expanded one existing comment.

break;
default:
fail();
}
Contributor

I think we are missing coverage of reads that aren't purely forwards and contiguous, i.e. an interleaving of seeks and reads (particularly given the three different read methods). For instance, reading from more than one place in the file, possibly jumping back-and-forth across blobs, seems like an interesting workload for a cache. To be clear I don't think there's a bug here, just that if I was going to add a bug here in future then that's where I think I'd put it 😁
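
One way to exercise that pattern (a sketch only, reusing the hypothetical dir/snapshotDir/file fixtures from the earlier test sketch; ThreadLocalRandom is java.util.concurrent.ThreadLocalRandom):

// Sketch: interleave random seeks and reads, jumping back and forth across
// blob-part boundaries, and compare against the plain directory.
try (IndexInput expected = dir.openInput(file, IOContext.DEFAULT);
     IndexInput actual = snapshotDir.openInput(file, IOContext.DEFAULT)) {
    final long length = expected.length();
    for (int i = 0; i < 100 && length > 0; i++) {
        final long pos = ThreadLocalRandom.current().nextLong(length);
        final int len = (int) Math.min(length - pos, ThreadLocalRandom.current().nextInt(1, 1024));
        final byte[] a = new byte[len];
        final byte[] b = new byte[len];
        expected.seek(pos);
        expected.readBytes(a, 0, len);
        actual.seek(pos); // may seek backwards and across blob parts
        actual.readBytes(b, 0, len);
        assertArrayEquals(a, b);
    }
}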

@original-brownbear original-brownbear (Contributor) left a comment

Just some drive-by-comments :)

@@ -163,6 +163,14 @@ public InputStream readBlob(String name) throws IOException {
}
}

@Override
public InputStream readBlob(String blobName, long position, int length) throws IOException {
final InputStream inputStream = readBlob(blobName);
Contributor

I think you could just use this as the default implementation in BlobContainer instead of throwing. I think all our streams support skip? :)
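
For reference, the skip-based default being suggested might look something like this (a sketch, not the code that was merged; note that InputStream#skip may skip fewer bytes than requested, hence the loop):

// Hypothetical default method on BlobContainer: open the whole blob and
// skip forward to the requested position.
default InputStream readBlob(String blobName, long position, int length) throws IOException {
    final InputStream stream = readBlob(blobName);
    long remaining = position;
    while (remaining > 0L) {
        final long skipped = stream.skip(remaining);
        if (skipped <= 0L) {
            stream.close();
            throw new EOFException("blob [" + blobName + "] is shorter than position [" + position + "]");
        }
        remaining -= skipped;
    }
    return stream; // a real version would also bound the stream to at most `length` bytes
}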

Member Author

This is hacky and only relatively harmless for the FS repository, so I preferred to implement it in the FS blob container and leave the other implementations as unsupported until the method is correctly implemented for each of them (using ranged byte downloads).

* @throws NoSuchFileException if the blob does not exist
* @throws IOException if the blob can not be read.
*/
default InputStream readBlob(final String blobName, final long position, final int length) throws IOException {
Contributor

Why would we want length on this API? Wouldn't it be better to just have IndexInput keep a reference to an open stream and only open a new stream if we seek backwards instead of opening a new stream of bounded length repeatedly?

Member Author

It could be done as you suggest for the FS repository (and maybe HDFS too), but for other repositories we need to give an indication of the number of bytes we want to download because, unlike the RestoreService, we don't want to read whole blobs but only a chunk of them. Most SDKs require all requested bytes to be consumed (or will consume them under the hood for you), and we don't want to open a stream that reads a complete blob if we only use the first 28 bytes to read a header.

Contributor

Most SDKs require all requested bytes to be consumed (or will consume them under the hood for you), and we don't want to open a stream that reads a complete blob if we only use the first 28 bytes to read a header.

Fair point :) You have abort as a method on the S3 input stream though, not sure about GCS here.

@Override
protected void readInternal(byte[] b, int offset, int length) throws IOException {
ensureOpen();
if (fileInfo.numberOfParts() == 1L) {
Contributor

Why not reuse the logic from the restore codebase, which already has a sliced input stream, instead of building the same thing again here? (Also see my comment about keeping a reference to an open stream until we seek backwards.)

Member Author

That's a good suggestion, we should be able to use SlicedInputStream combined with the length parameter. I'll take a look :)
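
For context, the restore-path pattern referred to here is roughly the following (a sketch based on how SlicedInputStream is used by the restore code, assuming its openSlice(long) hook as of this branch; combining it with the new position/length parameter is the open question):

// SlicedInputStream stitches the blob parts of one Lucene file back into a
// single stream, opening each part lazily as the previous one is exhausted.
final InputStream stream = new SlicedInputStream(fileInfo.numberOfParts()) {
    @Override
    protected InputStream openSlice(long slice) throws IOException {
        return blobContainer.readBlob(fileInfo.partName(slice));
    }
};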

@DaveCTurner DaveCTurner (Contributor) left a comment

LGTM

break;
default:
fail();
}
Contributor

a2de0d2 looks good, thanks, although we'd better not be using the XKCD random number generator...

@tlrx tlrx (Member Author) commented Dec 13, 2019

@elasticmachine update branch

@tlrx tlrx merged commit 0940bcd into elastic:feature/searchable-snapshots Dec 13, 2019
@tlrx tlrx deleted the searchable-snapshots-directory branch December 13, 2019 11:23
@tlrx tlrx (Member Author) commented Dec 13, 2019

Thanks @DaveCTurner and @original-brownbear !

@DaveCTurner DaveCTurner mentioned this pull request Jan 14, 2020
@SherazT SherazT commented Feb 19, 2020

@tlrx Hey! I'm trying to parse snapshot .dat files and print the individual documents from them. I posted a question here (https://discuss.elastic.co/t/how-to-parse-snapshot-dat-file/218888) and I'm wondering if this feature does what I think it does. I have snap-{uuid}.dat files and the further snap files relating to the shard, so will this feature help me print out the entries contained in the shard? Are these the files you refer to, holding the documents within each shard that make up the snapshot? Would love your help on this.

@tlrx tlrx (Member Author) commented Feb 19, 2020

@SherazT We replied on Discuss, it's better to keep the discussion there.
