Uncouple CacheDirectory from SearchableSnapshotDirectory #53860

tlrx · 2020-03-20T12:04:23Z

Today CacheDirectory is implemented as a FilterDirectory that caches files locally while delegating the read operations to SearchableSnapshotDirectory.

This was very useful for separating concerns such as caching Lucene files on disk and reading Lucene files from a blob store repository, but it comes with additional complexity:

- IndexInputs are buffered within each directory, making it difficult to understand the read pattern
- on cache eviction, the CacheDirectory attempts to read exactly N bytes directly from the IndexInput, but because of the SearchableSnapshotDirectory read-ahead it might in fact download more bytes than needed
- the direct read bytes stat does not exactly reflect the bytes read by the underlying BlobContainer
- changes like "Add file type-based exclusion setting for searchable snapshots cache" (#53492) are a bit more complex than necessary
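The over-download problem in the list above can be reproduced in miniature with plain JDK streams. The sketch below is only an analogy and not Elasticsearch code: `CountingInputStream` and `ReadAheadDemo` are hypothetical names, a `ByteArrayInputStream` stands in for the blob store, and a `BufferedInputStream` stands in for a layer that reads ahead.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: counts how many bytes are actually pulled from the wrapped source. */
final class CountingInputStream extends FilterInputStream {
    private long bytesPulled = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            bytesPulled++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesPulled += n;
        }
        return n;
    }

    long bytesPulled() {
        return bytesPulled;
    }
}

/** Hypothetical demo: the same small read, with and without a buffering layer in between. */
final class ReadAheadDemo {

    /** Reads {@code wanted} bytes through a buffering layer and reports what the source served. */
    static long pulledThroughBuffer(byte[] blob, int wanted, int bufferSize) {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(blob));
        try (InputStream buffered = new BufferedInputStream(source, bufferSize)) {
            buffered.read(new byte[wanted], 0, wanted);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.bytesPulled();
    }

    /** Reads {@code wanted} bytes straight from the source and reports what the source served. */
    static long pulledDirectly(byte[] blob, int wanted) {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(blob));
        try {
            source.read(new byte[wanted], 0, wanted);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.bytesPulled();
    }
}
```

With a 16384-byte source, a request for 100 bytes and an 8192-byte buffer, the buffered path pulls a whole buffer from the source while the direct path pulls only the 100 bytes requested. A stat taken at the caller would report 100 in both cases, which is the mismatch the third item describes.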

This pull request is a first step toward merging CacheDirectory into SearchableSnapshotDirectory. It changes the cache directory so that it no longer relies on the searchable snapshot directory and instead reads the bytes directly from the BlobContainer. It also adds two base classes that group common attributes for directories and index inputs.

tlrx · 2020-03-20T12:06:07Z

...e-snapshots/src/main/java/org/elasticsearch/index/store/BaseSearchableSnapshotDirectory.java

+import java.util.Set;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+public abstract class BaseSearchableSnapshotDirectory extends BaseDirectory {


This abstract class has been introduced to hold the attributes common to the existing directories and to unify their constructors. It should become a concrete SearchableSnapshotDirectory once the cache logic and the searchable snapshot logic are merged together.
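As a rough illustration of that idea, the sketch below shows a base class that owns the shared state and a single constructor, leaving only the read strategy to subclasses. Every name here (`BlobSource`, `BaseSnapshotDirectorySketch`, `DirectReadDirectorySketch`) is hypothetical and deliberately simplified; none of it is the actual BaseSearchableSnapshotDirectory or BlobContainer API.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a blob store handle; not the Elasticsearch BlobContainer interface. */
interface BlobSource {
    byte[] read(String name, long position, int length);
}

/** Hypothetical base class: holds the state that every directory flavour shares. */
abstract class BaseSnapshotDirectorySketch implements AutoCloseable {
    protected final String snapshotName;
    protected final BlobSource blobs;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** A single constructor that all subclasses go through. */
    protected BaseSnapshotDirectorySketch(String snapshotName, BlobSource blobs) {
        this.snapshotName = Objects.requireNonNull(snapshotName);
        this.blobs = Objects.requireNonNull(blobs);
    }

    /** Guards every read against use after close. */
    protected final void ensureOpen() {
        if (closed.get()) {
            throw new IllegalStateException("directory for [" + snapshotName + "] is closed");
        }
    }

    /** Subclasses decide how bytes are served, e.g. from a local cache or straight from the blob store. */
    abstract byte[] readBytes(String file, long position, int length);

    @Override
    public final void close() {
        closed.set(true);
    }
}

/** Hypothetical subclass that always reads straight from the blob source. */
final class DirectReadDirectorySketch extends BaseSnapshotDirectorySketch {

    DirectReadDirectorySketch(String snapshotName, BlobSource blobs) {
        super(snapshotName, blobs);
    }

    @Override
    byte[] readBytes(String file, long position, int length) {
        ensureOpen();
        return blobs.read(file, position, length);
    }
}
```

With only one subclass left, such a base class and its subclass can be collapsed into a single concrete class, which is the direction the comment above describes.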

tlrx · 2020-03-20T12:07:44Z

...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java

        if (SNAPSHOT_CACHE_ENABLED_SETTING.get(indexSettings.getSettings())) {
            final Path cacheDir = shardPath.getDataPath().resolve("snapshots").resolve(snapshotId.getUUID());
-            directory = new CacheDirectory(directory, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(),
+            directory = new CacheDirectory(snapshot, blobContainer, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(),


With this pull request, CacheDirectory reads blobs using the BlobContainer and does not need to wrap the SearchableSnapshotDirectory anymore.

tlrx · 2020-03-20T12:08:08Z

...able-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotIndexInput.java

@@ -61,17 +60,21 @@
    private static final long NO_SEQUENTIAL_READ_OPTIMIZATION = 0L;


-    SearchableSnapshotIndexInput(final BlobContainer blobContainer, final FileInfo fileInfo, long sequentialReadSize, int bufferSize) {
-        this("SearchableSnapshotIndexInput(" + fileInfo.physicalName() + ")", blobContainer, fileInfo, 0L, 0L, fileInfo.length(),
+    SearchableSnapshotIndexInput(


I'll revert this spotless formatting

tlrx · 2020-03-20T12:08:37Z

...napshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheDirectory.java

@@ -39,7 +43,7 @@
 /**
 * {@link CacheDirectory} uses a {@link CacheService} to cache Lucene files provided by another {@link Directory}.
 */
-public class CacheDirectory extends FilterDirectory {
+public class CacheDirectory extends BaseSearchableSnapshotDirectory {


This is the main purpose of this pull request :)

tlrx · 2020-03-20T12:11:25Z

...napshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheDirectory.java

            logger.trace(() -> new ParameterizedMessage("writing range [{}-{}] to cache file [{}]", start, end, cacheFileReference));

            int bytesCopied = 0;
-            try (IndexInput input = in.openInput(cacheFileReference.getFileName(), ioContext)) {
+            final long startTimeNanos = currentTimeNanosSupplier.getAsLong();
+            try (InputStream input = openInputStream(start, length)) {


There is an important change here: by reading the range to cache on disk directly from the BlobContainer, only the required bytes are downloaded, usually in a single request (unless the range spans multiple blob parts), and no read-ahead is engaged.
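To make the "single request unless the range spans multiple blob parts" point concrete, here is a minimal sketch of how a byte range of a logical file could be split into per-part reads. It assumes fixed-size parts; `PartRanges`, `PartRead` and `split` are hypothetical names for illustration and are not taken from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps a byte range of a logical file onto fixed-size blob parts. */
final class PartRanges {

    /** One ranged request against a single part. */
    static final class PartRead {
        final int part;     // index of the blob part
        final long offset;  // first byte to read, relative to the start of the part
        final long length;  // number of bytes to read from this part

        PartRead(int part, long offset, long length) {
            this.part = part;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits the range [start, start + length) into the exact per-part reads that cover it. */
    static List<PartRead> split(long start, long length, long partSize) {
        if (start < 0 || length < 0 || partSize <= 0) {
            throw new IllegalArgumentException("invalid range or part size");
        }
        List<PartRead> reads = new ArrayList<>();
        long position = start;
        long remaining = length;
        while (remaining > 0) {
            int part = Math.toIntExact(position / partSize);
            long offsetInPart = position % partSize;
            // Never read past the end of the current part or past the end of the range.
            long toRead = Math.min(remaining, partSize - offsetInPart);
            reads.add(new PartRead(part, offsetInPart, toRead));
            position += toRead;
            remaining -= toRead;
        }
        return reads;
    }
}
```

With 100-byte parts, `split(10, 20, 100)` yields one read (part 0, offset 10, length 20), while `split(90, 30, 100)` yields two (part 0, offset 90, length 10, then part 1, offset 0, length 20). In both cases the lengths add up to exactly the requested range, with nothing read ahead.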

elasticmachine · 2020-03-20T12:15:56Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

tlrx · 2020-03-20T12:56:34Z

@elasticmachine update branch

DaveCTurner

Looks good @tlrx, but I think we can go further. I think we don't need to run things with the cache completely disabled any more, so we can drop the index.store.snapshot.cache.enabled setting, which means SearchableSnapshotDirectory is no longer needed, which means each base class has just one concrete implementation and so they can be merged together.

I'm ok with reviewing all that in one PR, I think, especially if the steps are mostly in separate commits. WDYT?

...e-snapshots/src/main/java/org/elasticsearch/index/store/BaseSearchableSnapshotDirectory.java

...-snapshots/src/main/java/org/elasticsearch/index/store/BaseSearchableSnapshotIndexInput.java

...est/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheBufferedIndexInputTests.java

...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java

DaveCTurner · 2020-03-20T13:02:19Z

...able-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotIndexInput.java

@@ -61,17 +60,21 @@
    private static final long NO_SEQUENTIAL_READ_OPTIMIZATION = 0L;


-    SearchableSnapshotIndexInput(final BlobContainer blobContainer, final FileInfo fileInfo, long sequentialReadSize, int bufferSize) {
-        this("SearchableSnapshotIndexInput(" + fileInfo.physicalName() + ")", blobContainer, fileInfo, 0L, 0L, fileInfo.length(),
+    SearchableSnapshotIndexInput(


I for one welcome our new spotless overlords 🐜

DaveCTurner · 2020-03-20T13:03:37Z

...napshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheDirectory.java

            logger.trace(() -> new ParameterizedMessage("writing range [{}-{}] to cache file [{}]", start, end, cacheFileReference));

            int bytesCopied = 0;
-            try (IndexInput input = in.openInput(cacheFileReference.getFileName(), ioContext)) {
+            final long startTimeNanos = currentTimeNanosSupplier.getAsLong();
+            try (InputStream input = openInputStream(start, length)) {


DaveCTurner

LGTM

...-snapshots/src/main/java/org/elasticsearch/index/store/BaseSearchableSnapshotIndexInput.java

...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java

DaveCTurner · 2020-03-20T13:44:08Z

...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java

        if (SNAPSHOT_CACHE_ENABLED_SETTING.get(indexSettings.getSettings())) {
            final Path cacheDir = shardPath.getDataPath().resolve("snapshots").resolve(snapshotId.getUUID());
-            directory = new CacheDirectory(directory, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(),
+            directory = new CacheDirectory(snapshot, blobContainer, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(),


premature LGTM, there's still a couple of requests open

DaveCTurner

sorry, LGTM'd it too soon, there's a couple of places where I think we can assert something's not called.

tlrx · 2020-03-20T13:54:00Z

Thanks @DaveCTurner - I've applied your feedback and also pushed a small change in BlobContainerWrapper that didn't make it in the previous commit.

DaveCTurner

LGTM

tlrx · 2020-03-20T14:58:28Z

Thanks David!

CacheDirectory

61c8907

tlrx requested a review from DaveCTurner March 20, 2020 12:04

tlrx commented Mar 20, 2020

tlrx added the :Distributed Coordination/Snapshot/Restore and >enhancement labels Mar 20, 2020

Merge branch 'feature/searchable-snapshots' into cache-directory-use-blob-container

6fa4928

DaveCTurner reviewed Mar 20, 2020

DaveCTurner previously approved these changes Mar 20, 2020

DaveCTurner reviewed Mar 20, 2020

apply feedback

e62549a

tlrx requested a review from DaveCTurner March 20, 2020 13:54

DaveCTurner approved these changes Mar 20, 2020

tlrx merged commit 8c732d0 into elastic:feature/searchable-snapshots Mar 20, 2020

tlrx deleted the cache-directory-use-blob-container branch March 20, 2020 14:58

tlrx mentioned this pull request Mar 20, 2020

Extract CacheBufferedIndexInput from CacheDirectory #53879

Merged

tlrx added a commit that referenced this pull request Mar 20, 2020

Extract CacheBufferedIndexInput from CacheDirectory (#53879)

815c861

Following #53860, this commit extracts the CacheBufferedIndexInput class from the CacheDirectory so that it can be merged with SearchableSnapshotDirectory.

tlrx mentioned this pull request Mar 21, 2020

Merge CacheDirectory into SearchableSnapshotDirectory #53917

Merged

tlrx added a commit that referenced this pull request Mar 24, 2020

Fold CacheDirectory within SearchableSnapshotDirectory (#53917)

02aa32e

Following #53860 and #53879, this commit now merges CacheDirectory into SearchableSnapshotDirectory.

tlrx mentioned this pull request Mar 24, 2020

Extract files stored within metadata hash into ByteArrayIndexInputs #54072

Merged

tlrx added a commit that referenced this pull request Mar 27, 2020

Extract files stored within metadata hash into ByteArrayIndexInputs (#54072)

f82f104

We can now detect blobs that are stored within the metadata hash field and open them directly as ByteArrayIndexInput. Relates #53860

tlrx mentioned this pull request Mar 30, 2020

Add optimized / direct read stats for non-cached files #54439

Merged

tlrx mentioned this pull request Apr 6, 2020

Merge feature/searchable-snapshots branch into master #54803

Merged
