-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Uncouple CacheDirectory from SearchableSnapshotDirectory #53860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uncouple CacheDirectory from SearchableSnapshotDirectory #53860
Conversation
import java.util.Set; | ||
import java.util.concurrent.atomic.AtomicBoolean; | ||
|
||
public abstract class BaseSearchableSnapshotDirectory extends BaseDirectory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This abstract class has been introduced in order to contain common attributes of existing directories and in order to unify the existing constructors. It should become a concrete SearchableSnapshotDirectory
when the cache logic and the searchable snapshot logic will be merged together.
if (SNAPSHOT_CACHE_ENABLED_SETTING.get(indexSettings.getSettings())) { | ||
final Path cacheDir = shardPath.getDataPath().resolve("snapshots").resolve(snapshotId.getUUID()); | ||
directory = new CacheDirectory(directory, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(), | ||
directory = new CacheDirectory(snapshot, blobContainer, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this pull request, CacheDirectory reads blobs using the BlobContainer and does not need to wrap the SearchableSnapshotDirectory anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -61,17 +60,21 @@ | |||
private static final long NO_SEQUENTIAL_READ_OPTIMIZATION = 0L; | |||
|
|||
|
|||
SearchableSnapshotIndexInput(final BlobContainer blobContainer, final FileInfo fileInfo, long sequentialReadSize, int bufferSize) { | |||
this("SearchableSnapshotIndexInput(" + fileInfo.physicalName() + ")", blobContainer, fileInfo, 0L, 0L, fileInfo.length(), | |||
SearchableSnapshotIndexInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll revert this spotless formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I for one welcome our new spotless overlords 🐜
@@ -39,7 +43,7 @@ | |||
/** | |||
* {@link CacheDirectory} uses a {@link CacheService} to cache Lucene files provided by another {@link Directory}. | |||
*/ | |||
public class CacheDirectory extends FilterDirectory { | |||
public class CacheDirectory extends BaseSearchableSnapshotDirectory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main purpose of this pull request :)
logger.trace(() -> new ParameterizedMessage("writing range [{}-{}] to cache file [{}]", start, end, cacheFileReference)); | ||
|
||
int bytesCopied = 0; | ||
try (IndexInput input = in.openInput(cacheFileReference.getFileName(), ioContext)) { | ||
final long startTimeNanos = currentTimeNanosSupplier.getAsLong(); | ||
try (InputStream input = openInputStream(start, length)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an important change here: by reading the range to cache on disk using the BlobContainer
only the required bytes are downloaded in a (usually) single request (unless the range spans over multiple blob parts) and there is no read ahead engaged here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @tlrx, but I think we can go further. I think we don't need to run things with the cache completely disabled any more, so we can drop the index.store.snapshot.cache.enabled
setting, which means SearchableSnapshotDirectory
is no longer needed, which means each base class has just one concrete implementation and so they can be merged together.
I'm ok with reviewing all that in one PR, I think, especially if the steps are mostly in separate commits. WDYT?
...e-snapshots/src/main/java/org/elasticsearch/index/store/BaseSearchableSnapshotDirectory.java
Show resolved
Hide resolved
...-snapshots/src/main/java/org/elasticsearch/index/store/BaseSearchableSnapshotIndexInput.java
Show resolved
Hide resolved
...est/java/org/elasticsearch/xpack/searchablesnapshots/cache/CacheBufferedIndexInputTests.java
Outdated
Show resolved
Hide resolved
...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java
Show resolved
Hide resolved
@@ -61,17 +60,21 @@ | |||
private static final long NO_SEQUENTIAL_READ_OPTIMIZATION = 0L; | |||
|
|||
|
|||
SearchableSnapshotIndexInput(final BlobContainer blobContainer, final FileInfo fileInfo, long sequentialReadSize, int bufferSize) { | |||
this("SearchableSnapshotIndexInput(" + fileInfo.physicalName() + ")", blobContainer, fileInfo, 0L, 0L, fileInfo.length(), | |||
SearchableSnapshotIndexInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I for one welcome our new spotless overlords 🐜
logger.trace(() -> new ParameterizedMessage("writing range [{}-{}] to cache file [{}]", start, end, cacheFileReference)); | ||
|
||
int bytesCopied = 0; | ||
try (IndexInput input = in.openInput(cacheFileReference.getFileName(), ioContext)) { | ||
final long startTimeNanos = currentTimeNanosSupplier.getAsLong(); | ||
try (InputStream input = openInputStream(start, length)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
...-snapshots/src/main/java/org/elasticsearch/index/store/BaseSearchableSnapshotIndexInput.java
Show resolved
Hide resolved
...hable-snapshots/src/main/java/org/elasticsearch/index/store/SearchableSnapshotDirectory.java
Show resolved
Hide resolved
if (SNAPSHOT_CACHE_ENABLED_SETTING.get(indexSettings.getSettings())) { | ||
final Path cacheDir = shardPath.getDataPath().resolve("snapshots").resolve(snapshotId.getUUID()); | ||
directory = new CacheDirectory(directory, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(), | ||
directory = new CacheDirectory(snapshot, blobContainer, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
premature LGTM, there's still a couple of requests open
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, LGTM'd it too soon, there's a couple of places where I think we can assert something's not called.
Thanks @DaveCTurner - I've applied your feedback and also pushed a small change in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks David! |
Following #53860, this commit extracts the CacheBufferedIndexInput class from the CacheDirectory so that it can be merged with SearchableSnapshotDirectory.
Today
CacheDirectory
is implemented as aFilterDirectory
that caches files locally while delegating the read operations toSearchableSnapshotDirectory
.This was very useful to separate concerns like caching Lucene files on disk from reading Lucene files from a blob store repository, but it comes with additional complexity:
IndexInput
are buffered within each directory, making it difficult to understand the reading patternCacheDirectory
attempts to directly read N bytes from theIndexInput
but because of theSearchableSnapshotDirectory
read ahead it might in fact download more bytes than neededBlobContainer
This pull request is a first step forward merging
CacheDirectory
intoSearchableSnapshotDirectory
. It changes the cache directory so that it does not rely on the searchable snapshot directory anymore and instead read the bytes directly from theBlobContainer
. It also adds two more base classes that group common class attributes for directories and index inputs.