Uncouple CacheDirectory from SearchableSnapshotDirectory #53860

@tlrx tlrx commented Mar 20, 2020

Today CacheDirectory is implemented as a FilterDirectory that caches files locally while delegating the read operations to SearchableSnapshotDirectory.

This was very useful to separate concerns like caching Lucene files on disk from reading Lucene files from a blob store repository, but it comes with additional complexity:

  • IndexInputs are buffered within each directory, which makes the read pattern difficult to understand
  • in case of cache evictions, the CacheDirectory attempts to directly read N bytes from the IndexInput, but because of the SearchableSnapshotDirectory read-ahead it might in fact download more bytes than needed
  • the direct read bytes stat does not exactly reflect the bytes read by the underlying BlobContainer
  • changes like "Add file type-based exclusion setting for searchable snapshots cache" (#53492) are a bit more complex than necessary

This pull request is a first step toward merging CacheDirectory into SearchableSnapshotDirectory. It changes the cache directory so that it no longer relies on the searchable snapshot directory and instead reads bytes directly from the BlobContainer. It also adds two base classes that group the attributes common to the directories and to the index inputs.
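
To make the read-ahead concern above concrete, here is a minimal sketch in plain Java. It does not use the Elasticsearch or Lucene APIs: `CountingBlobStore` and `ReadAheadReader` are hypothetical stand-ins, and the 32 KiB read-ahead size is an arbitrary assumption. It only illustrates why a cache layer that reads through a read-ahead wrapper can download more bytes than it asked for, and why its own "bytes read" stat then differs from what the store served.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a blob store; it only counts the bytes it serves. */
final class CountingBlobStore {
    private final byte[] blob;
    private final AtomicLong bytesServed = new AtomicLong();

    CountingBlobStore(int blobLength) {
        this.blob = new byte[blobLength];
    }

    /** Returns the bytes in [position, position + length), clamped to the end of the blob. */
    byte[] readRange(int position, int length) {
        if (position < 0 || position > blob.length || length < 0) {
            throw new IllegalArgumentException("invalid range");
        }
        int end = (int) Math.min((long) blob.length, (long) position + length);
        byte[] out = Arrays.copyOfRange(blob, position, end);
        bytesServed.addAndGet(out.length);
        return out;
    }

    long bytesServed() {
        return bytesServed.get();
    }
}

/** Hypothetical wrapper that always fetches at least readAhead bytes per request. */
final class ReadAheadReader {
    private final CountingBlobStore store;
    private final int readAhead;

    ReadAheadReader(CountingBlobStore store, int readAhead) {
        this.store = store;
        this.readAhead = readAhead;
    }

    /** The caller gets at most length bytes, but the store may serve more. */
    byte[] read(int position, int length) {
        byte[] fetched = store.readRange(position, Math.max(length, readAhead));
        return Arrays.copyOf(fetched, Math.min(length, fetched.length));
    }
}

final class ReadAheadDemo {
    public static void main(String[] args) {
        CountingBlobStore wrapped = new CountingBlobStore(1 << 20);
        new ReadAheadReader(wrapped, 32 * 1024).read(0, 4096);

        CountingBlobStore direct = new CountingBlobStore(1 << 20);
        direct.readRange(0, 4096);

        System.out.println("served via wrapper: " + wrapped.bytesServed());
        System.out.println("served directly:    " + direct.bytesServed());
    }
}
```

In this toy setup the caller asks for 4096 bytes in both cases, but the wrapped store serves 32768 bytes while the direct read serves exactly 4096.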

@tlrx tlrx requested a review from DaveCTurner March 20, 2020 12:04
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

public abstract class BaseSearchableSnapshotDirectory extends BaseDirectory {
Member Author

This abstract class has been introduced to hold the attributes common to the existing directories and to unify their constructors. It should become the concrete SearchableSnapshotDirectory once the cache logic and the searchable snapshot logic are merged together.
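
As a rough illustration of that idea (not the actual BaseSearchableSnapshotDirectory), the sketch below uses hypothetical names throughout: `BaseDirectorySketch`, `ZeroFilledDirectorySketch`, `doRead`, and the choice of fields are all assumptions. The point is only that shared identifiers, a single constructor, and the closed flag live in the base class, while each subclass decides where bytes come from.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical base class: shared identifiers and lifecycle handling live here. */
abstract class BaseDirectorySketch {
    protected final String snapshotId;
    protected final String indexId;
    protected final int shardId;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** One constructor shared by every implementation. */
    protected BaseDirectorySketch(String snapshotId, String indexId, int shardId) {
        this.snapshotId = Objects.requireNonNull(snapshotId);
        this.indexId = Objects.requireNonNull(indexId);
        this.shardId = shardId;
    }

    /** Subclasses decide where the bytes come from (cache file, blob store, ...). */
    protected abstract byte[] doRead(String fileName, int position, int length);

    /** Rejects reads once the directory has been closed, then delegates. */
    final byte[] read(String fileName, int position, int length) {
        if (closed.get()) {
            throw new IllegalStateException("directory is closed");
        }
        return doRead(fileName, position, length);
    }

    /** Returns true only for the call that actually closed the directory. */
    final boolean close() {
        return closed.compareAndSet(false, true);
    }
}

/** Hypothetical implementation that serves zero-filled bytes, for demonstration only. */
final class ZeroFilledDirectorySketch extends BaseDirectorySketch {
    ZeroFilledDirectorySketch(String snapshotId, String indexId, int shardId) {
        super(snapshotId, indexId, shardId);
    }

    @Override
    protected byte[] doRead(String fileName, int position, int length) {
        return new byte[length];
    }
}

final class BaseDirectoryDemo {
    public static void main(String[] args) {
        BaseDirectorySketch directory = new ZeroFilledDirectorySketch("snapshot-uuid", "index-uuid", 0);
        System.out.println("read " + directory.read("_0.cfs", 0, 8).length + " bytes");
        System.out.println("closed now: " + directory.close());
    }
}
```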

if (SNAPSHOT_CACHE_ENABLED_SETTING.get(indexSettings.getSettings())) {
final Path cacheDir = shardPath.getDataPath().resolve("snapshots").resolve(snapshotId.getUUID());
- directory = new CacheDirectory(directory, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(),
+ directory = new CacheDirectory(snapshot, blobContainer, cache, cacheDir, snapshotId, indexId, shardPath.getShardId(),
Member Author

With this pull request, CacheDirectory reads blobs using the BlobContainer and does not need to wrap the SearchableSnapshotDirectory anymore.

Contributor

👍

@@ -61,17 +60,21 @@
private static final long NO_SEQUENTIAL_READ_OPTIMIZATION = 0L;


SearchableSnapshotIndexInput(final BlobContainer blobContainer, final FileInfo fileInfo, long sequentialReadSize, int bufferSize) {
this("SearchableSnapshotIndexInput(" + fileInfo.physicalName() + ")", blobContainer, fileInfo, 0L, 0L, fileInfo.length(),
SearchableSnapshotIndexInput(
Member Author

I'll revert this spotless formatting

Contributor

I for one welcome our new spotless overlords 🐜

@@ -39,7 +43,7 @@
/**
* {@link CacheDirectory} uses a {@link CacheService} to cache Lucene files provided by another {@link Directory}.
*/
- public class CacheDirectory extends FilterDirectory {
+ public class CacheDirectory extends BaseSearchableSnapshotDirectory {
Member Author

This is the main purpose of this pull request :)

logger.trace(() -> new ParameterizedMessage("writing range [{}-{}] to cache file [{}]", start, end, cacheFileReference));

int bytesCopied = 0;
try (IndexInput input = in.openInput(cacheFileReference.getFileName(), ioContext)) {
final long startTimeNanos = currentTimeNanosSupplier.getAsLong();
try (InputStream input = openInputStream(start, length)) {
Member Author

There is an important change here: by reading the range to cache on disk through the BlobContainer, only the required bytes are downloaded, usually in a single request (unless the range spans multiple blob parts), and no read-ahead is triggered.
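
The "single request unless the range spans multiple blob parts" point can be sketched as a small range-splitting helper. This is an illustration under assumptions, not the code in this pull request: `PartRange` and `BlobPartRanges.split` are hypothetical names, and a fixed part size is assumed for every part.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value type: one read request against a single blob part. */
final class PartRange {
    final long part;
    final long offset;
    final long length;

    PartRange(long part, long offset, long length) {
        this.part = part;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public String toString() {
        return "part=" + part + " offset=" + offset + " length=" + length;
    }
}

final class BlobPartRanges {
    /** Splits the file range [start, start + length) into one request per blob part it touches. */
    static List<PartRange> split(long start, long length, long partSize) {
        if (start < 0 || length < 0 || partSize <= 0) {
            throw new IllegalArgumentException("invalid range or part size");
        }
        List<PartRange> ranges = new ArrayList<>();
        long position = start;
        long remaining = length;
        while (remaining > 0) {
            long part = position / partSize;     // index of the part holding this position
            long offset = position % partSize;   // offset of the position inside that part
            long chunk = Math.min(remaining, partSize - offset);
            ranges.add(new PartRange(part, offset, chunk));
            position += chunk;
            remaining -= chunk;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Fits inside one part: a single request.
        System.out.println(split(100, 50, 1000));
        // Crosses a part boundary: two requests.
        System.out.println(split(900, 200, 1000));
    }
}
```

With a part size of 1000, a 50-byte range starting at 100 maps to one request, while a 200-byte range starting at 900 maps to two requests of 100 bytes each. In both cases the total requested equals the range length, with no extra bytes fetched.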

Contributor

👍

@tlrx tlrx added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement labels Mar 20, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

tlrx commented Mar 20, 2020

@elasticmachine update branch

@DaveCTurner DaveCTurner left a comment

Looks good @tlrx, but I think we can go further. I think we don't need to run things with the cache completely disabled any more, so we can drop the index.store.snapshot.cache.enabled setting, which means SearchableSnapshotDirectory is no longer needed, which means each base class has just one concrete implementation and so they can be merged together.

I'm ok with reviewing all that in one PR, I think, especially if the steps are mostly in separate commits. WDYT?

DaveCTurner previously approved these changes Mar 20, 2020

LGTM

@DaveCTurner DaveCTurner dismissed their stale review March 20, 2020 13:46

premature LGTM, there's still a couple of requests open

@DaveCTurner DaveCTurner left a comment

sorry, LGTM'd it too soon, there's a couple of places where I think we can assert something's not called.

tlrx commented Mar 20, 2020

Thanks @DaveCTurner - I've applied your feedback and also pushed a small change in BlobContainerWrapper that didn't make it in the previous commit.

@tlrx tlrx requested a review from DaveCTurner March 20, 2020 13:54
@DaveCTurner DaveCTurner left a comment

LGTM

@tlrx tlrx merged commit 8c732d0 into elastic:feature/searchable-snapshots Mar 20, 2020
@tlrx tlrx deleted the cache-directory-use-blob-container branch March 20, 2020 14:58

tlrx commented Mar 20, 2020

Thanks David!

tlrx added a commit that referenced this pull request Mar 20, 2020
Following #53860, this commit extracts the CacheBufferedIndexInput class 
from the CacheDirectory so that it can be merged with SearchableSnapshotDirectory.
tlrx added a commit that referenced this pull request Mar 24, 2020
Following #53860 and #53879, this commit now merges 
CacheDirectory into SearchableSnapshotDirectory.
tlrx added a commit that referenced this pull request Mar 27, 2020
…54072)

We can now detect blobs that are stored within the metadata hash field and 
open them directly as ByteArrayIndexInput.

Relates #53860
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement