Add optimized / direct read stats for non-cached files #54439
Conversation
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
this.fileName = fileName;
this.fileLength = fileLength;
this.openCount = openCount;
this.innerCount = innerCount;
The "inner open count" information is obsolete since #53860 and has been removed
@@ -158,13 +156,13 @@ public CacheIndexInputStats(String fileName, long fileLength, long openCount, lo
this.cachedBytesRead = cachedBytesRead;
this.cachedBytesWritten = cachedBytesWritten;
this.directBytesRead = directBytesRead;
this.optimizedBytesRead = optimizedBytesRead;
This new timed counter is added to track information about optimized read operations executed in DirectBlobContainerIndexInput
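A "timed counter" here tracks how many operations ran, how many bytes they moved, and how long they took in total. A minimal sketch of that idea (illustrative names only, not the actual Elasticsearch `TimedCounter` API) could look like:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a timed counter: counts operations, total bytes,
// and total elapsed nanoseconds. LongAdders keep it cheap under contention.
class TimedCounter {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalBytes = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    void add(long bytes, long nanos) {
        count.increment();
        totalBytes.add(bytes);
        totalNanos.add(nanos);
    }

    long count() { return count.sum(); }
    long totalBytes() { return totalBytes.sum(); }
    long totalNanos() { return totalNanos.sum(); }
}
```

Each optimized read would then report its byte count and duration to one such counter, which serialization later exposes as `optimized_read_bytes`.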
protected final BlobContainer blobContainer;
protected final FileInfo fileInfo;
protected final IOContext context;
protected final IndexInputStats stats;
protected final long offset;
protected final long length;
offset and length are common attributes, which is why they have been pulled up into BaseSearchableSnapshotIndexInput (sorry for the extra noise in the constructors)
@@ -244,7 +244,7 @@ public IndexInput openInput(final String name, final IOContext context) throws I
if (useCache && isExcludedFromCache(name) == false) {
    return new CachedBlobContainerIndexInput(this, fileInfo, context, inputStats);
} else {
    return new DirectBlobContainerIndexInput(blobContainer, fileInfo, context, uncachedChunkSize, BufferedIndexInput.BUFFER_SIZE);
    return new DirectBlobContainerIndexInput(this, fileInfo, context, inputStats, uncachedChunkSize, bufferSize(context));
I noticed this while writing tests: we don't align the buffer size in the same way as BufferedIndexInput does (i.e. based on the IOContext)
Good catch 👍
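For context, Lucene's `BufferedIndexInput.bufferSize(IOContext)` picks a larger buffer for merge contexts than for regular reads. A simplified stand-in (the enum and constant names here are illustrative; the values mirror Lucene's defaults of 1024 and 4096 bytes) shows the shape of that decision:

```java
// Simplified sketch of IOContext-based buffer sizing, as done by Lucene's
// BufferedIndexInput.bufferSize(IOContext). Illustrative stand-in types.
class BufferSizing {
    enum Context { DEFAULT, MERGE, READ }

    static final int BUFFER_SIZE = 1024;       // default read buffer
    static final int MERGE_BUFFER_SIZE = 4096; // merges read large sequential ranges

    static int bufferSize(Context context) {
        // merge reads are sequential and bulk, so they get the bigger buffer
        return context == Context.MERGE ? MERGE_BUFFER_SIZE : BUFFER_SIZE;
    }
}
```

Delegating to `bufferSize(context)` instead of hardcoding `BUFFER_SIZE` keeps the direct index input consistent with this behavior.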
this.offset + offset, length, true, cacheFileReference);
final CachedBlobContainerIndexInput slice = new CachedBlobContainerIndexInput(getFullSliceDescription(sliceDescription), directory,
    fileInfo, context, stats, this.offset + offset, length, cacheFileReference);
slice.isClone = true;
This is not necessary but I added it for consistency
@@ -118,12 +119,15 @@ private void readInternalBytes(final int part, long pos, final byte[] b, int off

if (optimizedReadSize < length) {
    // we did not read everything in an optimized fashion, so read the remainder directly
    final long startTimeNanos = directory.statsCurrentTimeNanos();
This is where "direct_read_bytes" are captured for direct index inputs.
// if we open a stream of length streamLength then it will not be completely consumed by this read, so it is worthwhile to open
// it and keep it open for future reads
final InputStream inputStream = openBlobStream(part, pos, streamLength);
streamForSequentialReads = new StreamForSequentialReads(inputStream, part, pos, streamLength);
streamForSequentialReads = new StreamForSequentialReads(new FilterInputStream(inputStream) {
This is where "optimized_read_bytes" are captured for direct index inputs. Note that they are only added to the stats when the stream is closed. This is useful in case the inner input stream reads all remaining bytes on closing (though there is no test for this).
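The wrapping pattern described here, counting bytes as they pass through a `FilterInputStream` and reporting the total on `close()`, can be sketched as follows (class and callback names are illustrative, not the PR's actual code):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongConsumer;

// Illustrative sketch: count every byte read through the stream and hand the
// total to a stats callback once, when the stream is closed.
class CountingOnCloseInputStream extends FilterInputStream {
    private long bytesRead = 0;
    private final LongConsumer onClose;

    CountingOnCloseInputStream(InputStream in, LongConsumer onClose) {
        super(in);
        this.onClose = onClose;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) bytesRead++;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) bytesRead += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        super.close();          // the inner stream may do its own draining here
        onClose.accept(bytesRead); // single stats update per stream
    }
}
```

Reporting once on close keeps the stats update out of the hot read path.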
I left a couple of suggestions. Thanks for the commentary.
@@ -94,7 +96,9 @@ public int read(byte[] b, int off, int len) throws IOException {
    };
    }
});
return new DirectBlobContainerIndexInput(blobContainer, fileInfo, newIOContext(random()), minimumReadSize,
final SearchableSnapshotDirectory directory = mock(SearchableSnapshotDirectory.class);
As far as I can see, we pass the whole SearchableSnapshotDirectory into the IndexInput so that the IndexInput has access to the current time for stats purposes, leading to this mocking here -- otherwise we could continue to pass in just the BlobContainer. I think it would be neater to keep passing the BlobContainer and either pass in a LongSupplier for the current time, or else attach the current time supplier to the IndexInputStats.
"attach the current time supplier to the IndexInputStats"

I like this one, so I pushed 47d7e4c
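The suggestion taken here is to give the stats object its own nanosecond clock, so index inputs no longer need the whole directory just to read the time. A minimal sketch of that shape (illustrative class, not the real IndexInputStats API):

```java
import java.util.function.LongSupplier;

// Illustrative sketch: stats object carrying its own clock. Tests can inject
// a deterministic LongSupplier instead of mocking the whole directory.
class StatsWithClock {
    private final LongSupplier currentTimeNanos;
    private long optimizedBytes = 0;
    private long optimizedNanos = 0;

    StatsWithClock(LongSupplier currentTimeNanos) {
        this.currentTimeNanos = currentTimeNanos;
    }

    long currentTimeNanos() {
        return currentTimeNanos.getAsLong();
    }

    void addOptimizedBytesRead(long bytes, long nanos) {
        optimizedBytes += bytes;
        optimizedNanos += nanos;
    }

    long optimizedBytes() { return optimizedBytes; }
    long optimizedNanos() { return optimizedNanos; }
}
```

In production the supplier would be `System::nanoTime`; in tests it can be a fake clock, which removes the need for `mock(SearchableSnapshotDirectory.class)`.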
@Override
public void close() throws IOException {
    super.close();
    stats.addOptimizedBytesRead(bytesRead, directory.statsCurrentTimeNanos() - startTimeNanos);
This also includes the time spent waiting for sequential reads to be requested by the caller. What do you think about timing each read call instead and adding all those timings up?
You're right. I pushed a940930 which accumulates numbers for each read operation before adding the sums when the stream is closed. This way the stat will reflect a single read operation that consumed the total bytes of the stream with a more accurate timing.
I adapted the test.
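The approach described, timing each read individually and publishing the accumulated sums once on close, can be sketched like this (illustrative names; the real change is commit a940930, not this code):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongSupplier;

// Illustrative sketch: each read() is timed on its own, bytes and elapsed
// nanos accumulate, and the sums are reported once when the stream closes.
// Time spent waiting between sequential reads is therefore excluded.
// The single-byte read() override is omitted for brevity.
class PerReadTimingInputStream extends FilterInputStream {
    interface Sink { void add(long bytes, long nanos); }

    private final LongSupplier nanoClock;
    private final Sink sink;
    private long bytes = 0;
    private long nanos = 0;

    PerReadTimingInputStream(InputStream in, LongSupplier nanoClock, Sink sink) {
        super(in);
        this.nanoClock = nanoClock;
        this.sink = sink;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        long start = nanoClock.getAsLong();
        int n = super.read(b, off, len);
        nanos += nanoClock.getAsLong() - start;
        if (n > 0) bytes += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        super.close();
        sink.add(bytes, nanos); // one stat entry covering the whole stream
    }
}
```

The stat thus reflects a single read operation that consumed the total bytes of the stream, with timing that covers only the reads themselves.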
Thanks @DaveCTurner for your feedback. I addressed your comments.
LGTM
@elasticmachine update branch
Running tests again after a CI failure that should be fixed by #54573 @elasticmachine run elasticsearch-ci/2
This pull request adds support for tracking stats about optimized (i.e. read-ahead) and non-optimized (i.e. direct) read operations executed on non-cached Lucene files of searchable snapshot directories.