Merge feature/searchable-snapshots branch into master #54803

tlrx · 2020-04-06T12:23:51Z

This pull request merges the searchable-snapshots feature into the master branch.

The feature branch has been regularly updated with latest master changes. The feature is enabled using a feature flag (#54424) and requires a trial or platinum license to work (#54730).

The feature is composed of the following changes that have been reviewed unitary:

3455e0c Handle shard closed during prerecovery (#54790)
104ba40 Trigger explicit loading of blob container and snapshot in the pre-recovery hook (#54729)
676d441 Require license for searchable snapshots (#54730)
66f4433 ILM Initial documentation for searchable snapshots (#54727)
34d3478 ILM MountSnapshotStep: use correct logic operator (#54675)
4ef4204 ILM wait for snapshot creation to complete (#54673)
0a4b26c Apply Spotless formatting (#54644)
6e7a5e0 Fix small reads of multi-part blobs (#54573)
e8cf4f9 Add basic documentation for Searchable Snapshots REST APIs (#54581)
19772b6 Fix merge errors
966b363 ILM don't fail the mount step on ACCEPTED response (#54580)
c6e3185 Add optimized / direct read stats for non-cached files (#54439)
f879268 Add Azure support for ranged read blob operations (#54358)
a5c7bec ILM: add searchable snapshot action (#52585)
773e7e3 Defer repo ops in searchable snapshot restore (#54211)
f8edeee Introduce feature flag for searchable snapshots (#54424)
4db29c0 Rename and move cache classes into store package (#54075)
f82f104 Extract files stored within metadata hash into ByteArrayIndexInputs (#54072)
e20f16e Better skip bytes in FsBlobContainer (#54333)
e669f14 Add file type-based exclusion setting for searchable snapshots cache (#53492)
8f41e61 Exclue searchable-snapshots:qa from Spotless
614843a Exclue searchable-snapshots from Spotless
2b67f16 Fix off by one error in test
897e477 Fix up merge
47ec08d CacheBufferedIndexInput should throw EOFException (#53975)
4e4a705 Fix test after merge
02aa32e Fold CacheDirectory within SearchableSnapshotDirectory (#53917)
815c861 Extract CacheBufferedIndexInput from CacheDirectory (#53879)
8c732d0 Uncouple CacheDirectory from SearchableSnapshotDirectory (#53860)
037a9be Revert workarounds for tryOpenIndex change (#53615)
4bde03a Associate translog with Lucene index commit for searchable snapshots shards (#53459)
215e94d Auto-allocate searchable snapshots (#52527)
95b1acc Move MountSearchableActionRequest and Action to xpack/core (#53394)
88b01a2 Add API to mount a snapshot as a searchable index (#53084)
d713b9e Add Clear Cache API for Searchable Snapshots (#53009)
050583e Allow freezing searchable snapshots (#52653)
ac08315 Fix REST tests after master merge
25b17ae Adapt searchable snapshot to latest master changes
308adaa Account for soft-deletes (#52781)
f1f598e Exclude indices with cache disabled from searchable snapshots stats (#52770)
c049b02 Fix searchable snapshot stats fwd/bwd seeking counters (#52760)
c9ac57f Add stats for time spent fetching data while searching snapshots (#51866)
e620d57 Adapt Searchable Snapshots to latest master changes
d6220c5 Fix up merge
5b115fb Add S3 integration tests for searchable snapshots (#52263)
957a7ae SearchableSnapshotIndexInput should read all bytes (#52199)
c97c97c Adapt searchable snapshots code to latest master changes (#52258)
c73cf68 Add REST API for cache directory stats (#51815)
30b5553 Optimize sequential reads in SearchableSnapshotIndexInput (#51230)
229b953 Add cache directory low-level instrumentation (#51637)
ef18352 Fix compilation issues after merging master
101c419 Use dedicated cache keys instead of relying on an absolute path (#51669)
b88a798 Fix SearchableSnapshotDirectoryTests.testDirectoryReader (#51343)
4c686c1 Add Cache Range Size setting (#51521)
f2a4957 Remove file field from CacheFileReference (#51520)
714c480 Add searchable snapshots cache directory (#50693)
ffc5a82 Wait for concurrent reads in ESIndexInputTestCase (#51425)
e419d91 Add ranged readBlob to S3BlobContainer (#51137)
cd72774 Test concurrent cloning/slicing of IndexInputs (#51032)
aea62f5 Fix corruption simulation in RecoveryFromGatewayIT (#50921)
74c4041 Add SearchableSnapshotRepository (#50239)
0940bcd Add Lucene directory and index input implementations that expose shard snapshot (#49651)
186d813 Add Searchable Snapshots plugin project (#50022)

Relates #50999

This commit adds a new (almost empty) Gradle project for the Searchable Snapshots plugin in x-pack.

…d snapshot (#49651) This commit adds a SearchableSnapshotDirectory implementation that exposes the snapshot of a shard as a Lucene Directory. This directory only supports read operations and does not allow any file modification. It allows: * Directory#listAll(): to list all the files stored in the shard snapshot * Directory#fileLength(String): to return the byte length of a file * Directory#openInput(String, IOContext): to open an IndexInput In order to work, the directory requires the list of the shard snapshot files and a way to read a specific range of bytes blob. The list of shard snapshot files must be provided as a BlobStoreIndexShardSnapshot object when the directory is created. This object contains the list of the shard files stored in the snapshot and can be used to map each Lucene file with its corresponding blob(s) stored in the repository (which can be more than one as large Lucene files are split during snapshot). Blobs are directly read from the snapshot using a BlobContainer. SearchableSnapshotDirectory provides SearchableSnapshotIndexInput to read a file from the snapshot. This index input implementation maintains an internal buffer (it extends BufferedIndexInput) and takes care of tracking current reading position in the file. Each time more bytes are requested to fill the internal buffer, SearchableSnapshotIndexInput maps the current position to the appropriate blob name and position in the blob to read bytes from. It also propagates the knowledge of the current position to any clone or slice. This commit also adds tests for the SearchableSnapshotDirectory which creates a random directory, index documents into it, snapshots the files and creates a SearchableSnapshotDirectory from this snapshot. It then runs some tests against the normal directory and the searchable snapshot directory and compares the results. Co-Authored-By: David Turner <[email protected]>

This commit adds a repository type that allows the creation of an index backed by searchable snapshots. These indices are allocated and recovered just like normal indices, but the underlying `SearchableSnapshotDirectory` makes sure that no recovery need take place since the correct files all seem to already exist on the target node. There are a number of limitations in this implementation: - like normal indices, after the intial allocation the primary is always allocated to a node that previously held an in-sync copy. If the cluster loses all copies of a snapshot-backed index then it does not attempt to recover. - peer recoveries of indices containing deletes do not currently work. - when performing disk-based shard allocation we make no attempt to quantify the disk usage of these shards any differently.

Today we simulate a corruption by deleting the `segments_*` file in RecoveryFromGatewayIT#testStartedShardFoundIfStateNotYetProcessed. However we do not correctly detect this as a corruption thanks to an outstanding `NORELEASE` task in the Store, so this test fails. This commit temporarily adjusts how we corrupt the store in this test to ensure that it is detected.

We test the behaviour of an `IndexInput` using `randomReadAndSlice()` which uses a wide varity of different access methods to read the data. Lucene sometimes calls `clone()` and `slice()` concurrently, although it does ensure that there are no concurrent readers while these are being called. Today we do not verify that our `IndexInput`s behave correctly under this kind of concurrent access. This commit extracts `randomReadAndSlice()` into a separate test harness for more general consumption and adds support for concurrent cloning and slicing.

Implements InputStream readBlob(final String blobName, final long position, final int length) on the S3BlobContainer.

Today `ESIndexInputTestCase#randomReadAndSlice` performs some reads on background threads, but does not necessarily wait for these reads to finish. It's possible that the underlying `IndexInput` may be closed before these reads complete, causing them to fail.

This pull request adds a new CacheDirectory implementation which allows to cache ranges of bytes of Lucene files. This directory implementation uses a CacheService to retrieve and to read ranges of bytes from cached files stored on disk. The cached files are created as sparse files on disk and the availability of bytes ranges are tracked through a SparseFileTracker instance. Co-authored-by: David Turner <[email protected]>

`CacheFileReference#file` is a path to a file that doesn't exist, for use as a cache key, but whose parent directory is the location of the actual cache file. This commit replaces it with the path to the cache directory itself, and computes the cache key when it is needed.

This commit changes the current cache range size from 32kb to 32mb and makes it configurable through a globally defined xpack.searchable. snapshot.cache.range_size setting. This setting is set to low values in unit tests (see #50693) but a bit higher values in integration tests so that they don't take too much time to complete.

This commit fixes the SearchableSnapshotDirectoryTests.testDirectoryReader which fails on empty/fully deleted segments. The test now iterates on all LeafReader to verify that they have the same number of docs. It also check all fields within the segment. Closes #51326

Today cache files are identified in cache using a string representing an absolute path to a file on disk. This path is a sub directory of the current shard data path and as such already contains identification bits like the current index id and the shard id. It also contains the snapshot id that is passed at CacheDirectory creation time. While this has been done for quick prototyping and already been improved in #51520, it feels wrong to rely on a path converted to a string as cache keys. Instead we should have a distinct CacheKey object to identify CacheFile in cache. Relates #50693

This commit is a first step to add instrumentation to the CacheDirectory added in #50693. It adds a new mutable IndexInputStats object that allows to track various information about how CacheBufferedIndexInput interacts with the underlying cache file to satisfy the read operations. It keep tracks of small/large forward/backward seekings as well as the total number of bytes read from the IndexInput and the total number of bytes read/written from/to the CacheFile. Note that the stats do not reflect the exact usage that Lucene does of the IndexInputs opened through a CacheDirectory: IndexInputStats is not aware of the read operations that are directly served at a higher level by the internal BufferedIndexInput's buffer. Instead it tracks what really hit the disk.

Today `SearchableSnapshotIndexInput` translates each `readBytesInternal` call to one or more calls to `readBlob` on the underlying repository. We make a lot of small `readBytesInternal` calls since they are used to fill a small in-memory buffer. Calls to `readBlob` are expensive: blob storage providers like AWS S3 charge money per API call. A common usage pattern is to take a brand-new `IndexInput`, seek to a particular location, and then sequentially read a substantial amount of data and stream it to disk. This commit optimizes the implementation for that specific usage pattern. Rather than calling `readBlob` each time the internal buffer needs filling we instead request a (potentially much larger) range of the blob and consume the response bit-by-bit as needed by a sequentially-reading client.

# Conflicts: # plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureBlobStore.java # plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureStorageService.java # plugins/repository-azure/src/test/java/org/elasticsearch/repositories/azure/AzureBlobContainerRetriesTests.java # server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java # server/src/test/java/org/elasticsearch/gateway/ReplicaShardAllocatorTests.java # x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/AsyncRetryDuringSnapshotActionStep.java # x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/CopyExecutionStateStep.java # x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/ShrinkSetAliasStep.java # x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/slm/SnapshotLifecyclePolicy.java # x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/CopyExecutionStateStepTests.java

This commit adds basic documentation for the searchable snapshots REST APIs. The main motivations are to not break downstream projects (see #53871) and to provide a simple example of how to mount a snapshot. It adds a new "Searchable Snapshots APIs" sub section in the REST APIs section. The "Mount Snapshot API" is the more complete documentation and provides an example of how to create a new index backed by a snapshot. Those API are experimental and marked as such. I've not seen any mention of the license, except the [testenv="basic"] tags that I copied from other doc.

As the snapshot is not taken on the calling thread even though the request wait_for_completion parameter is true, we can set the flag to true and make use of the response SnapshotInfo to verify the snapshot was created successfully or go back to the CleanSnapshotStep and retry creating the snapshot otherwise. This branching is executed using a new step that wraps an AsyncActionStep (keeping the wrapped's step key()) and moving to a different next step when the wrapped action reports an incomplete response. Co-Authored-By: Lee Hinman <[email protected]>

* ILM MountSnapshotStep: use correct logic operator * Test the mount response status handling and fix client assertion * Fix a test edge case where the original index doesn't exist anymore

…covery hook (#54729) In #53961 we defer the construction of Searchable Snapshot Directory's BlobContainer and a BlobStoreIndexShardSnapshot instances so that these objects are created when they are first accessed, which we expect to be during the recovery process. At the same time, assertions were added to ensure that the construction is effectively executed under a generic or snapshot thread. Sadly, this is not always the case because there is always a short window of time between the creation of the IndexShard and the time the objects are created during the recovery process. It is also possible that other components of Elasticsearch trigger the creation of the blob container and snapshot. We identified the following places: - computing avg shard size of index shards in IndexService.createShard() (this is triggered when relocating a primary shard under the cluster state applier thread) - computing indices stats in TransportIndicesStatsAction which calls indexShard.storeStats() which calls estimateSizeInBytes() (this is triggered by InternalClusterInfoService under the management thread pool) - computing shard store metadata in IndexShard.snapshotStoreMetadata while calls failIfCorrupted(Directory) (this is triggered by TransportNodesListShardStoreMetadata, executed under the fetch_shard_store thread pool) - TransportNodesListGatewayStartedShards should also use failIfCorrupted(Directory) at some point (triggered by the GatewayAllocator and executed under the fetch_shard_started thread pool) This commit changes the way BlobContainer and a BlobStoreIndexShardSnapshot instances are created so that it does not happen on first access anymore but the objects are now created using a specific loadSnapshot() method. This method is explicitly called during the pre-recovery phase. Until this method is called, the SearchableSnapshotDirectory acts as if it was empty: the listAll() method returns an empty array. Having this behavior allows the identified access points to not fail and not trigger the snapshot loading before we explicitly load it in the pre-recovery hook.

Today we assert that the shard is in state RECOVERING during prerecovery, but the recovery may already have been cancelled by a subsequent cluster state update. This commit adds handling for this cancellation.

elasticmachine · 2020-04-06T12:23:55Z

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

DaveCTurner

LGTM 🎉

tlrx · 2020-04-06T13:52:55Z

Thanks David!

This is a backport of #54803 for 7.x. This pull request cherry picks the squashed commit from #54803 with the additional commits: 6f50c92 which adjusts master code to 7.x a114549 to mute a failing ILM test (#54818) 48cbca1 and 50186b2 that cleans up and fixes the previous test aae12bb that adds a missing feature flag (#54861) 6f330e3 that adds missing serialization bits (#54864) bf72c02 that adjust the version in YAML tests a51955f that adds some plumbing for the transport client used in integration tests Co-authored-by: David Turner <[email protected]> Co-authored-by: Yannick Welsch <[email protected]> Co-authored-by: Lee Hinman <[email protected]> Co-authored-by: Andrei Dan <[email protected]>

…54876) #54803 introduces more QA tests for Azure storage service, but they fail the build is one of the key or token is missing. It should i nstead work like repository-azure:qa tests.

tlrx and others added 30 commits December 10, 2019 14:39

Add Searchable Snapshots plugin project (#50022)

186d813

This commit adds a new (almost empty) Gradle project for the Searchable Snapshots plugin in x-pack.

Merge branch 'master' into feature/searchable-snapshots

61e6029

Merge branch 'master' into feature/searchable-snapshots

b5ae5ba

Merge branch 'master' into feature/searchable-snapshots

287019a

Merge branch 'master' into feature/searchable-snapshots

8cfa8b6

Merge branch 'master' into feature/searchable-snapshots

00b7647

Merge branch 'master' into feature/searchable-snapshots

f6adcc6

Merge branch 'master' into feature/searchable-snapshots

f613ad7

Merge branch 'master' into feature/searchable-snapshots

f3649c4

Merge branch 'master' into feature/searchable-snapshots

115997b

Merge branch 'master' into feature/searchable-snapshots

4ff6c79

Add ranged readBlob to S3BlobContainer (#51137)

e419d91

Implements InputStream readBlob(final String blobName, final long position, final int length) on the S3BlobContainer.

Merge branch 'master' into feature/searchable-snapshots

85af782

Merge branch 'master' into feature/searchable-snapshots

ec4b5bf

Merge branch 'master' into feature/searchable-snapshots

e23974b

Fix compilation issues after merging master

ef18352

Merge branch 'master' into feature/searchable-snapshots

f1b0991

Merge branch 'master' into feature/searchable-snapshots

a990dd7

andreidan and others added 13 commits April 2, 2020 12:37

Fix merge errors

19772b6

Fix small reads of multi-part blobs (#54573)

6e7a5e0

Apply Spotless formatting (#54644)

0a4b26c

Merge branch 'master' into feature/searchable-snapshots

64a2813

ILM MountSnapshotStep: use correct logic operator (#54675)

34d3478

* ILM MountSnapshotStep: use correct logic operator * Test the mount response status handling and fix client assertion * Fix a test edge case where the original index doesn't exist anymore

ILM Initial documentation for searchable snapshots (#54727)

66f4433

Merge branch 'master' into feature/searchable-snapshots

65e57e6

Require license for searchable snapshots (#54730)

676d441

Handle shard closed during prerecovery (#54790)

3455e0c

Today we assert that the shard is in state RECOVERING during prerecovery, but the recovery may already have been cancelled by a subsequent cluster state update. This commit adds handling for this cancellation.

tlrx added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Apr 6, 2020

tlrx requested a review from DaveCTurner April 6, 2020 12:23

DaveCTurner approved these changes Apr 6, 2020

View reviewed changes

tlrx merged commit f6feb6c into master Apr 6, 2020

tlrx deleted the feature/searchable-snapshots branch April 6, 2020 13:51

tlrx mentioned this pull request Apr 6, 2020

Merge feature/searchable-snapshots branch into 7.x (#54803) #54825

Merged

tlrx added the backport pending label Apr 6, 2020

tlrx mentioned this pull request Apr 7, 2020

Only one of azure key or token can be specified in 3rd party tests #54876

Merged

tlrx removed the backport pending label Apr 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Merge feature/searchable-snapshots branch into master #54803

Merge feature/searchable-snapshots branch into master #54803

Uh oh!

tlrx commented Apr 6, 2020

Uh oh!

elasticmachine commented Apr 6, 2020

Uh oh!

DaveCTurner left a comment

Uh oh!

tlrx commented Apr 6, 2020

Uh oh!

Uh oh!

Merge feature/searchable-snapshots branch into master #54803

Merge feature/searchable-snapshots branch into master #54803

Uh oh!

Conversation

tlrx commented Apr 6, 2020

Uh oh!

elasticmachine commented Apr 6, 2020

Uh oh!

DaveCTurner left a comment

Choose a reason for hiding this comment

Uh oh!

tlrx commented Apr 6, 2020

Uh oh!

Uh oh!