Skip to content

Merge feature/searchable-snapshots branch into master #54803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 103 commits into from
Apr 6, 2020

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Apr 6, 2020

This pull request merges the searchable-snapshots feature into the master branch.

The feature branch has been regularly updated with latest master changes. The feature is enabled using a feature flag (#54424) and requires a trial or platinum license to work (#54730).

The feature is composed of the following changes that have been reviewed unitary:


3455e0c Handle shard closed during prerecovery (#54790)
104ba40 Trigger explicit loading of blob container and snapshot in the pre-recovery hook (#54729)
676d441 Require license for searchable snapshots (#54730)
66f4433 ILM Initial documentation for searchable snapshots (#54727)
34d3478 ILM MountSnapshotStep: use correct logic operator (#54675)
4ef4204 ILM wait for snapshot creation to complete (#54673)
0a4b26c Apply Spotless formatting (#54644)
6e7a5e0 Fix small reads of multi-part blobs (#54573)
e8cf4f9 Add basic documentation for Searchable Snapshots REST APIs (#54581)
19772b6 Fix merge errors
966b363 ILM don't fail the mount step on ACCEPTED response (#54580)
c6e3185 Add optimized / direct read stats for non-cached files (#54439)
f879268 Add Azure support for ranged read blob operations (#54358)
a5c7bec ILM: add searchable snapshot action (#52585)
773e7e3 Defer repo ops in searchable snapshot restore (#54211)
f8edeee Introduce feature flag for searchable snapshots (#54424)
4db29c0 Rename and move cache classes into store package (#54075)
f82f104 Extract files stored within metadata hash into ByteArrayIndexInputs (#54072)
e20f16e Better skip bytes in FsBlobContainer (#54333)
e669f14 Add file type-based exclusion setting for searchable snapshots cache (#53492)
8f41e61 Exclue searchable-snapshots:qa from Spotless
614843a Exclue searchable-snapshots from Spotless
2b67f16 Fix off by one error in test
897e477 Fix up merge
47ec08d CacheBufferedIndexInput should throw EOFException (#53975)
4e4a705 Fix test after merge
02aa32e Fold CacheDirectory within SearchableSnapshotDirectory (#53917)
815c861 Extract CacheBufferedIndexInput from CacheDirectory (#53879)
8c732d0 Uncouple CacheDirectory from SearchableSnapshotDirectory (#53860)
037a9be Revert workarounds for tryOpenIndex change (#53615)
4bde03a Associate translog with Lucene index commit for searchable snapshots shards (#53459)
215e94d Auto-allocate searchable snapshots (#52527)
95b1acc Move MountSearchableActionRequest and Action to xpack/core (#53394)
88b01a2 Add API to mount a snapshot as a searchable index (#53084)
d713b9e Add Clear Cache API for Searchable Snapshots (#53009)
050583e Allow freezing searchable snapshots (#52653)
ac08315 Fix REST tests after master merge
25b17ae Adapt searchable snapshot to latest master changes
308adaa Account for soft-deletes (#52781)
f1f598e Exclude indices with cache disabled from searchable snapshots stats (#52770)
c049b02 Fix searchable snapshot stats fwd/bwd seeking counters (#52760)
c9ac57f Add stats for time spent fetching data while searching snapshots (#51866)
e620d57 Adapt Searchable Snapshots to latest master changes
d6220c5 Fix up merge
5b115fb Add S3 integration tests for searchable snapshots (#52263)
957a7ae SearchableSnapshotIndexInput should read all bytes (#52199)
c97c97c Adapt searchable snapshots code to latest master changes (#52258)
c73cf68 Add REST API for cache directory stats (#51815)
30b5553 Optimize sequential reads in SearchableSnapshotIndexInput (#51230)
229b953 Add cache directory low-level instrumentation (#51637)
ef18352 Fix compilation issues after merging master
101c419 Use dedicated cache keys instead of relying on an absolute path (#51669)
b88a798 Fix SearchableSnapshotDirectoryTests.testDirectoryReader (#51343)
4c686c1 Add Cache Range Size setting (#51521)
f2a4957 Remove file field from CacheFileReference (#51520)
714c480 Add searchable snapshots cache directory (#50693)
ffc5a82 Wait for concurrent reads in ESIndexInputTestCase (#51425)
e419d91 Add ranged readBlob to S3BlobContainer (#51137)
cd72774 Test concurrent cloning/slicing of IndexInputs (#51032)
aea62f5 Fix corruption simulation in RecoveryFromGatewayIT (#50921)
74c4041 Add SearchableSnapshotRepository (#50239)
0940bcd Add Lucene directory and index input implementations that expose shard snapshot (#49651)
186d813 Add Searchable Snapshots plugin project (#50022)


Relates #50999

tlrx and others added 30 commits December 10, 2019 14:39
This commit adds a new (almost empty) Gradle project for the Searchable 
Snapshots plugin in x-pack.
…d snapshot (#49651)

This commit adds a SearchableSnapshotDirectory implementation 
that exposes the snapshot of a shard as a Lucene Directory.

This directory only supports read operations and does not allow any
 file modification. 

It allows:
* Directory#listAll(): to list all the files stored in the shard snapshot
* Directory#fileLength(String): to return the byte length of a file
* Directory#openInput(String, IOContext): to open an IndexInput

In order to work, the directory requires the list of the shard snapshot 
files and a way to read a specific range of bytes blob. The list of shard
 snapshot files must be provided as a BlobStoreIndexShardSnapshot 
object when the directory is created. This object contains the list of the
 shard files stored in the snapshot and can be used to map each Lucene
 file with its corresponding blob(s) stored in the repository (which can 
be more than one as large Lucene files are split during snapshot).

Blobs are directly read from the snapshot using a BlobContainer.

SearchableSnapshotDirectory provides SearchableSnapshotIndexInput
 to read a file from the snapshot. This index input implementation maintains
 an internal buffer (it extends BufferedIndexInput) and takes care of 
tracking current reading position in the file. Each time more bytes are 
requested to fill the internal buffer, SearchableSnapshotIndexInput maps 
the current position to the appropriate blob name and position in the blob 
to read bytes from. It also propagates the knowledge of the current position
 to any clone or slice.

This commit also adds tests for the SearchableSnapshotDirectory which 
creates a random directory, index documents into it, snapshots the files 
and creates a SearchableSnapshotDirectory from this snapshot. It then 
runs some tests against the normal directory and the searchable snapshot
 directory and compares the results.

Co-Authored-By: David Turner <[email protected]>
This commit adds a repository type that allows the creation of an index backed
by searchable snapshots. These indices are allocated and recovered just like
normal indices, but the underlying `SearchableSnapshotDirectory` makes sure
that no recovery need take place since the correct files all seem to already
exist on the target node.

There are a number of limitations in this implementation:

- like normal indices, after the intial allocation the primary is always
  allocated to a node that previously held an in-sync copy. If the cluster
  loses all copies of a snapshot-backed index then it does not attempt to
  recover.

- peer recoveries of indices containing deletes do not currently work.

- when performing disk-based shard allocation we make no attempt to quantify
  the disk usage of these shards any differently.
Today we simulate a corruption by deleting the `segments_*` file in
RecoveryFromGatewayIT#testStartedShardFoundIfStateNotYetProcessed. However we
do not correctly detect this as a corruption thanks to an outstanding
`NORELEASE` task in the Store, so this test fails.

This commit temporarily adjusts how we corrupt the store in this test to ensure
that it is detected.
We test the behaviour of an `IndexInput` using `randomReadAndSlice()` which
uses a wide varity of different access methods to read the data. Lucene
sometimes calls `clone()` and `slice()` concurrently, although it does ensure
that there are no concurrent readers while these are being called. Today we do
not verify that our `IndexInput`s behave correctly under this kind of
concurrent access.

This commit extracts `randomReadAndSlice()` into a separate test harness for
more general consumption and adds support for concurrent cloning and slicing.
Implements InputStream readBlob(final String blobName, final long position, final int length) on the S3BlobContainer.
Today `ESIndexInputTestCase#randomReadAndSlice` performs some reads on
background threads, but does not necessarily wait for these reads to finish.
It's possible that the underlying `IndexInput` may be closed before these reads
complete, causing them to fail.
This pull request adds a new CacheDirectory implementation which allows 
to cache ranges of bytes of Lucene files. This directory implementation 
uses a CacheService to retrieve and to read ranges of bytes from cached 
files stored on disk. The cached files are created as sparse files on disk 
and the availability of bytes ranges are tracked through a SparseFileTracker 
instance.

Co-authored-by: David Turner <[email protected]>
`CacheFileReference#file` is a path to a file that doesn't exist, for use as a
cache key, but whose parent directory is the location of the actual cache file.
This commit replaces it with the path to the cache directory itself, and
computes the cache key when it is needed.
This commit changes the current cache range size from 32kb to 32mb 
and makes it configurable through a globally defined xpack.searchable.
snapshot.cache.range_size setting. This setting is set to low values in 
unit tests (see #50693) but a bit higher values in integration tests so 
that they don't take too much time to complete.
This commit fixes the SearchableSnapshotDirectoryTests.testDirectoryReader 
which fails on empty/fully deleted segments. The test now iterates on all 
LeafReader to verify that they have the same number of docs. It also check 
all fields within the segment.

Closes #51326
Today cache files are identified in cache using a string representing 
an absolute path to a file on disk. This path is a sub directory of the 
current shard data path and as such already contains identification 
bits like the current index id and the shard id. It also contains the 
snapshot id that is passed at CacheDirectory creation time.

While this has been done for quick prototyping and already been 
improved in #51520, it feels wrong to rely on a path converted to
a string as cache keys. Instead we should have a distinct CacheKey 
object to identify CacheFile in cache.

Relates #50693
This commit is a first step to add instrumentation to the CacheDirectory 
added in #50693. It adds a new mutable IndexInputStats object that allows 
to track various information about how CacheBufferedIndexInput interacts 
with the underlying cache file to satisfy the read operations. It keep tracks 
of small/large forward/backward seekings as well as the total number of 
bytes read from the IndexInput and the total number of bytes read/written 
from/to the CacheFile.

Note that the stats do not reflect the exact usage that Lucene does of the 
IndexInputs opened through a CacheDirectory: IndexInputStats is not aware 
of the read operations that are directly served at a higher level by the 
internal BufferedIndexInput's buffer. Instead it tracks what really hit the disk.
Today `SearchableSnapshotIndexInput` translates each `readBytesInternal` call
to one or more calls to `readBlob` on the underlying repository. We make a lot
of small `readBytesInternal` calls since they are used to fill a small
in-memory buffer. Calls to `readBlob` are expensive: blob storage providers
like AWS S3 charge money per API call.

A common usage pattern is to take a brand-new `IndexInput`, seek to a
particular location, and then sequentially read a substantial amount of data
and stream it to disk.

This commit optimizes the implementation for that specific usage pattern.
Rather than calling `readBlob` each time the internal buffer needs filling we
instead request a (potentially much larger) range of the blob and consume the
response bit-by-bit as needed by a sequentially-reading client.
andreidan and others added 13 commits April 2, 2020 12:37
# Conflicts:
#	plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureBlobStore.java
#	plugins/repository-azure/src/main/java/org/elasticsearch/repositories/azure/AzureStorageService.java
#	plugins/repository-azure/src/test/java/org/elasticsearch/repositories/azure/AzureBlobContainerRetriesTests.java
#	server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java
#	server/src/test/java/org/elasticsearch/gateway/ReplicaShardAllocatorTests.java
#	x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/AsyncRetryDuringSnapshotActionStep.java
#	x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/CopyExecutionStateStep.java
#	x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/ShrinkSetAliasStep.java
#	x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/slm/SnapshotLifecyclePolicy.java
#	x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/CopyExecutionStateStepTests.java
This commit adds basic documentation for the searchable snapshots 
REST APIs. The main motivations are to not break downstream 
projects (see #53871) and to provide a simple example of how to 
mount a snapshot.

It adds a new "Searchable Snapshots APIs" sub section in the REST 
APIs section. The "Mount Snapshot API" is the more complete 
documentation and provides an example of how to create a new 
index backed by a snapshot.

Those API are experimental and marked as such. I've not seen any 
mention of the license, except the [testenv="basic"] tags that I copied 
from other doc.
As the snapshot is not taken on the calling thread even though the request
wait_for_completion parameter is true, we can set the flag to true and make
use of the response SnapshotInfo to verify the snapshot was created successfully
or go back to the CleanSnapshotStep and retry creating the snapshot otherwise.
This branching is executed using a new step that wraps an AsyncActionStep
(keeping the wrapped's step key()) and moving to a different next step when
the wrapped action reports an incomplete response.

Co-Authored-By: Lee Hinman <[email protected]>
* ILM MountSnapshotStep: use correct logic operator

* Test the mount response status handling and fix client assertion

* Fix a test edge case where the original index doesn't exist anymore
…covery hook (#54729)

In #53961 we defer the construction of Searchable Snapshot
Directory's BlobContainer and a BlobStoreIndexShardSnapshot
instances so that these objects are created when they are first
accessed, which we expect to be during the recovery process. At
the same time, assertions were added to ensure that the
construction is effectively executed under a generic or snapshot
thread.

Sadly, this is not always the case because there is always a
short window of time between the creation of the IndexShard and
the time the objects are created during the recovery process. It
is also possible that other components of Elasticsearch trigger
the creation of the blob container and snapshot.

We identified the following places:

- computing avg shard size of index shards in
    IndexService.createShard() (this is triggered when relocating
    a primary shard under the cluster state applier thread)
- computing indices stats in TransportIndicesStatsAction which
    calls indexShard.storeStats() which calls
    estimateSizeInBytes() (this is triggered by
    InternalClusterInfoService under the management thread pool)
- computing shard store metadata in
    IndexShard.snapshotStoreMetadata while calls
    failIfCorrupted(Directory) (this is triggered by
    TransportNodesListShardStoreMetadata, executed under the
    fetch_shard_store thread pool)
- TransportNodesListGatewayStartedShards should also use
    failIfCorrupted(Directory) at some point (triggered by the
    GatewayAllocator and executed under the fetch_shard_started
    thread pool)

This commit changes the way BlobContainer and a
BlobStoreIndexShardSnapshot instances are created so that it does
not happen on first access anymore but the objects are now
created using a specific loadSnapshot() method.

This method is explicitly called during the pre-recovery phase.

Until this method is called, the SearchableSnapshotDirectory acts
as if it was empty: the listAll() method returns an empty
array. Having this behavior allows the identified access points
to not fail and not trigger the snapshot loading before we
explicitly load it in the pre-recovery hook.
Today we assert that the shard is in state RECOVERING during prerecovery, but
the recovery may already have been cancelled by a subsequent cluster state
update. This commit adds handling for this cancellation.
@tlrx tlrx added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Apr 6, 2020
@tlrx tlrx requested a review from DaveCTurner April 6, 2020 12:23
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

@tlrx tlrx merged commit f6feb6c into master Apr 6, 2020
@tlrx tlrx deleted the feature/searchable-snapshots branch April 6, 2020 13:51
@tlrx
Copy link
Member Author

tlrx commented Apr 6, 2020

Thanks David!

tlrx added a commit that referenced this pull request Apr 7, 2020
This is a backport of #54803 for 7.x.

This pull request cherry picks the squashed commit from #54803 with the additional commits:

    6f50c92 which adjusts master code to 7.x
    a114549 to mute a failing ILM test (#54818)
    48cbca1 and 50186b2 that cleans up and fixes the previous test
    aae12bb that adds a missing feature flag (#54861)
    6f330e3 that adds missing serialization bits (#54864)
    bf72c02 that adjust the version in YAML tests
    a51955f that adds some plumbing for the transport client used in integration tests

Co-authored-by: David Turner <[email protected]>
Co-authored-by: Yannick Welsch <[email protected]>
Co-authored-by: Lee Hinman <[email protected]>
Co-authored-by: Andrei Dan <[email protected]>
tlrx added a commit that referenced this pull request Apr 7, 2020
…54876)

#54803 introduces more QA tests for Azure storage service, but 
they fail the build is one of the key or token is missing. It should i
nstead work like repository-azure:qa tests.
tlrx added a commit that referenced this pull request Apr 7, 2020
…54876)

#54803 introduces more QA tests for Azure storage service, but 
they fail the build is one of the key or token is missing. It should i
nstead work like repository-azure:qa tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants