Additional docs for shared_cache searchable snapshots #70566
@@ -1,29 +1,51 @@ | ||
[[searchable-snapshots]] | ||
== {search-snaps-cap} | ||
|
||
{search-snaps-cap} let you reduce your operating costs by using | ||
<<snapshot-restore, snapshots>> for resiliency rather than maintaining | ||
<<scalability,replica shards>> within a cluster. When you mount an index from a | ||
snapshot as a {search-snap}, {es} copies the index shards to local storage | ||
within the cluster. This ensures that search performance is comparable to | ||
searching any other index, and minimizes the need to access the snapshot | ||
repository. Should a node fail, shards of a {search-snap} index are | ||
automatically recovered from the snapshot repository. | ||
|
||
This can result in significant cost savings for less frequently searched data. | ||
With {search-snaps}, you no longer need an extra index shard copy to avoid data | ||
loss, potentially halving the node local storage capacity necessary for | ||
searching that data. Because {search-snaps} rely on the same snapshot mechanism | ||
you use for backups, they have a minimal impact on your snapshot repository | ||
storage costs. | ||
{search-snaps-cap} let you use <<snapshot-restore,snapshots>> to search and | ||
store infrequently-accessed, read-only data. The <<cold-tier,cold>> and | ||
<<frozen-tier,frozen>> data tiers use {search-snaps} to reduce your storage and | ||
operating costs. | ||
|
||
{search-snaps-cap} eliminate the need for <<scalability,replica shards>>, | ||
potentially halving the local storage needed to search your data. | ||
{search-snaps-cap} rely on the same snapshot mechanism you already use for | ||
backups and have minimal impact on your snapshot repository storage costs. | ||
|
||
[discrete] | ||
[[mount-options]] | ||
=== Mount options | ||
|
||
To search a snapshot, you must first mount it locally as an index. There are two | ||
mounting options, each with a different local storage footprint: | ||
Review comment: Should we link to the mount API here? |
||
|
||
[[full-copy]] | ||
Full copy:: | ||
Loads a full, local copy of the snapshotted index's shards into the cluster. The | ||
cold tier uses this option by default. | ||
Review comment: Does the hot tier also use this option by default? I was wondering what was used for searchable snapshots created during the hot phase. |
Review comment: Saying "The cold tier uses this option by default." is a bit tricky, as it might be unclear to a user what this means (how is it determined whether the API is called "in the cold tier" or "in the frozen tier"?). The more technically precise description is that ILM's searchable snapshot action (which internally calls the mount API) uses "full copy" in the hot/cold phases and "shared cache" in the frozen phase. ILM also links the phases to the tiers (but only very implicitly, which we can perhaps gloss over here). |
||
+ | ||
If a node fails, {es} automatically recovers the index's shards from the | ||
snapshot repository. Search performance for a full copy is comparable to a | ||
regular index, with minimal need to access the snapshot repository. | ||
|
||
[[shared-cache]] | ||
Shared cache:: | ||
Review comment: needs to be marked as experimental. |
||
Uses a local cache containing frequently searched parts of the snapshotted | ||
index's data. The frozen tier uses this option by default. | ||
+ | ||
If a search requires data that's not in the cache, {es} lazily fetches the data | ||
as needed from the snapshot repository. Searches that require these fetches are | ||
slower, but repeated searches are served more quickly from the cache. | ||
+ | ||
Although slower than a full local copy or a regular index, the shared snapshot | ||
cache still returns results quickly, even for large data sets. This option | ||
decouples compute and storage, letting you run searches with minimal compute | ||
resources. | ||
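
For illustration, here is a minimal sketch of mounting an index with the
<<searchable-snapshots-api-mount-snapshot,mount snapshot>> API. The repository,
snapshot, and index names below are placeholders; the `storage` query parameter
selects the mount option:

[source,console]
----
POST /_snapshot/my_repository/my_snapshot/_mount?storage=shared_cache&wait_for_completion=true
{
  "index": "my-index",
  "renamed_index": "my-index-mounted"
}
----

Omitting the `storage` parameter falls back to the default full copy behavior.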
|
||
[discrete] | ||
[[using-searchable-snapshots]] | ||
=== Using {search-snaps} | ||
|
||
Searching a {search-snap} index is the same as searching any other index. | ||
Search performance is comparable to regular indices because the shard data is | ||
copied onto nodes in the cluster when the {search-snap} is mounted. | ||
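
For example, a mounted index responds to the standard search APIs just like any
other index (the index and field names here are placeholders):

[source,console]
----
GET /my-index-mounted/_search
{
  "query": {
    "match": {
      "message": "example query"
    }
  }
}
----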
|
||
By default, {search-snap} indices have no replicas. The underlying snapshot | ||
provides resilience and the query volume is expected to be low enough that a | ||
|
@@ -39,10 +61,10 @@ nodes. | |
|
||
You typically manage {search-snaps} through {ilm-init}. The | ||
<<ilm-searchable-snapshot, searchable snapshots>> action automatically converts | ||
a regular index into a {search-snap} index when it reaches the `cold` phase. | ||
You can also make indices in existing snapshots searchable by manually mounting | ||
them as {search-snap} indices with the | ||
<<searchable-snapshots-api-mount-snapshot, mount snapshot>> API. | ||
a regular index into a {search-snap} index when it reaches the `cold` or | ||
`frozen` phase. You can also make indices in existing snapshots searchable by | ||
manually mounting them using the <<searchable-snapshots-api-mount-snapshot, | ||
mount snapshot>> API. | ||
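
As a sketch of what such a policy might look like (the policy name, age, and
`my_repository` are placeholders), the searchable snapshot action only needs
the target snapshot repository:

[source,console]
----
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository"
          }
        }
      }
    }
  }
}
----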
|
||
To mount an index from a snapshot that contains multiple indices, we recommend | ||
creating a <<clone-snapshot-api, clone>> of the snapshot that contains only the | ||
|
@@ -91,25 +113,27 @@ as long as they are fully compatible. | |
=== How {search-snaps} work | ||
|
||
When an index is mounted from a snapshot, {es} allocates its shards to data | ||
nodes within the cluster. The data nodes then automatically restore the shard | ||
data from the repository onto local storage. Once the restore process | ||
completes, these shards respond to searches using the data held in local | ||
storage and do not need to access the repository. This avoids incurring the | ||
cost or performance penalty associated with reading data from the repository. | ||
nodes within the cluster. The data nodes then automatically retrieve the | ||
relevant shard data from the repository onto local storage, based on the mount | ||
options specified. If possible, searches use data from local storage. If the | ||
data is not available locally, {es} downloads the needed data from the snapshot | ||
repository. | ||
|
||
If a node holding one of these shards fails, {es} automatically allocates it to | ||
another node, and that node restores the shard data from the repository. No | ||
replicas are needed, and no complicated monitoring or orchestration is | ||
necessary to restore lost shards. | ||
|
||
{es} restores {search-snap} shards in the background and you can search them | ||
even if they have not been fully restored. If a search hits a {search-snap} | ||
shard before it has been fully restored, {es} eagerly retrieves the data needed | ||
for the search. If a shard is freshly allocated to a node and still warming up, | ||
some searches will be slower. However, searches typically access a very small | ||
fraction of the total shard data so the performance penalty is typically small. | ||
|
||
Replicas of {search-snaps} shards are restored by copying data from the | ||
another node, and that node restores the relevant shard data from the | ||
repository. No replicas are needed, and no complicated monitoring or orchestration | ||
is necessary to restore lost shards. | ||
|
||
If you mounted a full copy of an index, {es} restores the {search-snap} shards in | ||
the background. You can search these shards even if they are not fully restored. | ||
If a search hits such a shard before it's fully restored, {es} eagerly retrieves | ||
the data needed to complete the search. | ||
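
If you want to check how far the background restore of a full copy has
progressed, one option is the generic index recovery API (the index name is a
placeholder):

[source,console]
----
GET /my-index-mounted/_recovery?human
----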
|
||
If you mounted a shared snapshot cache, {es} only stores small parts of the data | ||
locally on a node in the cluster. The shared cache has a fixed size and evicts | ||
infrequently used data when it fills up. | ||
|
||
Replicas of {search-snap} shards are recovered by accessing the data from the | ||
snapshot repository. In contrast, replicas of regular indices are restored by | ||
copying data from the primary. | ||
|
||
|
@@ -155,13 +179,62 @@ repository storage then you are responsible for its reliability. | |
|
||
experimental::[] | ||
|
||
By default a {search-snap} copies the whole snapshot into the local cluster as | ||
described above. You can also configure a shared snapshot cache which is used | ||
to hold a copy of just the frequently-accessed parts of shards of indices which | ||
are mounted with `?storage=shared_cache`. If you configure a node to have a | ||
shared cache then that node will reserve space for the cache when it starts up. | ||
To mount a shared cache of a snapshot, you must use the | ||
`xpack.searchable.snapshot.shared_cache.size` setting to reserve space for the | ||
cache on one or more nodes. Indices mounted as a shared cache can only be | ||
allocated to nodes that have this setting explicitly configured. | ||
|
||
`xpack.searchable.snapshot.shared_cache.size`:: | ||
(<<static-cluster-setting,Static>>, <<byte-units,byte value>>) | ||
The size of the space reserved for the shared cache. Defaults to `0b`, meaning | ||
that the node has no shared cache. | ||
|
||
For example: | ||
|
||
[source,yaml] | ||
---- | ||
xpack.searchable.snapshot.shared_cache.size: "4TB" | ||
---- | ||
|
||
IMPORTANT: Currently, you can configure | ||
`xpack.searchable.snapshot.shared_cache.size` on any node. In a future release, | ||
you will only be able to configure this setting on nodes with the | ||
<<data-frozen-node,frozen data>> role. | ||
|
||
You can set `xpack.searchable.snapshot.shared_cache.size` to any size from a | ||
couple of gigabytes up to 90% of available disk space. We recommend higher | ||
sizes only if you use the node exclusively as a frozen tier or for searchable snapshots. | ||
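
For example, a node used exclusively for searchable snapshots might combine the
<<data-frozen-node,frozen data>> role with a large shared cache; the values
below are only illustrative:

[source,yaml]
----
# Illustrative sketch: dedicate this node to the frozen tier
node.roles: [ data_frozen ]
# Reserve space on local disk for the shared snapshot cache
xpack.searchable.snapshot.shared_cache.size: 4TB
----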
|
||
[discrete] | ||
[[searchable-snapshots-frozen-tier-on-cloud]] | ||
==== Configure a frozen tier on the {ess} | ||
|
||
The frozen data tier is not yet available on the {ess-trial}[{ess}]. However, | ||
you can configure another tier to use <<shared-cache,shared snapshot caches>>. | ||
This effectively recreates a frozen tier in your {ess} deployment. Follow these | ||
steps: | ||
|
||
. Choose an existing tier to use. Typically, you'll use the cold tier, but the | ||
hot and warm tiers are also supported. You can use this tier as a shared tier, or | ||
you can dedicate the tier exclusively to the shared snapshot cache. | ||
Review comment: I felt that we should give some kind of recommendation here. Using the cold tier made sense to me, but I wasn't sure how well this would play with ILM defaults. |
||
|
||
. Log in to the {ess-trial}[{ess} Console]. | ||
|
||
. Select your deployment from the {ess} home page or the deployments page. | ||
|
||
. From your deployment menu, select **Edit deployment**. | ||
|
||
. On the **Edit** page, click **Edit elasticsearch.yml** under your selected | ||
{es} tier. | ||
|
||
. In the `elasticsearch.yml` file, add the | ||
`xpack.searchable.snapshot.shared_cache.size` setting. For example: | ||
+ | ||
[source,yaml] | ||
---- | ||
xpack.searchable.snapshot.shared_cache.size: 50GB | ||
---- | ||
+ | ||
You can configure the cache size from just a few gigabytes up to 90% of the | ||
available disk space. Shared tiers typically have a smaller cache size, while | ||
dedicated tiers should use most of the available disk space. |
Review comment: I would rather move this section below the "Using searchable snapshots" section, maybe integrating it better with "How searchable snapshots work", because we prefer users to use ILM for getting data into searchable snapshots and I think having this detail come first will guide users in the wrong direction.