
Commit f9aa282

ywelsch, jrodewig, debadair, and DaveCTurner authored
Additional docs for shared_cache searchable snapshots (#70566)
This adds additional documentation for shared_cache searchable snapshots that are targeting the frozen tier:

- it generalizes the introduction section on searchable snapshots, mentioning that they come in two flavors now as well as the relation to cold and frozen tiers,
- it expands the shared_cache section and
- it adds Cloud-specific instructions for getting started with the frozen tier

Co-authored-by: James Rodewig <[email protected]>
Co-authored-by: debadair <[email protected]>
Co-authored-by: David Turner <[email protected]>
1 parent f9a0049 commit f9aa282

3 files changed: +160 additions, -67 deletions

docs/reference/datatiers.asciidoc
Lines changed: 21 additions & 6 deletions

@@ -87,12 +87,27 @@ For resiliency, indices in the cold tier can rely on
 
 experimental::[]
 
-Once data is no longer being queried, or being queried rarely, it may move from the cold tier
-to the frozen tier where it stays for the rest of its life.
-The frozen tier is a less responsive query tier than the cold tier, and data in the frozen tier
-cannot be updated. The frozen tier holds searchable snapshots mounted using the
-`shared_cache` storage option exclusively. The <<ilm-index-lifecycle, frozen phase>> converts data
-transitioning into the frozen tier into a searchable snapshot, eliminating the need for replicas or even a local copy.
+Once data is no longer being queried, or being queried rarely, it may move from
+the cold tier to the frozen tier where it stays for the rest of its life.
+
+The frozen tier uses <<searchable-snapshots,{search-snaps}>> to store and load
+data from a snapshot repository. Instead of using a full local copy of your
+data, these {search-snaps} use smaller <<shared-cache,local caches>> containing
+only recently searched data. If a search requires data that is not in a cache,
+{es} fetches the data as needed from the snapshot repository. This decouples
+compute and storage, letting you run searches over very large data sets with
+minimal compute resources, which significantly reduces your storage and
+operating costs.
+
+The <<ilm-index-lifecycle, frozen phase>> automatically converts data
+transitioning into the frozen tier into a shared-cache searchable snapshot.
+
+Search is typically slower on the frozen tier than the cold tier, because {es}
+must sometimes fetch data from the snapshot repository.
+
+NOTE: The frozen tier is not yet available on the {ess-trial}[{ess}]. To
+recreate similar functionality, see
+<<searchable-snapshots-frozen-tier-on-cloud>>.
 
 [discrete]
 [[data-tier-allocation]]
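The <<ilm-index-lifecycle, frozen phase>> referenced in the added text is driven by an {ilm-init} policy. As a minimal sketch (not part of this change), a policy with a frozen phase that converts indices into shared-cache searchable snapshots might look like the following; `my_policy`, `my_repository`, and the `90d` age are placeholder values:

[source,console]
----
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "frozen": {
        "min_age": "90d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository"
          }
        }
      }
    }
  }
}
----

In the frozen phase, {ilm-init} mounts the resulting snapshot using the shared cache storage option described in the searchable snapshots changes below.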

docs/reference/searchable-snapshots/apis/mount-snapshot.asciidoc
Lines changed: 1 addition & 0 deletions

@@ -56,6 +56,7 @@ searches of the mounted index. If `full_copy`, each node holding a shard of the
 searchable snapshot index makes a full copy of the shard to its local storage.
 If `shared_cache`, the shard uses the
 <<searchable-snapshots-shared-cache,shared cache>>. Defaults to `full_copy`.
+See <<searchable-snapshot-mount-storage-options>>.
 
 [[searchable-snapshots-api-mount-request-body]]
 ==== {api-request-body-title}
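To illustrate the `storage` option documented in this file, a minimal mount request using the shared cache might look like the following sketch; `my_repository`, `my_snapshot`, and `my_index` are placeholder names, not part of this change:

[source,console]
----
POST /_snapshot/my_repository/my_snapshot/_mount?storage=shared_cache&wait_for_completion=true
{
  "index": "my_index"
}
----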
Lines changed: 138 additions & 61 deletions

@@ -1,55 +1,47 @@
 [[searchable-snapshots]]
 == {search-snaps-cap}
 
-{search-snaps-cap} let you reduce your operating costs by using
-<<snapshot-restore, snapshots>> for resiliency rather than maintaining
-<<scalability,replica shards>> within a cluster. When you mount an index from a
-snapshot as a {search-snap}, {es} copies the index shards to local storage
-within the cluster. This ensures that search performance is comparable to
-searching any other index, and minimizes the need to access the snapshot
-repository. Should a node fail, shards of a {search-snap} index are
-automatically recovered from the snapshot repository.
-
-This can result in significant cost savings for less frequently searched data.
-With {search-snaps}, you no longer need an extra index shard copy to avoid data
-loss, potentially halving the node local storage capacity necessary for
-searching that data. Because {search-snaps} rely on the same snapshot mechanism
-you use for backups, they have a minimal impact on your snapshot repository
-storage costs.
+{search-snaps-cap} let you use <<snapshot-restore,snapshots>> to search
+infrequently accessed and read-only data in a very cost-effective fashion. The
+<<cold-tier,cold>> and <<frozen-tier,frozen>> data tiers use {search-snaps} to
+reduce your storage and operating costs.
+
+{search-snaps-cap} eliminate the need for <<scalability,replica shards>>,
+potentially halving the local storage needed to search your data.
+{search-snaps-cap} rely on the same snapshot mechanism you already use for
+backups and have minimal impact on your snapshot repository storage costs.
 
 [discrete]
 [[using-searchable-snapshots]]
 === Using {search-snaps}
 
 Searching a {search-snap} index is the same as searching any other index.
-Search performance is comparable to regular indices because the shard data is
-copied onto nodes in the cluster when the {search-snap} is mounted.
 
 By default, {search-snap} indices have no replicas. The underlying snapshot
 provides resilience and the query volume is expected to be low enough that a
 single shard copy will be sufficient. However, if you need to support a higher
 query volume, you can add replicas by adjusting the `index.number_of_replicas`
 index setting.
 
-If a node fails and {search-snap} shards need to be restored from the snapshot,
-there is a brief window of time while {es} allocates the shards to other nodes
-where the cluster health will not be `green`. Searches that hit these shards
-will fail or return partial results until the shards are reallocated to healthy
-nodes.
+If a node fails and {search-snap} shards need to be recovered elsewhere, there
+is a brief window of time while {es} allocates the shards to other nodes where
+the cluster health will not be `green`. Searches that hit these shards may fail
+or return partial results until the shards are reallocated to healthy nodes.
 
 You typically manage {search-snaps} through {ilm-init}. The
 <<ilm-searchable-snapshot, searchable snapshots>> action automatically converts
-a regular index into a {search-snap} index when it reaches the `cold` phase.
-You can also make indices in existing snapshots searchable by manually mounting
-them as {search-snap} indices with the
-<<searchable-snapshots-api-mount-snapshot, mount snapshot>> API.
+a regular index into a {search-snap} index when it reaches the `cold` or
+`frozen` phase. You can also make indices in existing snapshots searchable by
+manually mounting them using the <<searchable-snapshots-api-mount-snapshot,
+mount snapshot>> API.
 
 To mount an index from a snapshot that contains multiple indices, we recommend
 creating a <<clone-snapshot-api, clone>> of the snapshot that contains only the
 index you want to search, and mounting the clone. You should not delete a
 snapshot if it has any mounted indices, so creating a clone enables you to
 manage the lifecycle of the backup snapshot independently of any
-{search-snaps}.
+{search-snaps}. If you use {ilm-init} to manage your {search-snaps} then it
+will automatically look after cloning the snapshot as needed.
 
 You can control the allocation of the shards of {search-snap} indices using the
 same mechanisms as for regular indices. For example, you could use
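The hunk above recommends cloning a multi-index snapshot before mounting a single index from it. A minimal sketch of such a <<clone-snapshot-api, clone>> request, using the placeholder names `my_repository`, `my_snapshot`, `my_snapshot_clone`, and `my_index` (none of which appear in this change):

[source,console]
----
PUT /_snapshot/my_repository/my_snapshot/_clone/my_snapshot_clone
{
  "indices": "my_index"
}
----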
@@ -60,7 +52,7 @@ We recommend that you <<indices-forcemerge, force-merge>> indices to a single
 segment per shard before taking a snapshot that will be mounted as a
 {search-snap} index. Each read from a snapshot repository takes time and costs
 money, and the fewer segments there are the fewer reads are needed to restore
-the snapshot.
+the snapshot or to respond to a search.
 
 [TIP]
 ====
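The hunk above recommends force-merging indices to a single segment per shard before snapshotting them. A minimal sketch of such a request, with `my_index` as a placeholder index name:

[source,console]
----
POST /my_index/_forcemerge?max_num_segments=1
----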
@@ -84,35 +76,104 @@ You can use any of the following repository types with searchable snapshots:
 You can also use alternative implementations of these repository types, for
 instance
 {plugins}/repository-s3-client.html#repository-s3-compatible-services[Minio],
-as long as they are fully compatible.
+as long as they are fully compatible. You can use the <<repo-analysis-api>> API
+to analyze your repository's suitability for use with searchable snapshots.
 
 [discrete]
 [[how-searchable-snapshots-work]]
 === How {search-snaps} work
 
 When an index is mounted from a snapshot, {es} allocates its shards to data
-nodes within the cluster. The data nodes then automatically restore the shard
-data from the repository onto local storage. Once the restore process
-completes, these shards respond to searches using the data held in local
-storage and do not need to access the repository. This avoids incurring the
-cost or performance penalty associated with reading data from the repository.
-
-If a node holding one of these shards fails, {es} automatically allocates it to
-another node, and that node restores the shard data from the repository. No
-replicas are needed, and no complicated monitoring or orchestration is
-necessary to restore lost shards.
-
-{es} restores {search-snap} shards in the background and you can search them
-even if they have not been fully restored. If a search hits a {search-snap}
-shard before it has been fully restored, {es} eagerly retrieves the data needed
-for the search. If a shard is freshly allocated to a node and still warming up,
-some searches will be slower. However, searches typically access a very small
-fraction of the total shard data so the performance penalty is typically small.
-
-Replicas of {search-snaps} shards are restored by copying data from the
-snapshot repository. In contrast, replicas of regular indices are restored by
+nodes within the cluster. The data nodes then automatically retrieve the
+relevant shard data from the repository onto local storage, based on the
+<<searchable-snapshot-mount-storage-options,mount options>> specified. If
+possible, searches use data from local storage. If the data is not available
+locally, {es} downloads the data that it needs from the snapshot repository.
+
+If a node holding one of these shards fails, {es} automatically allocates the
+affected shards on another node, and that node restores the relevant shard data
+from the repository. No replicas are needed, and no complicated monitoring or
+orchestration is necessary to restore lost shards. Although searchable snapshot
+indices have no replicas by default, you may add replicas to these indices by
+adjusting `index.number_of_replicas`. Replicas of {search-snap} shards are
+recovered by copying data from the snapshot repository, just like primaries of
+{search-snap} shards. In contrast, replicas of regular indices are restored by
 copying data from the primary.
 
+[discrete]
+[[searchable-snapshot-mount-storage-options]]
+==== Mount options
+
+To search a snapshot, you must first mount it locally as an index. Usually
+{ilm-init} will do this automatically, but you can also call the
+<<searchable-snapshots-api-mount-snapshot,mount snapshot>> API yourself. There
+are two options for mounting a snapshot, each with different performance
+characteristics and local storage footprints:
+
+[[full-copy]]
+Full copy::
+Loads a full copy of the snapshotted index's shards onto node-local storage
+within the cluster. This is the default mount option. {ilm-init} uses this
+option by default in the `hot` and `cold` phases.
++
+Search performance for a full-copy searchable snapshot index is normally
+comparable to a regular index, since there is minimal need to access the
+snapshot repository. While recovery is ongoing, search performance may be
+slower than with a regular index because a search may need some data that has
+not yet been retrieved into the local copy. If that happens, {es} will eagerly
+retrieve the data needed to complete the search in parallel with the ongoing
+recovery.
+
+[[shared-cache]]
+Shared cache::
++
+experimental::[]
++
+Uses a local cache containing only recently searched parts of the snapshotted
+index's data. {ilm-init} uses this option by default in the `frozen` phase and
+corresponding frozen tier.
++
+If a search requires data that is not in the cache, {es} fetches the missing
+data from the snapshot repository. Searches that require these fetches are
+slower, but the fetched data is stored in the cache so that similar searches
+can be served more quickly in future. {es} will evict infrequently used data
+from the cache to free up space.
++
+Although slower than a full local copy or a regular index, a shared-cache
+searchable snapshot index still returns search results quickly, even for large
+data sets, because the layout of data in the repository is heavily optimized
+for search. Many searches will need to retrieve only a small subset of the
+total shard data before returning results.
+
+To mount a searchable snapshot index with the shared cache mount option, you
+must configure the `xpack.searchable.snapshot.shared_cache.size` setting to
+reserve space for the cache on one or more nodes. Indices mounted with the
+shared cache mount option are only allocated to nodes that have this setting
+configured.
+
+[[searchable-snapshots-shared-cache]]
+`xpack.searchable.snapshot.shared_cache.size`::
+(<<static-cluster-setting,Static>>, <<byte-units,byte value>>)
+The size of the space reserved for the shared cache. Defaults to `0b`, meaning
+that the node has no shared cache.
+
+You can configure the setting in `elasticsearch.yml`:
+
+[source,yaml]
+----
+xpack.searchable.snapshot.shared_cache.size: 4TB
+----
+
+IMPORTANT: Currently, you can configure
+`xpack.searchable.snapshot.shared_cache.size` on any node. In a future release,
+you will only be able to configure this setting on nodes with the
+<<data-frozen-node,`data_frozen`>> role.
+
+You can set `xpack.searchable.snapshot.shared_cache.size` to any size between a
+couple of gigabytes up to 90% of available disk space. We only recommend higher
+sizes if you use the node exclusively on a frozen tier or for searchable
+snapshots.
+
 [discrete]
 [[back-up-restore-searchable-snapshots]]
 === Back up and restore {search-snaps}
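The hunk above points readers at the <<repo-analysis-api>> for checking whether a repository is suitable for searchable snapshots. A minimal sketch of such a request, assuming a repository named `my_repository`; the `blob_count` and `max_blob_size` values are illustrative only and not part of this change:

[source,console]
----
POST /_snapshot/my_repository/_analyze?blob_count=100&max_blob_size=10mb
----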
@@ -150,18 +211,34 @@ very good protection against data loss or corruption. If you manage your own
 repository storage then you are responsible for its reliability.
 
 [discrete]
-[[searchable-snapshots-shared-cache]]
-=== Shared snapshot cache
+[[searchable-snapshots-frozen-tier-on-cloud]]
+=== Configure a frozen tier on the {ess}
 
-experimental::[]
+The frozen data tier is not yet available on the {ess-trial}[{ess}]. However,
+you can configure another tier to use <<shared-cache,shared snapshot caches>>.
+This effectively recreates a frozen tier in your {ess} deployment. Follow these
+steps:
 
-By default a {search-snap} copies the whole snapshot into the local cluster as
-described above. You can also configure a shared snapshot cache which is used
-to hold a copy of just the frequently-accessed parts of shards of indices which
-are mounted with `?storage=shared_cache`. If you configure a node to have a
-shared cache then that node will reserve space for the cache when it starts up.
+. Choose an existing tier to use. Typically, you'll use the cold tier, but the
+hot and warm tiers are also supported. You can use this tier as a shared tier,
+or you can dedicate the tier exclusively to shared snapshot caches.
 
-`xpack.searchable.snapshot.shared_cache.size`::
-(<<static-cluster-setting,Static>>, <<byte-units,byte value>>)
-The size of the space reserved for the shared cache. Defaults to `0b`, meaning
-that the node has no shared cache.
+. Log in to the {ess-trial}[{ess} Console].
+
+. Select your deployment from the {ess} home page or the deployments page.
+
+. From your deployment menu, select **Edit deployment**.
+
+. On the **Edit** page, click **Edit elasticsearch.yml** under your selected
+{es} tier.
+
+. In the `elasticsearch.yml` file, add the
+<<searchable-snapshots-shared-cache,`xpack.searchable.snapshot.shared_cache.size`>>
+setting. For example:
++
+[source,yaml]
+----
+xpack.searchable.snapshot.shared_cache.size: 50GB
+----
+
+. Click **Save** and **Confirm** to apply your configuration changes.
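After saving the configuration described in the steps above, one suggested check, which is not part of the documented procedure, is to inspect the node settings and look for `xpack.searchable.snapshot.shared_cache.size` in the response:

[source,console]
----
GET /_nodes/settings
----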
