Commit d4292ec

Expand conceptual docs for searchable snapshots

1 parent df1d881

1 file changed: 97 additions & 20 deletions

@@ -1,24 +1,101 @@
 [[searchable-snapshots]]
 == {search-snaps-cap}
 
-{search-snaps-cap} enable you to significantly reduce costs by
-leveraging external storage for read-only data.
-Like snapshots used for backup and recovery, a searchable snapshot is a point-in-time copy
-of an index or data stream stored in a remote data store such as S3.
-
-Snapshot-backed indices use searchable snapshots for redundancy rather than replicas within the cluster.
-They support all regular data retrieval operations with performance comparable to a normal index.
-In the event of a failure, data is recovered from the snapshot.
-Latency increases during recovery, but you can continue to query your data.
-
-A snapshot-backed index essentially halves the number of nodes you need for read-only data.
-If you are using {ilm-init} to manage your data, in the cold phase it can
-automatically create a searchable snapshot, convert your index to a snapshot-backed index,
-and move it to nodes in the cold tier.
-
-While searchable snapshots are separate from the snapshots used for backup and recovery,
-they are just snapshots. In fact, you can mount any existing snapshot as a snapshot-backed index.
-When you use the same repository for both types of snapshots, each snapshot is incremental.
-Files are shared among searchable snapshots and backup snapshots to avoid data duplication.
-This means that the additional storage costs for using searchable snapshots are negligible.
+Nodes in a distributed system like {es} will inevitably fail from time to time.
+To protect your data against node failures, by default when you index a
+document into {es} it is stored on two or more nodes. You also take periodic
+<<snapshot-restore,snapshots>> of your data so that you can recover from more
+serious failures. This means that each document is stored in at least three
+places. These extra copies are important for resiliency, but the storage they
+consume has an impact on your cluster's operating costs. The two storage
+mechanisms have different, but complementary, performance characteristics:
+
+* Snapshot repositories are much more reliable than local storage on individual
+nodes.
+
+* The monetary cost per GB in a snapshot repository is usually lower than on a
+node.
+
+* The monetary cost per read or write operation on a snapshot repository is
+usually much higher.
+
+* Reading or writing data in a snapshot repository usually takes much more time
+compared with accessing a node's local storage.
+
+{search-snaps-cap} let you reduce your operating costs by treating the snapshot
+as the authoritative copy of some of your indices. The high reliability of the
+snapshot repository removes the need to keep multiple copies of their data in
+your cluster purely for resiliency. {es} makes a copy of a searchable snapshot
+on the nodes in the cluster to reduce the performance impact and costs of
+accessing the snapshot repository.
+
+With {search-snaps} you may be able to halve your cluster size without
+increasing the risk of data loss or reducing the amount of data exposed to
+searches. Put differently, {search-snaps} may allow you to expose twice as
+much data to searches for a given cluster size.
+
+=== Using searchable snapshots
+
+An index mounted from a searchable snapshot can be searched just like any other index.
+{search-snaps-cap} are often used to access a large archive of historical data,
+for which searches may sometimes be complex and time-consuming.
+<<async-search>> is particularly useful for these long-running searches.
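
As an illustration, a long-running search against a mounted index might be
submitted as an async search. This is only a sketch: the index name
`my-mounted-index`, the `@timestamp` field, and the date range are assumptions,
not part of the commit.

[source,console]
----
POST /my-mounted-index/_async_search?wait_for_completion_timeout=2s
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2020-01-01",
        "lt": "2020-02-01"
      }
    }
  }
}
----

The response contains an `id` that you can poll with `GET /_async_search/<id>`
while the search keeps running in the background.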
+
+The shards of searchable snapshots are also allocated just like shards of any
+other index. You can, for instance, use <<shard-allocation-filtering>> to
+restrict these shards to a subset of your nodes.
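
For example, assuming your designated nodes carry a custom node attribute such
as `box_type: cold` (an assumption for this sketch, not part of the commit),
you could pin a mounted index to them like this:

[source,console]
----
PUT /my-mounted-index/_settings
{
  "index.routing.allocation.require.box_type": "cold"
}
----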
+
+Normally you will use {search-snaps} via the
+<<ilm-searchable-snapshot,searchable snapshots ILM action>>, which automatically
+and transparently converts your index into a searchable snapshot when it
+reaches the `cold` ILM phase. If you already have some snapshots that you want
+to search, you can also use the <<searchable-snapshots-api-mount-snapshot>> to
+manually mount them as searchable snapshots.
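
A minimal sketch of such a policy, assuming a snapshot repository registered
under the hypothetical name `my_repository`:

[source,console]
----
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository"
          }
        }
      }
    }
  }
}
----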
+
+You must not delete a snapshot while any of its indices are mounted as a
+searchable snapshot. However, most snapshots contain a large number of indices,
+most of which will not be mounted as searchable snapshots. Therefore we
+recommend that you first use the <<clone-snapshot-api>> to cheaply create a
+clone of the snapshot that contains just the index you want to mount. This will
+allow you to delete older multiple-index snapshots, reducing the size of your
+snapshot repository, without losing access to any mounted indices.
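
A sketch of this clone-then-mount workflow, using hypothetical repository,
snapshot, and index names:

[source,console]
----
PUT /_snapshot/my_repository/my_snapshot/_clone/my_snapshot_clone
{
  "indices": "my-index"
}

POST /_snapshot/my_repository/my_snapshot_clone/_mount?wait_for_completion=true
{
  "index": "my-index"
}
----

Because the clone shares its files with the original snapshot, creating it is
cheap, and the original multi-index snapshot can later be deleted without
affecting the mounted index.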
+
+We recommend that you <<indices-forcemerge,force-merge>> indices to a single
+segment per shard before mounting them as searchable snapshots. Each read from
+a snapshot repository takes time and costs money, and the fewer segments there
+are, the fewer reads are needed to restore the snapshot.
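
For example, to merge a hypothetical index `my-index` down to a single segment
per shard before snapshotting and mounting it:

[source,console]
----
POST /my-index/_forcemerge?max_num_segments=1
----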
+
+By default a searchable snapshot index has `number_of_replicas` set to `0`.
+You can increase the number of replicas if desired, for instance if you want
+to perform more concurrent searches of these shards.
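
A sketch of raising the replica count on a mounted index (index name assumed):

[source,console]
----
PUT /my-mounted-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
----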
+
+=== How searchable snapshots work
+
+When you mount a searchable snapshot index, {es} allocates its shards onto the
+data nodes in your cluster similarly to shards of regular indices. When a shard
+of a searchable snapshot index is allocated to a data node, that node
+automatically restores the shard data from the repository into its local
+storage. When the restore process has completed, these shards will respond to
+searches using the data held in local storage and will not need to access the
+repository. This avoids incurring the monetary cost or performance penalty
+associated with reading data from the repository. However, if the node holding
+one of these shards fails, {es} will automatically allocate the shards onto
+other nodes in the cluster and restore the shard data from the repository
+again. This means you can safely run these indices without replicas, and yet
+you do not need to perform any complicated monitoring or orchestration to
+restore lost shards yourself.
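
To observe this behavior you might check where the shards of a mounted index
are currently allocated, for instance with the cat shards API (index name
assumed):

[source,console]
----
GET _cat/shards/my-mounted-index?v=true
----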
+
+Restoring a shard of a searchable snapshot index happens in the background,
+which means that you can search these shards even if they have not been fully
+restored. If you attempt to search a shard of a searchable snapshot index
+before it has been fully restored, then {es} will eagerly retrieve just the
+data needed for the search. This means that some searches will be slower if the
+shard is freshly allocated to a node and still warming up. Searches usually
+only need to access a very small fraction of the total shard data, so the
+performance penalty on searches during the background restore process is often
+very small.
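
You can follow the progress of the background restore with the index recovery
API; a sketch, again with an assumed index name:

[source,console]
----
GET /my-mounted-index/_recovery?human
----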
+
+Replicas of searchable snapshots are restored by copying data from the snapshot
+repository. In contrast, replicas of regular indices are restored by copying
+data from the primary.