[[searchable-snapshots]]
== {search-snaps-cap}

Nodes in a distributed system like {es} will inevitably fail from time to time.
To protect your data against node failures, by default when you index a
document into {es} it is stored on two or more nodes. You also take periodic
<<snapshot-restore,snapshots>> of your data so that you can recover from more
serious failures. This means that each document is stored in at least three
places. These extra copies are important for resiliency, but the storage they
consume has an impact on your cluster's operating costs. The two storage
mechanisms have different, but complementary, performance characteristics:

* Snapshot repositories are much more reliable than local storage on
  individual nodes.

* The monetary cost per GB of storage in a snapshot repository is usually
  lower than on a node.

* The monetary cost of each read or write operation on a snapshot repository
  is usually much higher.

* Reading or writing data in a snapshot repository usually takes much longer
  than accessing a node's local storage.

{search-snaps-cap} let you reduce your operating costs by treating a snapshot
as the authoritative copy of some of your indices. The high reliability of the
snapshot repository removes the need to keep multiple copies of their data in
your cluster purely for resiliency. {es} makes a copy of a searchable snapshot
on the nodes in the cluster to reduce the performance impact and costs of
accessing the snapshot repository.

With {search-snaps} you may be able to halve your cluster size without
increasing the risk of data loss or reducing the amount of data exposed to
searches. Put differently, {search-snaps} may allow you to expose twice as
much data to searches for a given cluster size.

=== Using searchable snapshots

A searchable snapshot can be searched just like any other index.
{search-snaps-cap} are often used to access a large archive of historical
data, for which searches may sometimes be complex and time-consuming.
<<async-search>> is particularly useful for these long-running searches.

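For example, a long-running aggregation over an archive of historical data
could be submitted as an async search. This is a minimal sketch; the index
name `my-mounted-index` and the `@timestamp` field are hypothetical:

[source,console]
----
# "my-mounted-index" is a hypothetical mounted index
POST /my-mounted-index/_async_search?wait_for_completion_timeout=2s
{
  "size": 0,
  "aggs": {
    "daily_events": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1d"
      }
    }
  }
}
----

If the search has not completed when the timeout expires, the response
contains an ID that you can pass to `GET /_async_search/<ID>` to retrieve the
results later.
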
The shards of searchable snapshots are also allocated just like shards of any
other index. You can, for instance, use <<shard-allocation-filtering>> to
restrict these shards to a subset of your nodes.

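For instance, assuming your nodes were started with a custom node attribute
such as `box_type` (a hypothetical name), you could restrict a mounted index
to the nodes carrying a particular value of that attribute:

[source,console]
----
# "box_type" is a hypothetical custom node attribute
PUT /my-mounted-index/_settings
{
  "index.routing.allocation.require.box_type": "cold"
}
----
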
Normally you will use {search-snaps} via the
<<ilm-searchable-snapshot,searchable snapshots {ilm-init} action>>, which
automatically and transparently converts your index into a searchable
snapshot when it reaches the `cold` {ilm-init} phase. If you already have some
snapshots that you want to search, you can also use the
<<searchable-snapshots-api-mount-snapshot,mount snapshot API>> to mount them
manually as searchable snapshots.

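As a sketch of both approaches, the first request below creates an {ilm-init}
policy whose `cold` phase includes the `searchable_snapshot` action, and the
second manually mounts an index from an existing snapshot. The policy,
repository, snapshot, and index names are all hypothetical:

[source,console]
----
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository"
          }
        }
      }
    }
  }
}

# Mount an index from an existing snapshot under a new name
POST /_snapshot/my_repository/my_snapshot/_mount?wait_for_completion=true
{
  "index": "my-index",
  "renamed_index": "my-mounted-index"
}
----
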
You must not delete a snapshot while any of its indices are mounted as a
searchable snapshot. However, most snapshots contain a large number of
indices, most of which will not be mounted as searchable snapshots. Therefore
we recommend that you first use the <<clone-snapshot-api>> to cheaply create a
clone of the snapshot containing just the index you want to mount. This
allows you to delete older multiple-index snapshots, reducing the size of
your snapshot repository, without losing access to any mounted indices.

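A minimal sketch of such a clone, again with hypothetical repository,
snapshot, and index names:

[source,console]
----
# Clone only "my-index" out of the multi-index snapshot "my_snapshot"
PUT /_snapshot/my_repository/my_snapshot/_clone/my-index-only
{
  "indices": "my-index"
}
----

You can then mount the index from `my-index-only` and delete the original
multi-index snapshot once nothing else depends on it.
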
We recommend that you <<indices-forcemerge,force-merge>> indices to a single
segment per shard before taking the snapshots that you intend to mount as
searchable snapshots. Each read from a snapshot repository takes time and
costs money, and the fewer segments there are, the fewer reads are needed to
restore the snapshot.

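For example, to merge a hypothetical index `my-index` down to a single
segment per shard before snapshotting it:

[source,console]
----
POST /my-index/_forcemerge?max_num_segments=1
----
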
By default a searchable snapshot has `number_of_replicas` set to `0`. You can
increase the number of replicas if desired, for instance if you want to
perform more concurrent searches of these shards.

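For instance, a sketch of adding one replica to a hypothetical mounted index:

[source,console]
----
PUT /my-mounted-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
----
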
=== How searchable snapshots work

When you mount a searchable snapshot index, {es} allocates its shards onto
the data nodes in your cluster just as it does for shards of regular indices.
When a shard of a searchable snapshot index is allocated to a data node, that
node automatically restores the shard data from the repository into its local
storage. Once the restore process has completed, these shards respond to
searches using the data held in local storage and do not need to access the
repository. This avoids incurring the monetary cost or performance penalty
associated with reading data from the repository. If the node holding one of
these shards fails, {es} automatically allocates the shard onto another node
in the cluster and restores the shard data from the repository again. This
means you can safely run these indices without replicas, and yet you do not
need to perform any complicated monitoring or orchestration to restore lost
shards yourself.

Restoring a shard of a searchable snapshot index happens in the background,
which means that you can search these shards even if they have not been fully
restored. If you search a shard of a searchable snapshot index before it has
been fully restored, {es} eagerly retrieves just the data needed for that
search. This means that some searches will be slower if the shard is freshly
allocated to a node and still warming up. Searches usually only need to
access a very small fraction of the total shard data, so the performance
penalty on searches during the background restore process is often very
small.

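If you want to watch the progress of the background restore, one option is
the index recovery API, sketched here against a hypothetical mounted index:

[source,console]
----
GET /my-mounted-index/_recovery?human
----

The response reports each shard's recovery stage and how much data has been
copied from the repository so far.
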
Replicas of searchable snapshots are restored by copying data from the
snapshot repository. In contrast, replicas of regular indices are restored by
copying data from the primary.