[role="xpack"]
[testenv="basic"]
[[getting-started-snapshot-lifecycle-management]]
- === Configure snapshot lifecycle policies
+ === Tutorial: Automate backups with {slm-init}

- Let's get started with {slm} ({slm-init}) by working through a
- hands-on scenario. The goal of this example is to automatically back up {es}
- indices using the <<snapshot-restore,snapshots>> every day at a particular
- time. Once these snapshots have been created, they are kept for a configured
- amount of time and then deleted per a configured retention policy.
+ This tutorial demonstrates how to automate daily backups of {es} indices using an {slm-init} policy.
+ The policy takes <<modules-snapshots, snapshots>> of all indices in the cluster
+ and stores them in a local repository.
+ It also defines a retention policy and automatically deletes snapshots
+ when they are no longer needed.

- [float]
+ To manage snapshots with {slm-init}, you:
+
+ . <<slm-gs-register-repository, Register a repository>>.
+ . <<slm-gs-create-policy, Create an {slm-init} policy>>.
+
+ To test the policy, you can manually trigger it to take an initial snapshot.
+
+ [discrete]
[[slm-gs-register-repository]]
==== Register a repository

- Before we can set up an SLM policy, we'll need to set up a
- snapshot repository where the snapshots will be
- stored. Repositories can use {plugins}/repository.html[many different backends],
- including cloud storage providers. You'll probably want to use one of these in
- production, but for this example we'll use a shared file system repository:
+ To use {slm-init}, you must have a snapshot repository configured.
+ The repository can be local (shared filesystem) or remote (cloud storage).
+ Remote repositories can reside on S3, HDFS, Azure, Google Cloud Storage,
+ or any other platform supported by a {plugins}/repository.html[repository plugin].
+ Remote repositories are generally used for production deployments.
+
+ For this tutorial, you can register a local repository from
+ {kibana-ref}/snapshot-repositories.html[{kib} Management]
+ or use the put repository API:

[source,console]
-----------------------------------
@@ -30,19 +41,26 @@ PUT /_snapshot/my_repository
}
-----------------------------------
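+
+ A minimal sketch of a shared file system (`fs`) repository body, since the
+ full request isn't shown above. The `my_backup_location` value is an
+ illustrative path and must be listed in the `path.repo` setting of every node:
+
+ [source,console]
+ -----------------------------------
+ # Sketch only: my_backup_location is an illustrative path
+ PUT /_snapshot/my_repository
+ {
+   "type": "fs",
+   "settings": {
+     "location": "my_backup_location"
+   }
+ }
+ -----------------------------------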

- [float]
+ [discrete]
[[slm-gs-create-policy]]
- ==== Setting up a snapshot policy
+ ==== Set up a snapshot policy

- Now that we have a repository in place, we can create a policy to automatically
- take snapshots. Policies are written in JSON and will define when to take
- snapshots, what the snapshots should be named, and which indices should be
- included, among other things. We'll use the <<slm-api-put-policy>> API
- to create the policy.
+ Once you have a repository in place,
+ you can define an {slm-init} policy to take snapshots automatically.
+ The policy defines when to take snapshots, which indices should be included,
+ and what to name the snapshots.
+ A policy can also specify a <<slm-retention,retention policy>> and
+ automatically delete snapshots when they are no longer needed.

- When configurating a policy, retention can also optionally be configured. See
- the <<slm-retention,SLM retention>> documentation for the full documentation of
- how retention works.
+ TIP: Don't be afraid to configure a policy that takes frequent snapshots.
+ Snapshots are incremental and make efficient use of storage.
+
+ You can define and manage policies through {kib} Management or with the put policy API.
+
+ For example, you could define a `nightly-snapshots` policy
+ to back up all of your indices daily at 2:30AM UTC.
+
+ A put policy request defines the policy configuration in JSON:

[source,console]
--------------------------------------------------
@@ -62,66 +80,64 @@ PUT /_slm/policy/nightly-snapshots
}
--------------------------------------------------
// TEST[continued]
- <1> when the snapshot should be taken, using
- <<schedule-cron,Cron syntax>>, in this
- case at 1:30AM each day
- <2> the name each snapshot should be given, using
- <<date-math-index-names,date math>> to include the current date in the name
- of the snapshot
- <3> the repository the snapshot should be stored in
- <4> the configuration to be used for the snapshot requests (see below)
- <5> which indices should be included in the snapshot, in this case, every index
- <6> Optional retention configuration
- <7> Keep snapshots for 30 days
- <8> Always keep at least 5 successful snapshots
- <9> Keep no more than 50 successful snapshots, even if they're less than 30 days old
-
- This policy will take a snapshot of every index each day at 1:30AM UTC.
- Snapshots are incremental, allowing frequent snapshots to be stored efficiently,
- so don't be afraid to configure a policy to take frequent snapshots.
-
- In addition to specifying the indices that should be included in the snapshot,
- the `config` field can be used to customize other aspects of the snapshot. You
- can use any option allowed in <<snapshots-take-snapshot,a regular snapshot
- request>>, so you can specify, for example, whether the snapshot should fail in
- special cases, such as if one of the specified indices cannot be found.
-
- [float]
+ <1> When the snapshot should be taken in
+ <<schedule-cron,Cron syntax>>: daily at 2:30AM UTC
+ <2> How to name the snapshot: use
+ <<date-math-index-names,date math>> to include the current date in the snapshot name
+ <3> Where to store the snapshot
+ <4> The configuration to be used for the snapshot requests (see below)
+ <5> Which indices to include in the snapshot: all indices
+ <6> Optional retention policy: keep snapshots for 30 days,
+ retaining at least 5 and no more than 50 snapshots regardless of age
+
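+ For reference, here is a sketch of a policy body consistent with the callouts
+ above; the values are illustrative reconstructions, since the full request
+ isn't shown, and may not match it exactly:
+
+ [source,console]
+ --------------------------------------------------
+ # Illustrative sketch; "0 30 2 * * ?" is 2:30AM UTC daily in cron syntax
+ PUT /_slm/policy/nightly-snapshots
+ {
+   "schedule": "0 30 2 * * ?",
+   "name": "<nightly-snap-{now/d}>",
+   "repository": "my_repository",
+   "config": {
+     "indices": ["*"]
+   },
+   "retention": {
+     "expire_after": "30d",
+     "min_count": 5,
+     "max_count": 50
+   }
+ }
+ --------------------------------------------------
+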
+ You can specify additional snapshot configuration options to customize how snapshots are taken.
+ For example, you could configure the policy to fail the snapshot
+ if one of the specified indices is missing.
+ For more information about snapshot options, see <<snapshots-take-snapshot,snapshot requests>>.
+
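+ A sketch of such a variant, assuming a policy that snapshots an explicit
+ `important` index (an illustrative name, as is the `important-snapshots`
+ policy name) and sets `ignore_unavailable` to `false` so a missing index
+ fails the snapshot:
+
+ [source,console]
+ --------------------------------------------------
+ # Sketch: fails if the (illustrative) index "important" does not exist
+ PUT /_slm/policy/important-snapshots
+ {
+   "schedule": "0 30 2 * * ?",
+   "name": "<important-snap-{now/d}>",
+   "repository": "my_repository",
+   "config": {
+     "indices": ["important"],
+     "ignore_unavailable": false
+   }
+ }
+ --------------------------------------------------
+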
+ [discrete]
[[slm-gs-test-policy]]
==== Test the snapshot policy

- While snapshots taken by SLM policies can be viewed through the standard snapshot
- API, SLM also keeps track of policy successes and failures in ways that are a bit
- easier to use to make sure the policy is working. Once a policy has executed at
- least once, when you view the policy using the <<slm-api-get-policy>>,
- some metadata will be returned indicating whether the snapshot was sucessfully
- initiated or not.
+ A snapshot taken by {slm-init} is just like any other snapshot.
+ You can view information about snapshots in {kib} Management or
+ get info with the <<snapshots-monitor-snapshot-restore, snapshot APIs>>.
+ In addition, {slm-init} keeps track of policy successes and failures so you
+ have insight into how the policy is working. If the policy has executed at
+ least once, the <<slm-api-get-policy, get policy>> API returns additional metadata
+ that shows if the snapshot succeeded.
+
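+ To see the snapshots themselves, you can list everything stored in the
+ tutorial's repository with the get snapshot API:
+
+ [source,console]
+ --------------------------------------------------
+ # List all snapshots stored in my_repository
+ GET /_snapshot/my_repository/_all
+ --------------------------------------------------
+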
+ You can manually execute a snapshot policy to take a snapshot immediately.
+ This is useful for taking snapshots before making a configuration change,
+ upgrading, or for testing a new policy.
+ Manually executing a policy does not affect its configured schedule.

- Instead of waiting for our policy to run, let's tell SLM to take a snapshot
- as using the configuration from our policy right now instead of waiting for
- 1:30AM.
+ For example, the following request manually triggers the `nightly-snapshots` policy:

[source,console]
--------------------------------------------------
POST /_slm/policy/nightly-snapshots/_execute
--------------------------------------------------
// TEST[skip:we can't easily handle snapshots from docs tests]
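+
+ The execute API responds with the name of the snapshot it initiated; an
+ illustrative response, reusing the snapshot name from the sample policy
+ response shown later in this tutorial:
+
+ [source,console-result]
+ --------------------------------------------------
+ {
+   "snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a"
+ }
+ --------------------------------------------------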

- This request will kick off a snapshot for our policy right now, regardless of
- the schedule in the policy. This is useful for taking snapshots before making
- a configuration change, upgrading, or for our purposes, making sure our policy
- is going to work successfully. The policy will continue to run on its configured
- schedule after this execution of the policy.
+
+ After forcing the `nightly-snapshots` policy to run,
+ you can retrieve the policy to get success or failure information.

[source,console]
--------------------------------------------------
GET /_slm/policy/nightly-snapshots?human
--------------------------------------------------
// TEST[continued]

- This request will return a response that includes the policy, as well as
- information about the last time the policy succeeded and failed, as well as the
- next time the policy will be executed.
+ Only the most recent success and failure are returned,
+ but all policy executions are recorded in the `.slm-history*` indices.
+ The response also shows when the policy is scheduled to execute next.
+
+ NOTE: The response shows if the policy succeeded in _initiating_ a snapshot.
+ However, that does not guarantee that the snapshot completed successfully.
+ It is possible for the initiated snapshot to fail if, for example, the connection to a remote
+ repository is lost while copying files.

[source,console-result]
--------------------------------------------------
@@ -143,44 +159,19 @@ next time the policy will be executed.
        "max_count": 50
      }
    },
-     "last_success": { <1>
-       "snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a", <2>
-       "time_string": "2019-04-24T16:43:49.316Z",
+     "last_success": {
+       "snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a", <1>
+       "time_string": "2019-04-24T16:43:49.316Z", <2>
      "time": 1556124229316
    },
-     "last_failure": { <3>
-       "snapshot_name": "nightly-snap-2019.04.02-lohisb5ith2n8hxacaq3mw",
-       "time_string": "2019-04-02T01:30:00.000Z",
-       "time": 1556042030000,
- "details": "{\"type\":\"index_not_found_exception\",\"reason\":\"no such index [important]\",\"resource.type\":\"index_or_alias\",\"resource.id\":\"important\",\"index_uuid\":\"_na_\",\"index\":\"important\",\"stack_trace\":\"[important] IndexNotFoundException[no such index [important]]\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.indexNotFoundException(IndexNameExpressionResolver.java:762)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.innerResolve(IndexNameExpressionResolver.java:714)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:670)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:163)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:142)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:102)\\n\\tat org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:280)\\n\\tat org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)\\n\\tat org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687)\\n\\tat org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310)\\n\\tat org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210)\\n\\tat org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)\\n\\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\\n\\tat java.base/java.lang.Thread.run(Thread.java:834)\\n\"}"
-     },
-     "next_execution": "2019-04-24T01:30:00.000Z", <4>
-     "next_execution_millis": 1556048160000
+     "next_execution": "2019-04-24T01:30:00.000Z", <3>
+     "next_execution_millis": 1556048160000
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:the presence of last_failure and last_success is asynchronous and will be present for users, but is untestable]

- <1> information about the last time the policy successfully initated a snapshot
- <2> the name of the snapshot that was successfully initiated
- <3> information about the last time the policy failed to initiate a snapshot
- <4> the next time the policy will execute
-
- NOTE: This metadata only indicates whether the request to initiate the snapshot was
- made successfully or not - after the snapshot has been successfully started, it
- is possible for the snapshot to fail if, for example, the connection to a remote
- repository is lost while copying files.
-
- If you're following along, the returned SLM policy shouldn't have a `last_failure`
- field - it's included above only as an example. You should, however, see a
- `last_success` field and a snapshot name. If you do, you've successfully taken
- your first snapshot using SLM!
-
- While only the most recent sucess and failure are available through the Get Policy
- API, all policy executions are recorded to a history index, which may be queried
- by searching the index pattern `.slm-history*`.
+ <1> The name of the last snapshot that was successfully initiated by the policy
+ <2> When the snapshot was initiated
+ <3> When the policy will initiate the next snapshot
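+
+ Because executions are recorded in the `.slm-history*` indices, you can also
+ search them directly; a minimal sketch, assuming the history documents carry
+ a `policy` field with the policy name:
+
+ [source,console]
+ --------------------------------------------------
+ # Sketch: find history records for the nightly-snapshots policy
+ GET /.slm-history*/_search
+ {
+   "query": {
+     "match": { "policy": "nightly-snapshots" }
+   }
+ }
+ --------------------------------------------------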

- That's it! We have our first SLM policy set up to periodically take snapshots
- so that our backups are always up to date. You can read more details in the
- <<snapshot-lifecycle-management-api,SLM API documentation>> and the
- <<modules-snapshots,general snapshot documentation.>>