Commit ea4069c

Add Snapshot Lifecycle Retention documentation (#47545)
* Add Snapshot Lifecycle Retention documentation

This commit adds API and general purpose documentation for SLM retention.

Relates to #43663

* Fix docs tests
* Update default now that #47604 has been merged
* Update docs/reference/ilm/apis/slm-api.asciidoc

Co-Authored-By: Gordon Brown <[email protected]>

* Update docs/reference/ilm/apis/slm-api.asciidoc

Co-Authored-By: Gordon Brown <[email protected]>

* Update docs with feedback
1 parent b578059 commit ea4069c

4 files changed

+205
-21
lines changed

docs/reference/ilm/apis/slm-api.asciidoc

Lines changed: 60 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,9 @@ The Snapshot Lifecycle Management APIs are used to manage policies for the time
77
and frequency of automatic snapshots. Snapshot Lifecycle Management is related
88
to <<index-lifecycle-management,Index Lifecycle Management>>, however, instead
99
of managing a lifecycle of actions that are performed on a single index, SLM
10-
allows configuring policies spanning multiple indices.
10+
allows configuring policies spanning multiple indices. Snapshot Lifecycle
11+
Management can also perform deletion of older snapshots based on a configurable
12+
retention policy.
1113

1214
SLM policy management is split into three different CRUD APIs: a way to put or update
1315
policies, a way to retrieve policies, and a way to delete unwanted policies, as
@@ -62,7 +64,11 @@ PUT /_slm/policy/daily-snapshots
6264
"ignore_unavailable": false,
6365
"include_global_state": false
6466
},
65-
"retention": {}
67+
"retention": { <6>
68+
"expire_after": "30d", <7>
69+
"min_count": 5, <8>
70+
"max_count": 50 <9>
71+
}
6672
}
6773
--------------------------------------------------
6874
// TEST[setup:setup-repository]
@@ -72,6 +78,10 @@ PUT /_slm/policy/daily-snapshots
7278
<3> Which repository to take the snapshot in
7379
<4> Any extra snapshot configuration
7480
<5> Which indices the snapshot should contain
81+
<6> Optional retention configuration
82+
<7> Keep snapshots for 30 days
83+
<8> Always keep at least 5 successful snapshots, even if they're more than 30 days old
84+
<9> Keep no more than 50 successful snapshots, even if they're less than 30 days old
7585

7686
The top-level keys that the policy supports are described below:
7787

@@ -139,7 +149,11 @@ The output looks similar to the following:
139149
"ignore_unavailable": false,
140150
"include_global_state": false
141151
},
142-
"retention": {}
152+
"retention": {
153+
"expire_after": "30d",
154+
"min_count": 5,
155+
"max_count": 50
156+
}
143157
},
144158
"stats": {
145159
"policy": "daily-snapshots",
@@ -229,7 +243,11 @@ Which, in this case shows an error because the index did not exist:
229243
"ignore_unavailable": false,
230244
"include_global_state": false
231245
},
232-
"retention": {}
246+
"retention": {
247+
"expire_after": "30d",
248+
"min_count": 5,
249+
"max_count": 50
250+
}
233251
},
234252
"stats": {
235253
"policy": "daily-snapshots",
@@ -270,6 +288,11 @@ PUT /_slm/policy/daily-snapshots
270288
"indices": ["data-*", "important"],
271289
"ignore_unavailable": true,
272290
"include_global_state": false
291+
},
292+
"retention": {
293+
"expire_after": "30d",
294+
"min_count": 5,
295+
"max_count": 50
273296
}
274297
}
275298
--------------------------------------------------
@@ -318,7 +341,11 @@ Which now includes the successful snapshot information:
318341
"ignore_unavailable": true,
319342
"include_global_state": false
320343
},
321-
"retention": {}
344+
"retention": {
345+
"expire_after": "30d",
346+
"min_count": 5,
347+
"max_count": 50
348+
}
322349
},
323350
"stats": {
324351
"policy": "daily-snapshots",
@@ -374,22 +401,14 @@ Which returns a response similar to:
374401
"retention_timed_out": 0,
375402
"retention_deletion_time": "1.4s",
376403
"retention_deletion_time_millis": 1404,
377-
"policy_metrics": [
378-
{
379-
"policy": "daily-snapshots",
380-
"snapshots_taken": 1,
381-
"snapshots_failed": 1,
382-
"snapshots_deleted": 0,
383-
"snapshot_deletion_failures": 0
384-
}
385-
],
404+
"policy_stats": [ ],
386405
"total_snapshots_taken": 1,
387406
"total_snapshots_failed": 1,
388407
"total_snapshots_deleted": 0,
389408
"total_snapshot_deletion_failures": 0
390409
}
391410
--------------------------------------------------
392-
// TESTRESPONSE[s/runs": 13/runs": $body.retention_runs/ s/_failed": 0/_failed": $body.retention_failed/ s/_timed_out": 0/_timed_out": $body.retention_timed_out/ s/"1.4s"/$body.retention_deletion_time/ s/1404/$body.retention_deletion_time_millis/]
411+
// TESTRESPONSE[s/runs": 13/runs": $body.retention_runs/ s/_failed": 0/_failed": $body.retention_failed/ s/_timed_out": 0/_timed_out": $body.retention_timed_out/ s/"1.4s"/$body.retention_deletion_time/ s/1404/$body.retention_deletion_time_millis/ s/total_snapshots_taken": 1/total_snapshots_taken": $body.total_snapshots_taken/ s/total_snapshots_failed": 1/total_snapshots_failed": $body.total_snapshots_failed/ s/"policy_stats": [.*]/"policy_stats": $body.policy_stats/]
393412

394413
[[slm-api-delete]]
395414
=== Delete Snapshot Lifecycle Policy API
@@ -410,3 +429,29 @@ any currently ongoing snapshots or remove any previously taken snapshots.
410429
DELETE /_slm/policy/daily-snapshots
411430
--------------------------------------------------
412431
// TEST[continued]
432+
433+
[[slm-api-execute-retention]]
434+
=== Execute Snapshot Lifecycle Retention API
435+
436+
While Snapshot Lifecycle Management retention is usually invoked through the global cluster settings
437+
for its schedule, it can sometimes be useful to invoke a retention run to expunge expired snapshots
438+
immediately. This API allows you to run a one-off retention run.
439+
440+
==== Example
441+
442+
To immediately start snapshot retention, use the following:
443+
444+
[source,console]
445+
--------------------------------------------------
446+
POST /_slm/_execute_retention
447+
--------------------------------------------------
448+
449+
This API will immediately return, as retention will be run asynchronously in the background:
450+
451+
[source,console-result]
452+
--------------------------------------------------
453+
{
454+
"acknowledged": true
455+
}
456+
--------------------------------------------------
457+

docs/reference/ilm/getting-started-slm.asciidoc

Lines changed: 24 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -6,24 +6,26 @@
66
Let's get started with snapshot lifecycle management (SLM) by working through a
77
hands-on scenario. The goal of this example is to automatically back up {es}
88
indices using the <<modules-snapshots,snapshots>> every day at a particular
9-
time.
9+
time. Once these snapshots have been created, they are kept for a configured
10+
amount of time and then deleted per a configured retention policy.
1011

1112
[float]
1213
[[slm-and-security]]
1314
=== Security and SLM
1415
Before starting, it's important to understand the privileges that are needed
1516
when configuring SLM if you are using the security plugin. There are two
1617
built-in cluster privileges that can be used to assist: `manage_slm` and
17-
`read_slm`. It's also good to note that the `create_snapshot` permission
18-
allows taking snapshots even for indices the role may not have access to.
18+
`read_slm`. It's also good to note that the `cluster:admin/snapshot/*`
19+
permission allows taking and deleting snapshots even for indices the role may
20+
not have access to.
1921

2022
An example of configuring an administrator role for SLM follows:
2123

2224
[source,console]
2325
-----------------------------------
2426
POST /_security/role/slm-admin
2527
{
26-
"cluster": ["manage_slm", "create_snapshot"],
28+
"cluster": ["manage_slm", "cluster:admin/snapshot/*"],
2729
"indices": [
2830
{
2931
"names": [".slm-history-*"],
@@ -82,6 +84,10 @@ snapshots, what the snapshots should be named, and which indices should be
8284
included, among other things. We'll use the <<slm-api-put,Put Policy>> API
8385
to create the policy.
8486

87+
When configuring a policy, retention can also optionally be configured. See
88+
the <<slm-retention,SLM retention>> documentation for the full documentation of
89+
how retention works.
90+
8591
[source,console]
8692
--------------------------------------------------
8793
PUT /_slm/policy/nightly-snapshots
@@ -92,7 +98,11 @@ PUT /_slm/policy/nightly-snapshots
9298
"config": { <4>
9399
"indices": ["*"] <5>
94100
},
95-
"retention": {}
101+
"retention": { <6>
102+
"expire_after": "30d", <7>
103+
"min_count": 5, <8>
104+
"max_count": 50 <9>
105+
}
96106
}
97107
--------------------------------------------------
98108
// TEST[continued]
@@ -105,6 +115,10 @@ PUT /_slm/policy/nightly-snapshots
105115
<3> the repository the snapshot should be stored in
106116
<4> the configuration to be used for the snapshot requests (see below)
107117
<5> which indices should be included in the snapshot, in this case, every index
118+
<6> Optional retention configuration
119+
<7> Keep snapshots for 30 days
120+
<8> Always keep at least 5 successful snapshots
121+
<9> Keep no more than 50 successful snapshots, even if they're less than 30 days old
108122

109123
This policy will take a snapshot of every index each day at 1:30AM UTC.
110124
Snapshots are incremental, allowing frequent snapshots to be stored efficiently,
@@ -166,7 +180,11 @@ next time the policy will be executed.
166180
"config": {
167181
"indices": ["*"]
168182
},
169-
"retention": {}
183+
"retention": {
184+
"expire_after": "30d",
185+
"min_count": 5,
186+
"max_count": 50
187+
}
170188
},
171189
"last_success": { <1>
172190
"snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a", <2>

docs/reference/ilm/index.asciidoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -87,3 +87,5 @@ include::start-stop-ilm.asciidoc[]
8787
include::ilm-with-existing-indices.asciidoc[]
8888

8989
include::getting-started-slm.asciidoc[]
90+
91+
include::slm-retention.asciidoc[]
Lines changed: 119 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,119 @@
1+
[role="xpack"]
2+
[testenv="basic"]
3+
[[slm-retention]]
4+
== Snapshot lifecycle management retention
5+
6+
Automatic deletion of older snapshots is an optional feature of snapshot lifecycle management.
7+
Retention is run as a cluster level task that is not associated with a particular policy's schedule
8+
(though the configuration of which snapshots to keep is done on a per-policy basis). Retention
9+
configuration consists of two parts: the first is a cluster-level configuration for when retention is
10+
run and for how long; the second is configured on a policy and specifies which snapshots should be eligible for
11+
retention.
12+
13+
The cluster level settings for retention are shown below, and can be changed dynamically using the
14+
<<cluster-update-settings,cluster-update-settings>> API:
15+
16+
|=====================================
17+
| Setting | Default value | Description
18+
19+
| `slm.retention_schedule` | `0 30 1 * * ?` | A periodic or absolute time schedule for when
20+
retention should be run. Supports all values supported by the cron scheduler: <<schedule-cron,Cron
21+
scheduler configuration>>. Retention can also be manually run using the
22+
<<slm-api-execute-retention,Execute retention API>>. Defaults to daily at 1:30am in the master
23+
node's timezone.
24+
25+
| `slm.retention_duration` | `"1h"` | A limit on how long SLM should spend deleting old snapshots.
26+
|=====================================
27+
28+
Policy level configuration for retention is done inside the `retention` object when creating or
29+
updating a policy. All of the retention configuration options are optional.
30+
31+
[source,console]
32+
--------------------------------------------------
33+
PUT /_slm/policy/daily-snapshots
34+
{
35+
"schedule": "0 30 1 * * ?",
36+
"name": "<daily-snap-{now/d}>",
37+
"repository": "my_repository",
38+
"retention": { <1>
39+
"expire_after": "30d", <2>
40+
"min_count": 5, <3>
41+
"max_count": 50 <4>
42+
}
43+
}
44+
--------------------------------------------------
45+
// TEST[setup:setup-repository]
46+
<1> Optional retention configuration
47+
<2> Keep snapshots for 30 days
48+
<3> Always keep at least 5 successful snapshots
49+
<4> Keep no more than 50 successful snapshots
50+
51+
The supported retention configuration options within a policy are as follows. The default value for
52+
each is unset unless specified by the user in the policy configuration.
53+
54+
NOTE: The oldest snapshots are always deleted first. For example, with a `max_count` of 5 for a policy
55+
with 6 snapshots, the oldest snapshot will be deleted.
56+
57+
|=====================================
58+
| Setting | Description
59+
| `expire_after` | A timevalue for how old a snapshot must be in order to be eligible for deletion.
60+
| `min_count` | A minimum number of snapshots to keep, regardless of age.
61+
| `max_count` | The maximum number of snapshots to keep, regardless of age.
62+
|=====================================
63+
64+
As an example, the retention setting in the policy configured above would read in English as:
65+
66+
____
67+
Remove snapshots older than thirty days, but always keep the latest five snapshots. If there are
68+
more than fifty snapshots, remove the oldest surplus snapshots until there are no more than fifty
69+
successful snapshots.
70+
____
71+
72+
If multiple policies are configured to snapshot to the same repository, or manual snapshots have
73+
been taken without using the <<slm-api-execute,Execute Policy API>>, they are treated as not
74+
eligible for retention, and do not count towards any limits. This allows multiple policies to have
75+
differing retention configuration while using the same snapshot repository.
76+
77+
Statistics for snapshot retention can be retrieved using the <<slm-get-stats,Get Snapshot Lifecycle
78+
Stats API>>:
79+
80+
[source,console]
81+
--------------------------------------------------
82+
GET /_slm/stats
83+
--------------------------------------------------
84+
// TEST[continued]
85+
86+
Which returns a response similar to:
87+
88+
[source,js]
89+
--------------------------------------------------
90+
{
91+
"retention_runs": 13, <1>
92+
"retention_failed": 0, <2>
93+
"retention_timed_out": 0, <3>
94+
"retention_deletion_time": "1.4s", <4>
95+
"retention_deletion_time_millis": 1404,
96+
"policy_stats": [
97+
{
98+
"policy": "daily-snapshots",
99+
"snapshots_taken": 1,
100+
"snapshots_failed": 1,
101+
"snapshots_deleted": 0, <5>
102+
"snapshot_deletion_failures": 0 <6>
103+
}
104+
],
105+
"total_snapshots_taken": 1,
106+
"total_snapshots_failed": 1,
107+
"total_snapshots_deleted": 0, <7>
108+
"total_snapshot_deletion_failures": 0 <8>
109+
}
110+
--------------------------------------------------
111+
// TESTRESPONSE[skip:this is not actually running retention]
112+
<1> Number of times retention has been run
113+
<2> Number of times retention failed while running
114+
<3> Number of times retention hit the `slm.retention_duration` time limit and had to stop before deleting all eligible snapshots
115+
<4> Total time spent deleting snapshots by the retention process
116+
<5> Number of snapshots created by the "daily-snapshots" policy that have been deleted
117+
<6> Number of snapshots that failed to be deleted
118+
<7> Total number of snapshots deleted across all policies
119+
<8> Total number of snapshot deletion failures across all policies
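
The interaction of `expire_after`, `min_count`, and `max_count` described in the retention documentation above can be sketched in Python. This is a simplified model of the stated rules (oldest snapshots are deleted first, `min_count` protects the newest snapshots from expiry, `max_count` trims the oldest surplus), not Elasticsearch's actual implementation. The function name `snapshots_to_delete` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshot_times, now, expire_after=None,
                        min_count=None, max_count=None):
    """Return start times of snapshots a retention run would remove.

    snapshot_times holds the start times of one policy's successful
    snapshots. This is an illustrative model, not the real SLM code.
    """
    ordered = sorted(snapshot_times)  # oldest first
    total = len(ordered)
    doomed = set()

    # max_count: drop the oldest surplus snapshots, regardless of age.
    if max_count is not None and total > max_count:
        doomed.update(ordered[: total - max_count])

    # expire_after: drop snapshots older than the cutoff, but never
    # touch the newest min_count snapshots.
    if expire_after is not None:
        protected = min_count or 0
        candidates = ordered[: max(total - protected, 0)]
        doomed.update(t for t in candidates if now - t > expire_after)

    return sorted(doomed)


now = datetime(2019, 10, 1, tzinfo=timezone.utc)
six_daily = [now - timedelta(days=d) for d in range(1, 7)]
# With max_count=5 and six snapshots, only the oldest one is selected.
oldest_only = snapshots_to_delete(six_daily, now, max_count=5)
```

The final lines mirror the NOTE in the documentation: a `max_count` of 5 for a policy with 6 snapshots selects only the oldest snapshot for deletion.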
