@@ -6,48 +6,320 @@ Backups
6
6
7
7
.. default-domain:: mongodb
8
8
9
+ .. facet::
10
+ :name: genre
11
+ :values: reference
12
+
13
+ .. meta::
14
+ :keywords: atlas architecture center
15
+ :description: Learn the best practices for configuring backups for your Atlas cluster.
16
+
9
17
.. contents:: On this page
10
18
:local:
11
19
:backlinks: none
12
- :depth: 1
20
+ :depth: 2
13
21
:class: onecol
14
22
15
- Intro statement
23
+ |service-fullname| provides fully managed and customizable backups to
24
+ ensure data retention and recovery:
25
+
26
+ - Cloud Backups: Taken using the native snapshot capabilities of your
27
+ cloud provider, to support full-copy snapshots and localized snapshot
28
+ storage. These snapshots are always incremental in nature and leverage
29
+ the cloud provider's underlying backup snapshot mechanism for low cost
30
+ and fast restores. You choose a backup policy that specifies a certain
31
+ number of daily, weekly, and monthly snapshots.
32
+ - Continuous Cloud Backups: This is an additive feature to the cloud
33
+ backups. It captures the full oplog for a specified window, recording
34
+ all changes between snapshots, which is then replayed at the time of
35
+ restore. This enables recovery to any point in time within the window,
36
+ meeting Recovery Point Objectives (RPOs) as low as 1 minute.
37
+
38
+ We don't recommend enabling backup for development and test
39
+ environments. For staging and production environments, we recommend
40
+ developing automated deployment templates that include the
41
+ recommendations described in this page.
42
+
43
+ {+service+} Features and Recommendations for Backups
44
+ ----------------------------------------------------
45
+
46
+ Features
47
+ ~~~~~~~~
48
+
49
+ |service| provides fully managed backups of your data, including
50
+ point-in-time data recovery and consistent, cluster-wide snapshots of
51
+ all {+clusters+}, including sharded {+clusters+}. |service| Cloud
52
+ Backups storage is separate from the |service| instances and uses the
53
+ native snapshot functionality of the {+cluster+}'s cloud service
54
+ provider.
55
+
56
+ .. list-table::
57
+ :widths: 20 80
58
+ :stub-columns: 1
59
+
60
+ * - |service| cloud backups
61
+ - This allows you to utilize the native snapshot functionality
62
+ of your {+cluster+}'s cloud service provider and store backups
63
+ separately from your |service| instances. Benefits include a
64
+ strong default backup retention schedule of 12 months, full
65
+ flexibility to customize snapshot and retention schedules, and
66
+ the ability to set different snapshot frequencies (such as hourly
67
+ for recovery, weekly or monthly for long-term retention) to meet
68
+ industry regulations. You can access your backup data instantly,
69
+ which is useful for auditing, compliance, or data recovery
70
+ purposes and also run queries directly against the backup data,
71
+ saving time and resources.
72
+
73
+ * - Continuous cloud backups
74
+ - This provides a customizable automated backup schedule and Point
75
+ In Time (PIT) recovery, which allows you to recover to
76
+ any timestamp. This allows you to recover your data to the exact
77
+ moment (a point in time) right before any failure or event, like
78
+ a cyber attack. You can also set a customized restore window to
79
+ dictate how many days you would like to be able to restore back
80
+ to a specific point in time. In |service|, you can choose from
81
+ four snapshot frequencies: hourly, daily, weekly, and monthly,
82
+ each with its own retention period.
83
+
84
+ * - Multi-region snapshot distribution
85
+ - This allows you to add multi-region snapshot distribution to
86
+ |service| cloud backups and increase resilience by distributing
87
+ backup snapshots and oplogs across geographic regions instead of
88
+ just storing them in their primary region. You can meet
89
+ compliance requirements of storing backups in different,
90
+ air-gapped geographical locations to ensure disaster recovery in case
91
+ of regional outages.
92
+
93
+ * - Backup compliance policy
94
+ - This feature enables you to further secure business critical data
95
+ by preventing all snapshots and oplogs stored in |service| from
96
+ being modified or deleted for a predefined retention period
97
+ specified by you, guaranteeing that your backups are fully WORM
98
+ (Write Once Read Many) compliant. Only a designated, authorized
99
+ user can turn off this protection after completing a verification
100
+ process with MongoDB support.
101
+
102
+ Recommendations for Backup Strategy
103
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
104
+
105
+ You must align your backup strategy with specific Recovery Point
106
+ Objectives (RPO) and Recovery Time Objectives (RTO) to meet business
107
+ continuity requirements, particularly for critical applications where
108
+ near-instant RPO and rapid recovery times are crucial. RPO defines the
109
+ maximum acceptable amount of data loss during an incident, while RTO
110
+ defines how quickly your application must recover. Since data varies in
111
+ importance, you must evaluate RPO and RTO for each application
112
+ individually. For example, any mission-critical data will likely have
113
+ different requirements than clickstream analytics. Your requirements
114
+ for RTO, RPO, and the backup retention period will influence the cost
115
+ and performance considerations of maintaining backups. In development
116
+ and test environments, we recommend that you disable backup to save
117
+ costs. In staging and production environments, ensure that backup is
118
+ enabled in your deployment template.
119
+
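As a rough illustration of how RPO relates to backup configuration, the following Python sketch compares a target RPO with the worst-case data loss of a snapshot schedule. The function names are hypothetical, and the one-minute granularity is an assumption taken from the "RPOs as low as 1 minute" figure on this page. It is a simplified model that applies only within the continuous backup restore window, not a description of how |service| computes anything.

```python
from datetime import timedelta

# Assumption: point-in-time recovery granularity of about one minute,
# taken from the "RPOs as low as 1 minute" figure on this page.
PIT_GRANULARITY = timedelta(minutes=1)


def worst_case_data_loss(snapshot_interval, continuous_backup=False):
    """Return the longest span of writes that a restore could lose."""
    if continuous_backup:
        # The oplog is replayed on top of a snapshot, so the gap between
        # snapshots no longer bounds the loss.
        return PIT_GRANULARITY
    # Without the oplog, everything written since the last snapshot is lost.
    return snapshot_interval


def meets_rpo(rpo, snapshot_interval, continuous_backup=False):
    """Check a backup configuration against a target RPO."""
    return worst_case_data_loss(snapshot_interval, continuous_backup) <= rpo


twelve_hours = timedelta(hours=12)
five_minutes = timedelta(minutes=5)
print(meets_rpo(five_minutes, twelve_hours))
print(meets_rpo(five_minutes, twelve_hours, continuous_backup=True))
```

In this model, a five-minute RPO cannot be met by 12-hour snapshots alone, but it can be met once continuous cloud backups are enabled.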
120
+ Large replica sets (and shards) take longer to restore from backup.
121
+ In staging and production environments, we recommend that you test
122
+ restores to identify the replica set or shard size limits that keep
123
+ restore times within your RTO requirements. Ensure that your
124
+ snapshot schedule and retention policies meet any RPO requirements.
125
+
126
+ In addition to |service| cloud backups, we recommend that you enable
127
+ continuous cloud backups with a restore window of seven days. This will
128
+ allow you to replay the oplog to restore a {+cluster+} from a particular
129
+ point in time.
130
+
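To make the restore window concrete, the sketch below checks whether a requested point-in-time target falls inside a seven-day window. The helper name and the fixed timestamps are hypothetical and exist only for illustration. The actual eligibility check is performed by |service| when you request a restore.

```python
from datetime import datetime, timedelta, timezone

RESTORE_WINDOW = timedelta(days=7)  # recommended window from this page


def within_restore_window(target, now, window=RESTORE_WINDOW):
    """Return True if `target` falls inside the point-in-time restore window."""
    return now - window <= target <= now


# Fixed timestamp so that the example is reproducible.
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
print(within_restore_window(now - timedelta(days=3), now))
print(within_restore_window(now - timedelta(days=10), now))
```

A target three days in the past is inside the window, while a target ten days in the past is only recoverable from a retained snapshot.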
131
+ Recommendations for Backup Snapshots
132
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
133
+
134
+ |service| provides predefined backup snapshot schedules including
135
+ snapshot frequency and retention period. Retaining backup snapshots
136
+ for long periods can be costly. We recommend building automated
137
+ deployment templates that meet your requirements based on the size and
138
+ criticality of the data and the environment (development, test, staging,
139
+ production). For frequency and retention of snapshots, we recommend
140
+ the following:
141
+
142
+ .. list-table::
143
+ :widths: 10 15 20 45 10
144
+ :header-rows: 1
145
+
146
+ * - Tier
147
+ - RTO
148
+ - RPO
149
+ - Recommended Frequency and Retention
150
+ - Total Number of Snapshots
151
+
152
+ * - Tier 1
153
+ - 30 minutes
154
+ - Near zero (within 7 days)
155
+ - | **Hourly**: Every 12 hours, retain for 7 days = 14 snapshots
156
+ | **Daily**: Once a day, retain for 7 days = 7 snapshots
157
+ | **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
158
+ | **Monthly**: Last day of month, retain for 3 months = 3 snapshots
159
+ - 28
160
+
161
+ * - Tier 2
162
+ - 12 hours
163
+ - Near zero (within 7 days)
164
+ - | **Daily**: Once a day, retain for 7 days = 7 snapshots
165
+ | **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
166
+ | **Monthly**: Last day of month, retain for 3 months = 3 snapshots
167
+ - 14
168
+
169
+ * - Tier 3
170
+ - 3 days
171
+ - Near zero (within 2 days)
172
+ - | **Daily**: Once a day, retain for 7 days = 7 snapshots
173
+ | **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
174
+ | **Monthly**: Last day of month, retain for 3 months = 3 snapshots
175
+ - 14
176
+
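The snapshot counts in the table follow from dividing each retention period by its snapshot interval. The following Python sketch reproduces that arithmetic for the Tier 2 policy and for a 12-hour snapshot item. The data layout and function names are hypothetical, chosen only to show the calculation when sizing a policy.

```python
def retained_snapshots(interval, retention):
    """Number of snapshots kept at steady state.

    `interval` and `retention` must use the same unit (hours, days,
    weeks, or months).
    """
    return retention // interval


def total_snapshots(policy):
    """Sum the retained snapshots across all items of a policy."""
    return sum(retained_snapshots(i, r) for i, r in policy.values())


# Tier 2 from the table: (interval, retention) in the unit named by the key.
TIER_2 = {
    "daily": (1, 7),     # once a day, retain for 7 days
    "weekly": (1, 4),    # once a week, retain for 4 weeks
    "monthly": (1, 3),   # once a month, retain for 3 months
}

print(total_snapshots(TIER_2))          # 7 + 4 + 3
print(retained_snapshots(12, 7 * 24))   # every 12 hours, retain for 7 days
```

Both calls evaluate to 14, matching the Tier 2 total and the count for a 12-hour snapshot item retained for 7 days.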
177
+ We recommend Queryable Backup Snapshot options when defining automation
178
+ templates. Queryable Backups allow you to access your backup data
179
+ instantly and run queries directly against the backup data, including on
180
+ data as it existed at the time of the backup. Queryable backups also
181
+ maintain the same security and access controls as your live {+clusters+},
182
+ ensuring that sensitive data is protected. In some cases, we recommend
183
+ adjusting backup snapshot frequencies and retentions to save money if
184
+ queryable backup strategies suffice.
185
+
186
+ Recommendations for Backup Distribution
187
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
16
188
17
- {+service+} Features and Best Practices for Backups
18
- ---------------------------------------------------
189
+ |service| provides options for backup locations. To further enhance
190
+ resilience, for staging and production environments only, we recommend
191
+ distributing backups to the local region and to an external disaster recovery
192
+ region, ensuring data recovery even during regional outages. For an
193
+ |service| {+cluster+} in three regions, multi-region Snapshot
194
+ Distribution copies backups to two secondary regions, enabling restores
195
+ in 15 minutes or less by using backup copies. By selecting only
196
+ hourly and daily snapshots along with the oplog for regional copies, you
197
+ can optimize costs while ensuring rapid recovery during regional
198
+ outages. Once restored, you can simply point your application to the new
199
+ {+cluster+} to regain full functionality with complete read and write
200
+ capabilities. We recommend developing automated deployment templates
201
+ that strike a balance between availability and cost. However, your
202
+ critical workloads might require multiple copies of snapshots in various
203
+ locations.
19
204
20
- Content here
205
+ Recommendations for Backup Compliance Policy
206
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
207
+
208
+ We recommend enforcing |service|'s Backup Compliance Policy to prevent
209
+ unauthorized modifications or deletions of backups, thereby maintaining
210
+ data integrity and supporting robust disaster recovery.
211
+
212
+ Recommendations for PIT Recovery
213
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
214
+
215
+ |service|'s fully managed, customizable Cloud Backups and Continuous
216
+ Cloud Backups enable precise Point In Time (PIT) recovery, minimizing
217
+ data loss during failures. |service| can quickly recover to the exact
218
+ timestamp before a failure event, giving you a one-minute
219
+ :abbr:`RPO (Recovery Point Objective)` and an :abbr:`RTO (Recovery Time
220
+ Objective)` of less than 15 minutes when utilizing optimized restores,
221
+ even in the event of the outage of the primary region. Recovery times
222
+ can vary due to cloud provider disk warming and which point in time you
223
+ are restoring to. If you can be flexible in your requirements for
224
+ recovery, we recommend designing templates that identify the sweet spot
225
+ between reasonable recovery options and cost.
226
+
227
+ Recommendations for Backup Costs
228
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
229
+
230
+ To optimize |service| backup costs, you must adjust the backup frequency
231
+ and retention policies to align with data criticality, reducing
232
+ unnecessary storage expenses. You can also use incremental backups and
233
+ built-in compression to minimize the amount of stored data. Select
234
+ backup regions strategically to avoid cross-region data transfer fees,
235
+ and choose the right cluster size for your workload to
236
+ prevent overspending. By implementing these strategies, you can
237
+ effectively manage costs while maintaining secure and reliable backups.
21
238
22
239
Examples
23
240
--------
24
241
25
- The following examples <perform this action> using |service|
242
+ The following examples enable backup and restore operations using |service|
26
243
:ref:`tools for automation <arch-center-automation>`.
27
244
28
- These examples also apply other recommended configurations, including:
245
+ These examples apply only to staging and production environments where
246
+ backup is enabled for the {+cluster+}.
29
247
30
248
.. tabs::
31
249
32
- .. tab:: Dev and Test Environments
33
- :tabid: devtest
250
+ .. tab:: CLI
251
+ :tabid: cli
34
252
35
- .. include:: /includes/shared-settings-clusters-devtest.rst
253
+ Run the following command to take a backup snapshot for the {+cluster+}
254
+ named myDemo and retain the snapshot for 7 days:
36
255
37
- .. tab:: Staging and Prod Environments
38
- :tabid: stagingprod
256
+ .. include:: /includes/examples/cli-example-backup-take-snapshot.rst
39
257
40
- .. include:: /includes/shared-settings-clusters-stagingprod.rst
258
+ Enable backup compliance policy for your project with a
259
+ designated, authorized user named ``john doe`` who alone can turn
260
+ off this protection after completing a verification process with
261
+ MongoDB support.
41
262
42
- .. tabs::
263
+ .. include:: /includes/examples/cli-example-backup-compliance-policy-enable.rst
43
264
44
- .. tab:: CLI
45
- :tabid: cli
265
+ Run the following command to create a compliance policy for
266
+ scheduled backup snapshots that enforces the number of times
267
+ (``2``) snapshots must be taken every month and the duration
268
+ (``2`` months) for retaining the snapshots.
46
269
47
- Content here
270
+ .. include:: /includes/examples/cli-example-backup-compliance-policy-schedule.rst
48
271
49
272
.. tab:: Terraform
50
273
:tabid: Terraform
274
+
275
+ The following examples demonstrate how to configure backups during
276
+ deployment. Before you can create resources with Terraform,
277
+ you must:
278
+
279
+ - :ref:`Create your paying organization <configure-paying-org>`
280
+ and :ref:`create an API key <atlas-admin-api-access>` for the
281
+ paying organization. Store your API keys as environment variables
282
+ by running the following command in the terminal:
283
+
284
+ .. code-block:: sh
285
+
286
+ export MONGODB_ATLAS_PUBLIC_KEY="<insert your public key here>"
287
+ export MONGODB_ATLAS_PRIVATE_KEY="<insert your private key here>"
288
+
289
+ - `Install Terraform <https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli>`__.
290
+
291
+ Common Files
292
+ ~~~~~~~~~~~~
293
+
294
+ You must create the following files for each example. Place the
295
+ files for each example in their own directory. Change the IDs and
296
+ names to use your values. Then run the commands to initialize
297
+ Terraform, view the Terraform plan, and apply the changes.
298
+
299
+ variables.tf
300
+ ````````````
301
+
302
+ .. include:: /includes/examples/tf-example-backup-variables.rst
303
+
304
+
305
+ Configure Backup Schedule for the {+Cluster+}
306
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
307
+
308
+ Use the following to configure a backup schedule for the
309
+ {+cluster+}.
310
+
311
+ main.tf
312
+ ```````
313
+
314
+ .. include:: /includes/examples/tf-example-backup-snapshot-schedule.rst
315
+
316
+ Configure Backup and PIT Restore for the {+Cluster+}
317
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
51
318
52
- Content here
319
+ Use the following to configure a cloud backup snapshot and a PIT restore
320
+ job.
53
321
322
+ main.tf
323
+ ```````
324
+
325
+ .. include:: /includes/examples/tf-example-backup-snapshot-pit-restore.rst