Commit 07e1168

DOCSP-43349 Backups doc (#25)
* DOCSP-43349 Backups doc
* Apply suggestions from code review
  Co-authored-by: Sarah Simpers <[email protected]>
* DOCSP-43349 updates for SS feedback
* DOCSP-43349 updates for LA's feedback

---------

Co-authored-by: Sarah Simpers <[email protected]>
1 parent 0de3b81 commit 07e1168

7 files changed: +482 -18 lines changed

source/backups.txt

Lines changed: 290 additions & 18 deletions
@@ -6,48 +6,320 @@ Backups
66

77
.. default-domain:: mongodb
88

9+
.. facet::
10+
:name: genre
11+
:values: reference
12+
13+
.. meta::
14+
:keywords: atlas architecture center
15+
:description: Learn the best practices for configuring backups for your Atlas cluster.
16+
917
.. contents:: On this page
1018
:local:
1119
:backlinks: none
12-
:depth: 1
20+
:depth: 2
1321
:class: onecol
1422

15-
Intro statement
23+
|service-fullname| provides fully managed and customizable backups to
24+
ensure data retention and recovery:
25+
26+
- Cloud Backups: Taken using the native snapshot capabilities of your
27+
cloud provider, to support full-copy snapshots and localized snapshot
28+
storage. These snapshots are always incremental in nature and leverage
29+
the cloud provider's underlying backup snapshot mechanism for low cost
30+
and fast restores. You choose a backup policy that specifies a certain
31+
number of daily, weekly, and monthly snapshots.
32+
- Continuous Cloud Backups: This is an additive feature to the cloud
33+
backups. It captures the full oplog for a specified window, recording
34+
all changes between snapshots, which is then replayed at the time of
35+
restore. This enables recovery to any point in time within the window,
36+
meeting Recovery Point Objectives (RPOs) as low as 1 minute.
37+
38+
We don't recommend enabling backup for development and test
39+
environments. For staging and production environments, we recommend
40+
developing automated deployment templates that include the
41+
recommendations described in this page.
42+
43+
{+service+} Features and Recommendations for Backups
44+
----------------------------------------------------
45+
46+
Features
47+
~~~~~~~~
48+
49+
|service| provides fully managed backups of your data, including
50+
point-in-time data recovery and consistent, cluster-wide snapshots of
51+
all {+clusters+}, including sharded {+clusters+}. |service| Cloud
52+
Backups storage is separate from the |service| instances and uses the
53+
native snapshot functionality of the {+cluster+}'s cloud service
54+
provider.
55+
56+
.. list-table::
57+
:widths: 20 80
58+
:stub-columns: 1
59+
60+
* - |service| cloud backups
61+
- This allows you to utilize the native snapshot functionality
62+
of your {+cluster+}'s cloud service provider and store backups
63+
separately from your |service| instances. Benefits include a
64+
strong default backup retention schedule of 12 months, full
65+
flexibility to customize snapshot and retention schedules, and
66+
the ability to set different snapshot frequencies (such as hourly
67+
for recovery, weekly or monthly for long-term retention) to meet
68+
industry regulations. You can access your backup data instantly,
69+
which is useful for auditing, compliance, or data recovery
70+
purposes and also run queries directly against the backup data,
71+
saving time and resources.
72+
73+
* - Continuous cloud backups
74+
- This provides a customizable automated backup schedule and Point
75+
In Time (PIT) recovery, which allows you to recover to
76+
any timestamp. This allows you to recover your data to the exact
77+
moment (a point in time) right before any failure or event, like
78+
a cyber attack. You can also set a customized restore window to
79+
dictate how many days you would like to be able to restore back
80+
to a specific point in time. In |service|, you can choose from
81+
four snapshot frequencies: hourly, daily, weekly, and monthly,
82+
each with its own retention period.
83+
84+
* - Multi-region snapshot distribution
85+
This allows you to add multi-region snapshot distribution to
86+
|service| cloud backups and increase resilience by distributing
87+
backup snapshots and oplogs across geographic regions instead of
88+
just storing them in their primary region. You can meet
89+
compliance requirements of storing backups in different, air
90+
gapped geographical locations to ensure disaster recovery in case
91+
of regional outages.
92+
93+
* - Backup compliance policy
94+
- This feature enables you to further secure business critical data
95+
by preventing all snapshots and oplogs stored in |service| from
96+
being modified or deleted for a predefined retention period
97+
specified by you, guaranteeing that your backups are fully WORM
98+
(Write Once Read Many) compliant. Only a designated, authorized
99+
user can turn off this protection after completing a verification
100+
process with MongoDB support.
101+
102+
Recommendations for Backup Strategy
103+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
104+
105+
You must align your backup strategy with specific Recovery Point
106+
Objectives (RPO) and Recovery Time Objectives (RTO) to meet business
107+
continuity requirements, particularly for critical applications where
108+
near-instant RPO and rapid recovery times are crucial. RPO defines the
109+
maximum acceptable amount of data loss during an incident, while RTO
110+
defines how quickly your application must recover. Since data varies in
111+
importance, you must evaluate RPO and RTO for each application
112+
individually. For example, any mission-critical data will likely have
113+
different requirements than clickstream analytics. Your requirements
114+
for RTO, RPO, and the backup retention period will influence the cost
115+
and performance considerations of maintaining backups. In development
116+
and test environments, we recommend that you disable backup to save
117+
costs. In staging and production environments, ensure that backup is
118+
enabled in your deployment template.
119+
120+
Large replica sets (and shards) take longer to restore from backup.
121+
In staging and production environments, through testing techniques, we
122+
recommend that you identify replica set size or shard size limits to
123+
ensure that your size is compatible with RTO requirements. Ensure that
124+
snapshot schedule and retention policies meet any RPO requirements.
125+
126+
In addition to |service| cloud backups, we recommend that you enable
127+
continuous cloud backups with a restore window of seven days. This will
128+
allow you to replay the oplog to restore a {+cluster+} to a particular
129+
point in time.
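
A point-in-time restore target must fall inside the restore window. The
following is a minimal sketch of that check, assuming all times are Unix
epoch seconds. The ``in_restore_window`` helper is hypothetical and is
not part of the Atlas CLI.

```shell
# Hypothetical helper: succeeds if target_epoch lies within the last
# window_days days before now_epoch. All times are Unix epoch seconds.
in_restore_window() {
  now_epoch="$1"
  target_epoch="$2"
  window_days="$3"
  window_start=$(( now_epoch - window_days * 86400 ))
  [ "$target_epoch" -ge "$window_start" ] && [ "$target_epoch" -le "$now_epoch" ]
}

# Example: a target three days in the past, checked against a 7-day window.
now=$(date +%s)
three_days_ago=$(( now - 3 * 86400 ))
if in_restore_window "$now" "$three_days_ago" 7; then
  echo "target is inside the 7-day restore window"
fi
```

A target older than the window start fails the check, which mirrors why a
longer window costs more: more oplog must be retained.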
130+
131+
Recommendations for Backup Snapshots
132+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
133+
134+
|service| provides predefined backup snapshot schedules including
135+
frequency of snapshots and retention period. Retaining backup snapshots
136+
for long periods can be costly. We recommend building automated
137+
deployment templates that meet your requirements based on the size and
138+
criticality of the data and the environment (development, test, staging,
139+
production). For frequency and retention of snapshots, we recommend
140+
the following:
141+
142+
.. list-table::
143+
:widths: 10 15 20 45 10
144+
:header-rows: 1
145+
146+
* - Tier
147+
- RTO
148+
- RPO
149+
- Recommended Frequency and Retention
150+
- Total Number of Snapshots
151+
152+
* - Tier 1
153+
- 30 minutes
154+
- Near zero (within 7 days)
155+
- | **Hourly**: Every 12 hours, retain for 7 days = 14 snapshots
156+
| **Daily**: Once a day, retain for 7 days = 7 snapshots
157+
| **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
158+
| **Monthly**: Last day of month, retain for 3 months = 6 snapshots
159+
- 31
160+
161+
* - Tier 2
162+
- 12 hours
163+
- Near zero (within 7 days)
164+
- | **Daily**: Once a day, retain for 7 days = 7 snapshots
165+
| **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
166+
| **Monthly**: Last day of month, retain for 3 months = 3 snapshots
167+
- 14
168+
169+
* - Tier 3
170+
- 3 days
171+
- Near zero (within 2 days)
172+
- | **Daily**: Once a day, retain for 7 days = 7 snapshots
173+
| **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
174+
| **Monthly**: Last day of month, retain for 3 months = 3 snapshots
175+
- 14
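
The per-policy snapshot counts in the table follow from dividing each
retention period by its snapshot interval. The following is a minimal
sketch of that arithmetic for the 12-hour policy and the Tier 2 row. The
``snapshots_retained`` helper and variable names are illustrative only.

```shell
# Snapshots retained = retention period / snapshot interval,
# with both arguments expressed in the same unit.
snapshots_retained() {
  echo $(( $1 / $2 ))
}

# Every 12 hours, retained for 7 days (both in hours).
hourly=$(snapshots_retained $(( 7 * 24 )) 12)

# Tier 2 row: daily for 7 days, weekly for 4 weeks, monthly for 3 months.
daily=$(snapshots_retained 7 1)
weekly=$(snapshots_retained 4 1)
monthly=$(snapshots_retained 3 1)
tier2_total=$(( daily + weekly + monthly ))

echo "12-hourly snapshots retained: $hourly"
echo "Tier 2 snapshots retained: $tier2_total"
```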
176+
177+
We recommend Queryable Backup Snapshot options when defining automation
178+
templates. Queryable Backups allow you to access your backup data
179+
instantly and run queries directly against the backup data, including on
180+
data as it existed at the time of the backup. Queryable backups also
181+
maintain the same security and access controls as your live {+clusters+},
182+
ensuring that sensitive data is protected. In some cases, we recommend
183+
adjusting backup snapshot frequencies and retentions to save money if
184+
queryable backup strategies suffice.
185+
186+
Recommendations for Backup Distribution
187+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
16188

17-
{+service+} Features and Best Practices for Backups
18-
---------------------------------------------------
189+
|service| provides options for backup locations. To further enhance
190+
resilience, for staging and production environments only, we recommend
191+
distributing backups in the local region and to an external disaster recovery
192+
region, ensuring data recovery even during regional outages. For an
193+
|service| {+cluster+} in three regions, multi-region Snapshot
194+
Distribution copies backups to two secondary regions, enabling restores
195+
in 15 minutes or less by using backup copies. By selecting only
196+
hourly and daily snapshots along with the oplog for regional copies, you
197+
can optimize costs while ensuring rapid recovery during regional
198+
outages. Once restored, you can simply point your application to the new
199+
{+cluster+} to regain full functionality with complete read and write
200+
capabilities. We recommend developing automated deployment templates
201+
that strike a balance between availability and cost. However, your
202+
critical workloads might require multiple copies of snapshots in various
203+
locations.
19204

20-
Content here
205+
Recommendations for Backup Compliance Policy
206+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
207+
208+
We recommend enforcing |service|'s Backup Compliance Policy to prevent
209+
unauthorized modifications or deletions of backups, thereby maintaining
210+
data integrity and supporting robust disaster recovery.
211+
212+
Recommendations for PIT Recovery
213+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
214+
215+
|service|'s fully managed, customizable Cloud Backups and Continuous
216+
Cloud Backups enable precise Point In Time (PIT) recovery, minimizing
217+
data loss during failures. |service| can quickly recover to the exact
218+
timestamp before a failure event, giving you as low as a one-minute
219+
:abbr:`RPO (Recovery Point Objective)` and an :abbr:`RTO (Recovery Time
220+
Objective)` of less than 15 minutes when utilizing optimized restores,
221+
even in the event of the outage of the primary region. Recovery times
222+
can vary due to cloud provider disk warming and which point in time you
223+
are restoring to. If you can be flexible in your requirements for
224+
recovery, we recommend designing templates that identify the sweet spot
225+
between reasonable recovery options and cost.
226+
227+
Recommendations for Backup Costs
228+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
229+
230+
To optimize |service| backup costs, you must adjust the backup frequency
231+
and retention policies to align with data criticality, reducing
232+
unnecessary storage expenses. You can also use incremental backups and
233+
built-in compression to minimize the amount of stored data. By selecting
234+
regions strategically for backup, you can avoid cross-region data
235+
transfer fees and choose the right cluster size based on workload to
236+
prevent overspending. By implementing these strategies, you can
237+
effectively manage costs while maintaining secure and reliable backups.
21238

22239
Examples
23240
--------
24241

25-
The following examples <perform this action> using |service|
242+
The following examples enable backup and restore operations using |service|
26243
:ref:`tools for automation <arch-center-automation>`.
27244

28-
These examples also apply other recommended configurations, including:
245+
These examples apply only to staging and production environments where
246+
backup is enabled for the {+cluster+}.
29247

30248
.. tabs::
31249

32-
.. tab:: Dev and Test Environments
33-
:tabid: devtest
250+
.. tab:: CLI
251+
:tabid: cli
34252

35-
.. include:: /includes/shared-settings-clusters-devtest.rst
253+
Run the following command to take a backup snapshot for the {+cluster+}
254+
named myDemo and retain the snapshot for 7 days:
36255

37-
.. tab:: Staging and Prod Environments
38-
:tabid: stagingprod
256+
.. include:: /includes/examples/cli-example-backup-take-snapshot.rst
39257

40-
.. include:: /includes/shared-settings-clusters-stagingprod.rst
258+
Enable backup compliance policy for your project with a
259+
designated, authorized user named ``john doe`` who alone can turn
260+
off this protection after completing a verification process with
261+
MongoDB support.
41262

42-
.. tabs::
263+
.. include:: /includes/examples/cli-example-backup-compliance-policy-enable.rst
43264

44-
.. tab:: CLI
45-
:tabid: cli
265+
Run the following command to create a compliance policy for
266+
scheduled backup snapshots that enforces the number of times
267+
(``2``) snapshots must be taken every month and the duration
268+
(``2`` months) for retaining the snapshots.
46269

47-
Content here
270+
.. include:: /includes/examples/cli-example-backup-compliance-policy-schedule.rst
48271

49272
.. tab:: Terraform
50273
:tabid: Terraform
274+
275+
The following examples demonstrate how to configure backups during
276+
deployment. Before you can create resources with Terraform,
277+
you must:
278+
279+
- :ref:`Create your paying organization <configure-paying-org>`
280+
and :ref:`create an API key <atlas-admin-api-access>` for the
281+
paying organization. Store your API key as environment variables
282+
by running the following command in the terminal:
283+
284+
.. code-block:: shell
285+
286+
export MONGODB_ATLAS_PUBLIC_KEY="<insert your public key here>"
287+
export MONGODB_ATLAS_PRIVATE_KEY="<insert your private key here>"
288+
289+
- `Install Terraform <https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli>`__.
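
Before running Terraform, it can help to confirm that both API key
variables are exported. The following is a minimal sketch of such a check.
The ``check_atlas_keys`` function is a hypothetical helper, not part of
Terraform or the Atlas CLI.

```shell
# Hypothetical helper: returns non-zero and names each API key
# variable that is unset or empty in the environment.
check_atlas_keys() {
  missing=0
  for name in MONGODB_ATLAS_PUBLIC_KEY MONGODB_ATLAS_PRIVATE_KEY; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_atlas_keys; then
  echo "Atlas API keys are set"
fi
```

Because ``printenv`` only sees exported variables, a key assigned
without ``export`` is reported as missing.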
290+
291+
Common Files
292+
~~~~~~~~~~~~
293+
294+
You must create the following files for each example. Place the
295+
files for each example in their own directory. Change the IDs and
296+
names to use your values. Then run the commands to initialize
297+
Terraform, view the Terraform plan, and apply the changes.
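
The initialize, plan, and apply steps map to the standard
``terraform init``, ``terraform plan``, and ``terraform apply``
commands. The following is a minimal sketch that runs them in one example
directory. The ``run_example`` wrapper and the directory name in the
usage comment are assumptions for illustration.

```shell
# Hypothetical wrapper: runs the standard Terraform workflow inside
# one example directory that contains variables.tf and main.tf.
run_example() {
  example_dir="$1"
  # Stop early if the directory does not hold the example files.
  if [ ! -f "$example_dir/main.tf" ]; then
    echo "no main.tf found in $example_dir" >&2
    return 1
  fi
  (
    cd "$example_dir" || exit 1
    terraform init &&   # download the providers that main.tf declares
    terraform plan &&   # preview the resources Terraform would create
    terraform apply     # create the resources after you confirm
  )
}

# Usage (directory name is illustrative):
#   run_example ./backup-schedule
```

Running each example from its own directory keeps the Terraform state of
the examples separate.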
298+
299+
variables.tf
300+
````````````
301+
302+
.. include:: /includes/examples/tf-example-backup-variables.rst
303+
304+
305+
Configure Backup Schedule for the {+Cluster+}
306+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
307+
308+
Use the following to configure a backup schedule for the
309+
{+cluster+}.
310+
311+
main.tf
312+
```````
313+
314+
.. include:: /includes/examples/tf-example-backup-snapshot-schedule.rst
315+
316+
Configure Backup and PIT Restore for the {+Cluster+}
317+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
51318

52-
Content here
319+
Use the following to configure a cloud backup snapshot and a PIT restore
320+
job.
53321

322+
main.tf
323+
```````
324+
325+
.. include:: /includes/examples/tf-example-backup-snapshot-pit-restore.rst
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
.. code-block:: shell
2+
:copyable: true
3+
4+
atlas backups compliancePolicy enable \
5+
--projectId 67212db237c5766221eb6ad9 \
6+
--authorizedEmail [email protected] \
7+
--authorizedUserFirstName john \
8+
--authorizedUserLastName doe
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
.. code-block:: shell
2+
:copyable: true
3+
4+
atlas backups compliancePolicy policies scheduled create \
5+
--projectId 67212db237c5766221eb6ad9 \
6+
--frequencyInterval 2 \
7+
--frequencyType monthly \
8+
--retentionValue 2 \
9+
--retentionUnit months
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
.. code-block:: shell
2+
:copyable: true
3+
4+
atlas backups snapshots create myDemo --desc "my backup snapshot" --retention 7
