@@ -6,48 +6,320 @@ Backups
66
77.. default-domain:: mongodb
88
9+ .. facet::
10+    :name: genre
11+    :values: reference
12+ 
13+ .. meta::
14+    :keywords: atlas architecture center
15+    :description: Learn the best practices for configuring backups for your Atlas cluster.
16+ 
917.. contents:: On this page
1018   :local:
1119   :backlinks: none
12-    :depth: 1 
20+    :depth: 2 
1321   :class: onecol
1422
15- Intro statement
23+ |service-fullname| provides fully managed and customizable backups to
24+ ensure data retention and recovery: 
25+ 
26+ - Cloud Backups: Taken using the native snapshot capabilities of your
27+   cloud provider, to support full-copy snapshots and localized snapshot
28+   storage. These snapshots are always incremental in nature and leverage
29+   the cloud provider's underlying backup snapshot mechanism for low cost
30+   and fast restores. You choose a backup policy that specifies a certain
31+   number of daily, weekly, and monthly snapshots.
32+ - Continuous Cloud Backups: This feature builds on Cloud Backups. It
33+   captures the full oplog for a specified window, recording 
34+   all changes between snapshots, which is then replayed at the time of
35+   restore. This enables recovery to any point in time within the window,
36+   meeting Recovery Point Objectives (RPOs) as low as 1 minute.  
37+ 
38+ We don't recommend enabling backup for development and test
39+ environments. For staging and production environments, we recommend
40+ developing automated deployment templates that include the
41+ recommendations described on this page. 
42+ 
43+ {+service+} Features and Recommendations for Backups
44+ ----------------------------------------------------
45+ 
46+ Features
47+ ~~~~~~~~
48+ 
49+ |service| provides fully managed backups of your data, including
50+ point-in-time data recovery and consistent, cluster-wide snapshots of
51+ all {+clusters+}, including sharded {+clusters+}. |service| Cloud
52+ Backups storage is separate from the |service| instances and uses the
53+ native snapshot functionality of the {+cluster+}'s cloud service
54+ provider.  
55+ 
56+ .. list-table:: 
57+    :widths: 20 80 
58+    :stub-columns: 1
59+ 
60+    * - |service| cloud backups
61+      - This allows you to utilize the native snapshot functionality
62+        of your {+cluster+}'s cloud service provider and store backups
63+        separately from your |service| instances. Benefits include a
64+        strong default backup retention schedule of 12 months, full
65+        flexibility to customize snapshot and retention schedules, and
66+        the ability to set different snapshot frequencies (such as hourly
67+        for recovery, weekly or monthly for long-term retention) to meet
68+        industry regulations. You can access your backup data instantly
69+        for auditing, compliance, or data recovery purposes, and you can
70+        run queries directly against the backup data, which saves time
71+        and resources.  
72+ 
73+    * - Continuous cloud backups
74+      - This provides a customizable automated backup schedule and Point
75+        In Time (PIT) recovery, which allows you to restore your data to
76+        any timestamp within the restore window, such as the exact
77+        moment right before a failure or an event like a cyber attack.
78+        You can also customize the restore window, which sets how many
79+        days back you can restore to a specific point in time.
80+        In |service|, you can choose from
81+        four snapshot frequencies: hourly, daily, weekly, and monthly,
82+        each with its own retention period. 
83+ 
84+    * - Multi-region snapshot distribution
85+      - This allows you to add multi-region snapshot distribution to
86+        |service| cloud backups and increase resilience by distributing
87+        backup snapshots and oplogs across geographic regions instead of
88+        storing them only in their primary region. You can meet
89+        compliance requirements for storing backups in different,
90+        air-gapped geographic locations to ensure disaster recovery in
91+        case of regional outages. 
92+ 
93+    * - Backup compliance policy 
94+      - This feature enables you to further secure business critical data
95+        by preventing all snapshots and oplogs stored in |service| from
96+        being modified or deleted for a predefined retention period
97+        specified by you, guaranteeing that your backups are fully WORM
98+        (Write Once Read Many) compliant. Only a designated, authorized
99+        user can turn off this protection after completing a verification
100+        process with MongoDB support.  
101+ 
102+ Recommendations for Backup Strategy
103+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
104+ 
105+ You must align your backup strategy with specific Recovery Point
106+ Objectives (RPO) and Recovery Time Objectives (RTO) to meet business
107+ continuity requirements, particularly for critical applications where
108+ near-instant RPO and rapid recovery times are crucial. RPO defines the
109+ maximum acceptable amount of data loss during an incident, while RTO 
110+ defines how quickly your application must recover. Since data varies in
111+ importance, you must evaluate RPO and RTO for each application
112+ individually. For example, any mission-critical data will likely have 
113+ different requirements than clickstream analytics. Your requirements
114+ for RTO, RPO, and the backup retention period will influence the cost
115+ and performance considerations of maintaining backups. In development
116+ and test environments, we recommend that you disable backup to save
117+ costs. In staging and production environments, ensure that backup is
118+ enabled in your deployment template.
119+ 
120+ Large replica sets (and shards) take longer to restore from backup. 
121+ In staging and production environments, we recommend that you test
122+ restores to identify the replica set or shard size limits that keep
123+ restore times within your RTO requirements. Ensure that your
124+ snapshot schedule and retention policies meet your RPO requirements. 
125+ 
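The size limit described above can be estimated with simple arithmetic once you have measured restore throughput in a test restore. The following is a minimal sketch, not an |service| API: the function names, the ``restore_gb_per_minute`` figure, and the fixed ``overhead_minutes`` are hypothetical values that you would replace with your own measurements.

```python
def max_restorable_gb(rto_minutes: float,
                      restore_gb_per_minute: float,
                      overhead_minutes: float = 0.0) -> float:
    """Return the largest data size (in GB) that fits within the RTO.

    restore_gb_per_minute and overhead_minutes are assumptions that you
    measure yourself during restore tests; they are not Atlas-provided values.
    """
    usable_minutes = rto_minutes - overhead_minutes
    if usable_minutes <= 0:
        return 0.0
    return usable_minutes * restore_gb_per_minute


def meets_rto(data_gb: float,
              rto_minutes: float,
              restore_gb_per_minute: float,
              overhead_minutes: float = 0.0) -> bool:
    """Check whether a replica set or shard of data_gb can restore in time."""
    limit = max_restorable_gb(rto_minutes, restore_gb_per_minute, overhead_minutes)
    return data_gb <= limit


if __name__ == "__main__":
    # Hypothetical measurement: 10 GB/minute with 5 minutes of fixed overhead.
    print(max_restorable_gb(30, 10, 5))
    print(meets_rto(300, 30, 10, 5))
```

If a replica set or shard exceeds the computed limit, that is a signal to revisit shard sizing or the RTO target before relying on the backup for recovery.
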
126+ In addition to |service| cloud backups, we recommend that you enable
127+ continuous cloud backups with a restore window of seven days. This will
128+ allow you to replay the oplog to restore a {+cluster+} to a particular
129+ point in time. 
130+ 
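To illustrate what a seven-day restore window means in practice, the following minimal sketch checks whether a target timestamp is still recoverable. The helper ``within_restore_window`` is hypothetical and only models the window arithmetic; it does not call |service|.

```python
from datetime import datetime, timedelta, timezone


def within_restore_window(target: datetime,
                          now: datetime,
                          window_days: int = 7) -> bool:
    """Return True if target falls inside the point-in-time restore window.

    The window covers the last window_days days up to now. Timestamps
    older than the window, or in the future, cannot be restored to.
    """
    earliest = now - timedelta(days=window_days)
    return earliest <= target <= now


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
    five_days_ago = now - timedelta(days=5)
    eight_days_ago = now - timedelta(days=8)
    print(within_restore_window(five_days_ago, now))
    print(within_restore_window(eight_days_ago, now))
```

A target outside the window can only be recovered from a retained snapshot, so the snapshot schedule still determines how far back you can go.
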
131+ Recommendations for Backup Snapshots 
132+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
133+ 
134+ |service| provides predefined backup snapshot schedules that include
135+ the snapshot frequency and retention period. Retaining backup snapshots
136+ for long periods can be costly. We recommend building automated
137+ deployment templates that meet your requirements based on the size and
138+ criticality of the data and the environment (development, test, staging,
139+ production). For frequency and retention of snapshots, we recommend
140+ the following: 
141+ 
142+ .. list-table:: 
143+    :widths: 10 15 20 45 10
144+    :header-rows: 1
145+ 
146+    * - Tier
147+      - RTO 
148+      - RPO
149+      - Recommended Frequency and Retention
150+      - Total Number of Snapshots
151+ 
152+    * - Tier 1 
153+      - 30 minutes 
154+      - Near zero (within 7 days)
155+      - | **Hourly**: Every 12 hours, retain for 7 days = 14 snapshots
156+        | **Daily**: Once a day, retain for 7 days = 7 snapshots
157+        | **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
158+        | **Monthly**: Last day of month, retain for 3 months = 6 snapshots
159+      - 31
160+ 
161+    * - Tier 2 
162+      - 12 hours 
163+      - Near zero (within 7 days)
164+      - | **Daily**: Once a day, retain for 7 days = 7 snapshots
165+        | **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
166+        | **Monthly**: Last day of month, retain for 3 months = 3 snapshots
167+      - 14
168+ 
169+    * - Tier 3 
170+      - 3 days 
171+      - Near zero (within 2 days)
172+      - | **Daily**: Once a day, retain for 7 days = 7 snapshots
173+        | **Weekly**: Saturday, retain for 4 weeks = 4 snapshots
174+        | **Monthly**: Last day of month, retain for 3 months = 3 snapshots
175+      - 14
176+ 
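The snapshot totals in the table come from multiplying how many snapshots are taken per retention unit by how many units are retained. The following minimal sketch reproduces the Tier 2 total; the ``TIER_2`` dictionary and both function names are illustrative and not part of any |service| tool.

```python
def retained_snapshots(per_unit: int, units_retained: int) -> int:
    """Snapshots kept at steady state for one policy item.

    per_unit is the number of snapshots taken per retention unit (for
    example, 2 per day for a 12-hour interval), and units_retained is
    the retention period expressed in that same unit.
    """
    return per_unit * units_retained


def total_snapshots(policy: dict) -> int:
    """Sum the retained snapshots across all policy items."""
    return sum(retained_snapshots(per_unit, units)
               for per_unit, units in policy.values())


# Tier 2 from the table: (snapshots per unit, units retained).
TIER_2 = {
    "daily": (1, 7),    # once a day, retained for 7 days
    "weekly": (1, 4),   # once a week, retained for 4 weeks
    "monthly": (1, 3),  # once a month, retained for 3 months
}

if __name__ == "__main__":
    print(total_snapshots(TIER_2))
    # A 12-hour interval retained for 7 days keeps 2 snapshots per day.
    print(retained_snapshots(2, 7))
```

The same arithmetic helps you estimate how a change in frequency or retention affects the number of stored snapshots, and therefore storage cost.
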
177+ We recommend Queryable Backup Snapshot options when defining automation
178+ templates. Queryable Backups allow you to access your backup data
179+ instantly and run queries directly against the backup data, including on
180+ data as it existed at the time of the backup. Queryable backups also
181+ maintain the same security and access controls as your live {+clusters+},
182+ ensuring that sensitive data is protected. In some cases, we recommend
183+ adjusting backup snapshot frequencies and retentions to save money if
184+ queryable backup strategies suffice.  
185+ 
186+ Recommendations for Backup Distribution 
187+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
16188
17- {+service+} Features and Best Practices for Backups
18- ---------------------------------------------------
189+ |service| provides options for backup locations. To further enhance
190+ resilience, for staging and production environments only, we recommend
191+ distributing backups to the local region and to an external disaster
192+ recovery region to ensure data recovery during regional outages. For an
193+ |service| {+cluster+} in three regions, multi-region Snapshot
194+ Distribution copies backups to two secondary regions, enabling restores
195+ in 15 minutes or less by using backup copies. By selecting only 
196+ hourly and daily snapshots along with the oplog for regional copies, you
197+ can optimize costs while ensuring rapid recovery during regional
198+ outages. Once restored, you can simply point your application to the new 
199+ {+cluster+} to regain full functionality with complete read and write
200+ capabilities. We recommend developing automated deployment templates
201+ that strike a balance between availability and cost. However, your
202+ critical workloads might require multiple copies of snapshots in various
203+ locations.
19204
20- Content here
205+ Recommendations for Backup Compliance Policy 
206+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
207+ 
208+ We recommend enforcing |service|'s Backup Compliance Policy to prevent
209+ unauthorized modifications or deletions of backups, thereby maintaining
210+ data integrity and supporting robust disaster recovery.  
211+ 
212+ Recommendations for PIT Recovery
213+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
214+ 
215+ |service|'s fully managed, customizable Cloud Backups and Continuous
216+ Cloud Backups enable precise Point In Time (PIT) recovery, minimizing
217+ data loss during failures. |service| can quickly recover to the exact
218+ timestamp before a failure event, giving you an
219+ :abbr:`RPO (Recovery Point Objective)` as low as one minute and an
220+ :abbr:`RTO (Recovery Time Objective)` of less than 15 minutes when
221+ you use optimized restores, even if the primary region has an outage. Recovery times
222+ can vary due to cloud provider disk warming and which point in time you
223+ are restoring to. If you can be flexible in your requirements for
224+ recovery, we recommend designing templates that identify the sweet spot
225+ between reasonable recovery options and cost. 
226+ 
227+ Recommendations for Backup Costs 
228+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
229+ 
230+ To optimize |service| backup costs, you must adjust the backup frequency
231+ and retention policies to align with data criticality, reducing
232+ unnecessary storage expenses. You can also use incremental backups and
233+ built-in compression to minimize the amount of stored data. By selecting
234+ regions strategically for backup, you can avoid cross-region data
235+ transfer fees. Choosing the right cluster size for your workload also
236+ helps prevent overspending. By implementing these strategies, you can
237+ effectively manage costs while maintaining secure and reliable backups. 
21238
22239Examples
23240--------
24241
25- The following examples <perform this action>  using |service|
242+ The following examples enable backup and restore operations using |service|
26243:ref:`tools for automation <arch-center-automation>`.
27244
28- These examples also apply other recommended configurations, including:
245+ These examples apply only to staging and production environments where
246+ backup is enabled for the {+cluster+}.
29247
30248.. tabs::
31249
32-    .. tab:: Dev and Test Environments 
33-       :tabid: devtest 
250+    .. tab:: CLI 
251+       :tabid: cli 
34252
35-       .. include:: /includes/shared-settings-clusters-devtest.rst
253+       Run the following command to take a backup snapshot for the {+cluster+}
254+       named ``myDemo`` and retain the snapshot for 7 days:
36255
37-    .. tab:: Staging and Prod Environments
38-       :tabid: stagingprod
256+       .. include:: /includes/examples/cli-example-backup-take-snapshot.rst
39257
40-       .. include:: /includes/shared-settings-clusters-stagingprod.rst
258+       Enable backup compliance policy for your project with a
259+       designated, authorized user named ``john doe`` who alone can turn
260+       off this protection after completing a verification process with
261+       MongoDB support. 
41262
42- .. tabs:: 
263+       .. include:: /includes/examples/cli-example-backup-compliance-policy-enable.rst 
43264
44-    .. tab:: CLI
45-       :tabid: cli
265+       Run the following command to create a compliance policy for
266+       scheduled backup snapshots that enforces the number of times
267+       (``2``) snapshots must be taken every month and the duration
268+       (``2`` months) for retaining the snapshots. 
46269
47-       Content here 
270+       .. include:: /includes/examples/cli-example-backup-compliance-policy-schedule.rst 
48271
49272   .. tab:: Terraform
50273      :tabid: Terraform
274+  
275+       The following examples demonstrate how to configure backups during
276+       deployment. Before you can create resources with Terraform,
277+       you must:  
278+ 
279+       - :ref:`Create your paying organization <configure-paying-org>`
280+         and :ref:`create an API key <atlas-admin-api-access>` for the
281+         paying organization. Store your API keys as environment variables
282+         by running the following command in the terminal: 
283+ 
284+         .. code-block::
285+ 
286+            export MONGODB_ATLAS_PUBLIC_KEY="<insert your public key here>"
287+            export MONGODB_ATLAS_PRIVATE_KEY="<insert your private key here>"
288+ 
289+       - `Install Terraform <https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli>`__.
290+ 
291+       Common Files 
292+       ~~~~~~~~~~~~
293+ 
294+       You must create the following files for each example. Place the
295+       files for each example in their own directory. Change the IDs and
296+       names to use your values. Then run the commands to initialize
297+       Terraform, view the Terraform plan, and apply the changes. 
298+ 
299+       variables.tf 
300+       ````````````
301+ 
302+       .. include:: /includes/examples/tf-example-backup-variables.rst
303+ 
304+ 
305+       Configure Backup Schedule for the {+Cluster+} 
306+       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
307+ 
308+       Use the following to configure a backup schedule for the
309+       {+cluster+}. 
310+ 
311+       main.tf 
312+       ```````
313+ 
314+       .. include:: /includes/examples/tf-example-backup-snapshot-schedule.rst
315+ 
316+       Configure Backup and PIT Restore for the {+Cluster+}
317+       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
51318
52-       Content here
319+       Use the following to configure a cloud backup snapshot and a PIT
320+       restore job.
53321
322+       main.tf 
323+       ```````
324+       
325+       .. include:: /includes/examples/tf-example-backup-snapshot-pit-restore.rst