Commit c7e1ff0
DOCSP-24408 doc for On Demand Snapshot Support (#269)
* DOCSP-24408 doc for On Demand Snapshot Support
* DOCSP-24408 updates for feedback
* DOCSP-24408 changes for using ingest instead of extract
* Apply suggestions from code review CR feedback
  Co-authored-by: sarahsimpers <[email protected]>
* DOCSP-24408 updates to doc
* DOCSP-24408 resolve merge conflicts

Co-authored-by: sarahsimpers <[email protected]>
1 parent 8a37c4e commit c7e1ff0

8 files changed: +204 -56 lines
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+.. _edit-dataset-pipeline:
+
+================================
+Edit an {+adl+} Pipeline
+================================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 1
+   :class: singlecol
+
+.. default-domain:: mongodb
+
+You can make changes to your {+dl+} pipelines through the |service| UI, including:
+
+- Edit the data extraction schedule
+- Edit the data storage region
+- Change the fields to exclude from your {+dl+} datasets
+
+
+Procedure
+---------
+
+.. procedure::
+   :style: normal
+
+   .. step:: Log in to `MongoDB Atlas <https://cloud.mongodb.com>`__.
+
+   .. step:: Select :guilabel:`Data Lake` under :guilabel:`Deployment` on the left-hand navigation.
+
+   .. step:: Click :icon-fa5:`pencil-alt` in the :guilabel:`Actions` column for the pipeline that you wish to modify.
+
+   .. step:: (Optional) Make changes to your data extraction schedule.
+
+      Before making changes to your :guilabel:`Basic Schedule`, ensure
+      that your desired data extraction frequency is similar to your
+      current backup schedule. For example, if you wish to switch to
+      ``Daily``, you must have a ``Daily`` backup schedule configured
+      in your policy. Or, if you want to switch to a schedule of once a
+      week, you must have a ``Weekly`` backup schedule configured in
+      your policy. To learn more, see :atlas:`Backup Scheduling
+      </backup/cloud-backup/overview/#backup-scheduling--retention--and-on-demand-backup-snapshots>`.
+
+   .. step:: (Optional) Make changes to your data storage region.
+
+      {+adl+} provides optimized storage in the following |aws| regions:
+
+      .. include:: /includes/list-table-supported-aws-regions.rst
+
+   .. step:: Click :guilabel:`Continue`.
+
+   .. step:: (Optional) Make changes to the fields excluded from your {+dl+} datasets.
+
+      - Click :guilabel:`Add Field` and specify :guilabel:`Field Name`
+        to add fields to the excluded fields list.
+
+      - Click :guilabel:`Delete All` to remove all the fields from the
+        excluded fields list.
+
+      - Click :icon:`trash-alt` next to a field to remove that
+        field from the excluded fields list.
+
+   .. step:: Click :guilabel:`Review Changes` to review the changes to your pipeline.
+
+   .. step:: Click :guilabel:`Apply Changes` for the changes to take effect.
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+.. _ingest-on-demand:
+
+================================
+Trigger Data Ingestion On Demand
+================================
+
+.. default-domain:: mongodb
+
+You can manually trigger an ingestion of snapshot data from the
+|service| cluster to {+adl+} datasets if you configured :guilabel:`On
+Demand` extraction in your {+dl+} pipeline.
+
+Procedure
+---------
+
+.. procedure::
+   :style: normal
+
+   .. step:: Log in to `MongoDB Atlas <https://cloud.mongodb.com>`__.
+
+   .. step:: Select :guilabel:`Data Lake` under :guilabel:`Deployment` on the left-hand navigation.
+
+   .. step:: Click the vertical ellipsis (:icon-fa4:`ellipsis-v`) for the {+dl+} for which you configured :guilabel:`On Demand` ingestion and select :guilabel:`Trigger an On Demand Pipeline Run`.
+
+   .. step:: Select the snapshot, from which to ingest data, from the dropdown.
+
+      The dropdown shows a list of all the snapshots on your |service|
+      cluster. However, you can select only the snapshots from which
+      {+dl+} hasn't yet ingested data; the grayed-out snapshots are
+      snapshots from which your {+dl+} has already ingested data.
+
+   .. step:: Click :guilabel:`Confirm`.
+
+      |service| displays a blue banner at the top of the page that
+      shows the data ingestion status.

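The on-demand pipeline run documented above can, in principle, also be scripted against the Atlas Admin API. The sketch below only builds the URL for such a trigger call; the `/trigger` path, the API version prefix, and the sample IDs are assumptions for illustration, not confirmed by this commit, so verify them against the Atlas Admin API reference before use.

```python
# Hypothetical sketch: build the URL for triggering an on-demand Data Lake
# pipeline run via the Atlas Admin API. The endpoint path is an assumption,
# not confirmed by this page; check the Atlas API reference before use.
from urllib.parse import quote

BASE = "https://cloud.mongodb.com/api/atlas/v2"

def trigger_run_url(group_id: str, pipeline_name: str) -> str:
    """Return the assumed endpoint for an on-demand pipeline trigger."""
    return f"{BASE}/groups/{quote(group_id)}/pipelines/{quote(pipeline_name)}/trigger"

# The IDs below are placeholders, not real Atlas identifiers.
print(trigger_run_url("5e2211c17a3e5a48f5497de3", "sample-pipeline"))
```

An actual request would also need HTTP digest authentication with a programmatic API key and, per the page above, a snapshot from which data hasn't yet been ingested.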
source/administration/pause-resume-data-extraction.txt

Lines changed: 15 additions & 14 deletions
@@ -1,7 +1,7 @@
 .. _pause-resume-data-ingestion:
 
 =================================================
-Pause Data Extraction for Your {+dl+} Pipeline
+Pause Data Ingestion for Your {+dl+} Pipeline
 =================================================
 
 .. default-domain:: mongodb
@@ -12,15 +12,16 @@ Pause Data Extraction for Your {+dl+} Pipeline
    :depth: 1
    :class: singlecol
 
-You can pause and resume extraction of snapshot data from the |service|
-cluster to {+adl+} datasets.
+You can pause and resume ingestion of snapshot data from the |service|
+cluster to {+adl+} datasets. You can't pause on-demand ingestion of
+snapshot data.
 
-Pause Data Extraction for Your {+dl+} Pipeline
+Pause Data Ingestion for Your {+dl+} Pipeline
 -------------------------------------------------
 
 When you pause your {+dl+} pipeline, |service| doesn't ingest new
-datasets. You can continue to query previous snapshots that have been
-ingested.
+datasets. You can continue to query previous snapshots from which data
+has been ingested.
 
 To pause a pipeline:
 
@@ -34,22 +35,22 @@ To pause a pipeline:
       #. Select :guilabel:`Data Lake` under :guilabel:`Deployment`
         on the left-hand navigation.
 
-   .. step:: Click :icon-fa4:`ellipsis-v` in the :guilabel:`Actions` column for the pipeline that you wish to pause and select :guilabel:`Pause Ingestion`.
+   .. step:: Click ``||`` for the pipeline that you wish to pause.
 
    .. step:: Click :guilabel:`Confirm` in the :guilabel:`Pause Ingestion` confirmation window.
 
      When you pause your {+dl+} pipeline, the :guilabel:`Last Updated`
     column for the pipeline in the |service| UI shows the status
     for the pipeline as :guilabel:`Paused`.
 
-Resume Data Extraction for Your {+dl+} Pipeline
---------------------------------------------------
+Resume Data Ingestion for Your {+dl+} Pipeline
+-------------------------------------------------
 
-When you resume data extraction for a paused {+adl+} pipeline,
+When you resume data ingestion for a paused {+adl+} pipeline,
 |service| begins to take snapshots, which are then ingested in to your
 {+dl+} datasets.
 
-To resume data extraction:
+To resume data ingestion:
 
 .. procedure::
 
@@ -65,8 +66,8 @@ To resume data extraction:
 
    .. step:: Click :guilabel:`Confirm` in the :guilabel:`Resume Ingestion` confirmation window.
 
-When you resume data extraction for a paused {+adl+} pipeline,
-the :guilabel:`Last Updated` column for the pipeline in the
-|service| UI shows the date and time when data extraction for
+When you resume data ingestion for a paused {+adl+} pipeline,
+the :guilabel:`Last Run Time` column for the pipeline in the
+|service| UI shows the date and time when data ingestion for
 the pipeline resumed.

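As a mental model for the pause and resume behavior this file describes, here is a minimal state sketch. The `Pipeline` class and its methods are hypothetical; only the state names (`Active`, `Paused`) and the rule that on-demand ingestion can't be paused come from the diff above.

```python
# Hypothetical model of the pipeline states shown in the Atlas UI; not a
# real Atlas API. Only the state names and the "can't pause on-demand"
# rule are taken from the documentation above.
class Pipeline:
    def __init__(self, name: str, schedule: str = "Basic Schedule"):
        self.name = name
        self.schedule = schedule
        self.state = "Active"

    def pause(self) -> None:
        # The docs note that on-demand ingestion can't be paused.
        if self.schedule == "On Demand":
            raise ValueError("on-demand ingestion can't be paused")
        self.state = "Paused"

    def resume(self) -> None:
        self.state = "Active"

p = Pipeline("sample-pipeline")
p.pause()
print(p.state)  # Paused
```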
source/administration/view-datalake-pipelines.txt

Lines changed: 18 additions & 13 deletions
@@ -42,33 +42,38 @@ Procedure
    * - Data Size
      - Size of data for each dataset.
 
-   * - :guilabel:`Last Updated`
-     - Date and time when the pipeline ran to extract data for
+   * - :guilabel:`Last Run Time`
+     - Date and time when the pipeline ran to ingest data for
        each dataset.
 
    * - :guilabel:`Status`
      - Status of the pipeline. Value can be one of the following
        for a pipeline:
 
       - ``Active`` - indicates that the pipeline is active
-      - ``Paused`` - indicates that data extraction for the
+      - ``Paused`` - indicates that data ingestion for the
         pipeline is paused
 
    * - :guilabel:`Frequency`
-     - Frequency at which cluster data is extracted and stored
+     - Frequency at which cluster data is ingested and stored
        for querying.
 
    * - :guilabel:`Actions`
      - Actions you can take for each pipeline. You can click
        one of the following:
 
-      - :icon:`trash` to delete a pipeline. You can't undo this
-        action. If you delete a pipeline, {+adl+} deletes the
-        datasets, including the data, and removes the datasets
-        from the {+fdi+}\s where they are referenced. If you
-        delete a dataset inside a pipeline, {+adl+} removes the
-        dataset from the {+fdi+} storage configuration where the
-        dataset is referenced.
-      - :icon:`pencil` to edit the data extraction schedule for
+      - :icon-fa4:`paused` to pause data ingestion and
+        :icon-fa5:`arrow-right` to resume data ingestion. You
+        can't pause on-demand ingestion of data.
+      - :icon:`pencil` to edit the data ingestion schedule for
        the pipeline.
-      - :icon-fa4:`fa4-ellipsis-v` to pause data extraction.
+      - :icon-fa4:`fa4-ellipsis-v` to do the following:
+
+        - Delete a pipeline. You can't undo this action. If you
+          delete a pipeline, {+adl+} deletes the datasets,
+          including the data, and removes the datasets from the
+          {+fdi+}\s where they are referenced. If you delete a
+          dataset inside a pipeline, {+adl+} removes the dataset
+          from the {+fdi+} storage configuration where the
+          dataset is referenced.
+        - Trigger an on-demand pipeline run.

source/index.txt

Lines changed: 4 additions & 4 deletions
@@ -36,7 +36,7 @@ Supported Types of Data Source
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 {+adl+} supports collection snapshots from |service| clusters as a data
-source for extracted data. {+adl+} automatically extracts data from the
+source for extracted data. {+adl+} automatically ingests data from the
 snapshots, and partitions and stores data in an analytics-optimized
 format.
 
@@ -61,16 +61,16 @@ You can use {+adl+} to:
 .. include:: /includes/list-table-supported-aws-regions.rst
 
 {+adl+} automatically selects the region closest to your |service|
-cluster for storing extracted data.
+cluster for storing ingested data.
 
 Billing
 -------
 
 You incur {+adl+} charges per GB per month based on the |aws| region
-where the extracted data is stored. You incur {+adl+} costs for the
+where the ingested data is stored. You incur {+adl+} costs for the
 following items:
 
-- Extraction of data from your data source
+- Ingestion of data from your data source
 - Storage on the cloud object storage
 
 Extraction Costs

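The billing model in this hunk (charges per GB per month, keyed by AWS region) can be illustrated with a toy calculation. The rates below are invented placeholders, not MongoDB prices; they only show the shape of the computation.

```python
# Illustrative only: per-GB-per-month billing keyed by AWS region.
# These rates are made-up placeholders, not real MongoDB Atlas pricing.
RATES_PER_GB_MONTH = {
    "us-east-1": 0.02,
    "eu-west-1": 0.025,
}

def monthly_storage_cost(region: str, gb_stored: float) -> float:
    """Estimate a month of storage cost for data stored in one region."""
    return round(RATES_PER_GB_MONTH[region] * gb_stored, 2)

print(monthly_storage_cost("us-east-1", 500))  # 10.0
```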
source/manage-adl-dataset-pipeline.txt

Lines changed: 6 additions & 2 deletions
@@ -14,7 +14,9 @@ Manage {+adl+} Pipeline
 
 You can perform the following actions on your {+dl+} pipeline:
 
-- Pause and resume extraction of snapshot data from the |service|
+- View and edit your {+dl+} pipelines.
+- Manually trigger ingestion of data from your snapshot.
+- Pause and resume ingestion of snapshot data from the |service|
   cluster to {+adl+} datasets. To learn more, see
   :ref:`pause-resume-data-ingestion`.
 - Delete your {+dl+} pipeline at any time. To learn more, see
@@ -25,5 +27,7 @@ You can perform the following actions on your {+dl+} pipeline:
    :hidden:
 
    View Data Lake Pipelines </administration/view-datalake-pipelines>
-   Pause and Resume Data Extraction </administration/pause-resume-data-extraction>
+   Edit a Data Lake Pipeline </administration/edit-data-pipeline>
+   Ingest Data On Demand </administration/ingest-data-on-demand>
+   Pause and Resume Data Ingestion </administration/pause-resume-data-extraction>
   Delete Data Lake Pipeline </administration/delete-datalake-pipeline>

source/tutorial/add-dataset-pipeline.txt

Lines changed: 56 additions & 23 deletions
@@ -35,6 +35,7 @@ Procedure
 ---------
 
 .. procedure::
+   :style: normal
 
    .. step:: Navigate to {+adl+} in the |service| UI.
 
@@ -84,34 +85,67 @@ Procedure
 
      #. Click :guilabel:`Continue`.
 
-   .. step:: Specify an extraction schedule for your cluster data.
+   .. step:: Specify an ingestion schedule for your cluster data.
 
-      You can specify how frequently your cluster data is extracted for
-      querying. Each snapshot represents your data at that point in
+      You can specify how frequently your cluster data is extracted
+      from your |service| Backup Snapshots and ingested into {+dl+}
+      Datasets. Each snapshot represents your data at that point in
       time, which is stored in a workload isolated, analytic storage.
       You can query any snapshot data in the {+dl+} datasets.
 
-      You must choose from the following schedules the
-      :guilabel:`Snapshot Schedule` that is similar to your backup
-      schedule:
+      You can choose :guilabel:`Basic Schedule` or :guilabel:`On
+      Demand`.
 
-      - Every day
-      - Every Saturday
-      - Last day of the month
+      .. tabs::
 
-      For example, if you select ``Every day``, you must have a
-      ``Daily`` backup schedule configured in your policy. Or, if you
-      want to select a schedule of once a week, you must have a
-      ``Weekly`` backup schedule configured in your policy. To learn
-      more, see :atlas:`Backup Scheduling </backup/cloud-backup/overview/#backup-scheduling--retention--and-on-demand-backup-snapshots>`.
+         .. tab:: Basic Schedule
+            :tabid: basic
 
-      .. example::
+            :guilabel:`Basic Schedule` lets you define the frequency
+            for automatically ingesting data from available snapshots.
+            You must choose from the following schedules. Choose the
+            :guilabel:`Snapshot Schedule` that is similar to your
+            backup schedule:
+
+            - Every day
+            - Every Saturday
+            - Last day of the month
+
+            For example, if you select ``Every day``, you must have a
+            ``Daily`` backup schedule configured in your policy. Or, if
+            you want to select a schedule of once a week, you must have
+            a ``Weekly`` backup schedule configured in your policy. To
+            learn more, see :atlas:`Backup Scheduling </backup/cloud-backup/overview/#backup-scheduling--retention--and-on-demand-backup-snapshots>`.
+
+            .. example::
+
+               For this tutorial, select :guilabel:`Daily` from the
+               :guilabel:`Snapshot Schedule` dropdown if you don't have
+               a backup schedule yet. If you have a backup schedule,
+               the available options are based on the schedule you have
+               set for your backup schedule.
+
+         .. tab:: On Demand
+            :tabid: ondemand
+
+            :guilabel:`On Demand` lets you manually trigger ingestion
+            of data from available snapshots whenever you want.
+
+            .. example::
 
-         For this tutorial, select :guilabel:`Daily` from the
-         :guilabel:`Snapshot Schedule` dropdown if you don't have a
-         backup schedule yet. If you have a backup schedule, the
-         available options are based on the schedule you have set for
-         your backup schedule.
+               For this tutorial, if you select :guilabel:`On Demand`,
+               you must manually trigger the ingestion of data from
+               the snapshot after creating the pipeline. To learn more,
+               see :ref:`ingest-on-demand`.
+
+   .. step:: Select the |aws| region for storing your extracted data.
+
+      {+adl+} provides optimized storage in the following |aws| regions:
+
+      .. include:: /includes/list-table-supported-aws-regions.rst
+
+      By default, {+adl+} automatically selects the region closest to
+      your |service| cluster for storing extracted data.
 
    .. step:: Specify fields in your collection to create partitions.
 
@@ -141,11 +175,10 @@ Procedure
       {+dl+} dataset, {+adf+} optimizes performance for queries on
      the following fields:
 
-      - the ``year`` field,
-      - the ``title`` field, and
+      - the ``year`` field, and
       - the ``year`` field and the ``title`` field.
 
-      {+adf+} can also supports a query on the ``title`` field only.
+      {+adf+} can also support a query on the ``title`` field only.
       However, in this case, {+adf+} wouldn't be as efficient in
      supporting the query as it would be if the query were on the
      ``title`` field only. Performance is optimized in order; if a

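The partition guidance at the end of this hunk says queries on the ``year`` field, and on ``year`` plus ``title``, are optimized, while ``title`` alone is not. A small pruning model, an assumption-level sketch rather than Atlas internals, shows why ordered partition fields behave like a key prefix:

```python
# Assumption-level sketch of prefix-based partition pruning; not Atlas
# internals. Partitions are keyed by (year, title), in that order.
partitions = [
    ("1999", "The Matrix"),
    ("1999", "Magnolia"),
    ("2010", "Inception"),
]

def prune(parts, year=None, title=None):
    """Return the partitions a query must scan, pruning on key prefixes."""
    out = parts
    if year is not None:
        out = [p for p in out if p[0] == year]
        if title is not None:
            out = [p for p in out if p[1] == title]
    return out

print(len(prune(partitions, year="1999")))       # 2: pruned by prefix
print(len(prune(partitions, title="Inception"))) # 3: no prefix, full scan
```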
source/tutorial/adl-run-sample-queries.txt

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@ examples shown in the procedures:
 
 - :ref:`adl-add-pipeline` for the ``sample_mflix.movies``
   collection
+- (For :guilabel:`On Demand` schedule only) Manually trigger
+  :ref:`Ingestion of data <ingest-on-demand>` from your snapshot
 - :ref:`adl-create-federated-db` for the {+dl+} dataset that is a
   snapshot of data in the ``sample_mflix.movies`` collection
 - :ref:`adl-connect-federated-db-instance` to run the queries
