Commit c7e1ff0
DOCSP-24408 doc for On Demand Snapshot Support (#269)
* DOCSP-24408 doc for On Demand Snapshot Support
* DOCSP-24408 updates for feedback
* DOCSP-24408 changes for using ingest instead of extract
* Apply suggestions from code review CR feedback
  Co-authored-by: sarahsimpers <[email protected]>
* DOCSP-24408 updates to doc
* DOCSP-24408 resolve merge conflicts

Co-authored-by: sarahsimpers <[email protected]>
1 parent 8a37c4e commit c7e1ff0

8 files changed: +204 -56 lines
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+.. _edit-dataset-pipeline:
+
+================================
+Edit an {+adl+} Pipeline
+================================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 1
+   :class: singlecol
+
+.. default-domain:: mongodb
+
+You can make changes to your {+dl+} pipelines through the |service| UI, including:
+
+- Edit the data extraction schedule
+- Edit the data storage region
+- Change the fields to exclude from your {+dl+} datasets
+
+
+Procedure
+---------
+
+.. procedure::
+   :style: normal
+
+   .. step:: Log in to `MongoDB Atlas <https://cloud.mongodb.com>`__.
+
+   .. step:: Select :guilabel:`Data Lake` under :guilabel:`Deployment` on the left-hand navigation.
+
+   .. step:: Click :icon-fa5:`pencil-alt` in the :guilabel:`Actions` column for the pipeline that you wish to modify.
+
+   .. step:: (Optional) Make changes to your data extraction schedule.
+
+      Before making changes to your :guilabel:`Basic Schedule`, ensure
+      that your desired data extraction frequency is similar to your
+      current backup schedule. For example, if you wish to switch to
+      ``Daily``, you must have a ``Daily`` backup schedule configured
+      in your policy. Or, if you want to switch to a schedule of once a
+      week, you must have a ``Weekly`` backup schedule configured in
+      your policy. To learn more, see :atlas:`Backup Scheduling
+      </backup/cloud-backup/overview/#backup-scheduling--retention--and-on-demand-backup-snapshots>`.
+
+   .. step:: (Optional) Make changes to your data storage region.
+
+      {+adl+} provides optimized storage in the following |aws| regions:
+
+      .. include:: /includes/list-table-supported-aws-regions.rst
+
+   .. step:: Click :guilabel:`Continue`.
+
+   .. step:: (Optional) Make changes to the fields excluded from your {+dl+} datasets.
+
+      - Click :guilabel:`Add Field` and specify :guilabel:`Field Name`
+        to add fields to the excluded fields list.
+
+      - Click :guilabel:`Delete All` to remove all the fields from the
+        excluded fields list.
+
+      - Click :icon:`trash-alt` next to a field to remove that
+        field from the excluded fields list.
+
+   .. step:: Click :guilabel:`Review Changes` to review the changes to your pipeline.
+
+   .. step:: Click :guilabel:`Apply Changes` for the changes to take effect.
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+.. _ingest-on-demand:
+
+================================
+Trigger Data Ingestion On Demand
+================================
+
+.. default-domain:: mongodb
+
+You can manually trigger an ingestion of snapshot data from the
+|service| cluster to {+adl+} datasets if you configured :guilabel:`On
+Demand` extraction in your {+dl+} pipeline.
+
+Procedure
+---------
+
+.. procedure::
+   :style: normal
+
+   .. step:: Log in to `MongoDB Atlas <https://cloud.mongodb.com>`__.
+
+   .. step:: Select :guilabel:`Data Lake` under :guilabel:`Deployment` on the left-hand navigation.
+
+   .. step:: Click the vertical ellipsis (:icon-fa4:`ellipsis-v`) for the {+dl+} for which you configured :guilabel:`On Demand` ingestion and select :guilabel:`Trigger an On Demand Pipeline Run`.
+
+   .. step:: Select the snapshot, from which to ingest data, from the dropdown.
+
+      The dropdown shows a list of all the snapshots on your |service|
+      cluster. However, you can select only the snapshots from which
+      {+dl+} hasn't yet ingested data; the grayed-out snapshots are
+      snapshots from which your {+dl+} has already ingested data.
+
+   .. step:: Click :guilabel:`Confirm`.
+
+      |service| displays a blue banner at the top of the page that
+      shows the data ingestion status.

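The on-demand pipeline run documented above can, in principle, also be scripted against the Atlas Admin API. The sketch below only builds the URL for such a trigger call; the `/trigger` path, the API version prefix, and the sample IDs are assumptions for illustration, not confirmed by this commit, so verify them against the Atlas Admin API reference before use.

```python
# Hypothetical sketch: build the URL for triggering an on-demand Data Lake
# pipeline run via the Atlas Admin API. The endpoint path is an assumption,
# not confirmed by this page; check the Atlas API reference before use.
from urllib.parse import quote

BASE = "https://cloud.mongodb.com/api/atlas/v2"

def trigger_run_url(group_id: str, pipeline_name: str) -> str:
    """Return the assumed endpoint for an on-demand pipeline trigger."""
    return f"{BASE}/groups/{quote(group_id)}/pipelines/{quote(pipeline_name)}/trigger"

# The IDs below are placeholders, not real Atlas identifiers.
print(trigger_run_url("5e2211c17a3e5a48f5497de3", "sample-pipeline"))
```

An actual request would also need HTTP digest authentication with a programmatic API key and, per the page above, a snapshot from which data hasn't yet been ingested.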
source/administration/pause-resume-data-extraction.txt

Lines changed: 15 additions & 14 deletions
@@ -1,7 +1,7 @@
 .. _pause-resume-data-ingestion:
 
 =================================================
-Pause Data Extraction for Your {+dl+} Pipeline
+Pause Data Ingestion for Your {+dl+} Pipeline
 =================================================
 
 .. default-domain:: mongodb
@@ -12,15 +12,16 @@ Pause Data Extraction for Your {+dl+} Pipeline
    :depth: 1
    :class: singlecol
 
-You can pause and resume extraction of snapshot data from the |service|
-cluster to {+adl+} datasets.
+You can pause and resume ingestion of snapshot data from the |service|
+cluster to {+adl+} datasets. You can't pause on-demand ingestion of
+snapshot data.
 
-Pause Data Extraction for Your {+dl+} Pipeline
+Pause Data Ingestion for Your {+dl+} Pipeline
 -------------------------------------------------
 
 When you pause your {+dl+} pipeline, |service| doesn't ingest new
-datasets. You can continue to query previous snapshots that have been
-ingested.
+datasets. You can continue to query previous snapshots from which data
+has been ingested.
 
 To pause a pipeline:
 
@@ -34,22 +35,22 @@ To pause a pipeline:
       #. Select :guilabel:`Data Lake` under :guilabel:`Deployment`
         on the left-hand navigation.
 
-   .. step:: Click :icon-fa4:`ellipsis-v` in the :guilabel:`Actions` column for the pipeline that you wish to pause and select :guilabel:`Pause Ingestion`.
+   .. step:: Click ``||`` for the pipeline that you wish to pause.
 
    .. step:: Click :guilabel:`Confirm` in the :guilabel:`Pause Ingestion` confirmation window.
 
      When you pause your {+dl+} pipeline, the :guilabel:`Last Updated`
     column for the pipeline in the |service| UI shows the status
     for the pipeline as :guilabel:`Paused`.
 
-Resume Data Extraction for Your {+dl+} Pipeline
---------------------------------------------------
+Resume Data Ingestion for Your {+dl+} Pipeline
+-------------------------------------------------
 
-When you resume data extraction for a paused {+adl+} pipeline,
+When you resume data ingestion for a paused {+adl+} pipeline,
 |service| begins to take snapshots, which are then ingested in to your
 {+dl+} datasets.
 
-To resume data extraction:
+To resume data ingestion:
 
 .. procedure::
 
@@ -65,8 +66,8 @@ To resume data extraction:
 
    .. step:: Click :guilabel:`Confirm` in the :guilabel:`Resume Ingestion` confirmation window.
 
-When you resume data extraction for a paused {+adl+} pipeline,
-the :guilabel:`Last Updated` column for the pipeline in the
-|service| UI shows the date and time when data extraction for
+When you resume data ingestion for a paused {+adl+} pipeline,
+the :guilabel:`Last Run Time` column for the pipeline in the
+|service| UI shows the date and time when data ingestion for
 the pipeline resumed.

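As a mental model for the pause and resume behavior this file describes, here is a minimal state sketch. The `Pipeline` class and its methods are hypothetical; only the state names (`Active`, `Paused`) and the rule that on-demand ingestion can't be paused come from the diff above.

```python
# Hypothetical model of the pipeline states shown in the Atlas UI; not a
# real Atlas API. Only the state names and the "can't pause on-demand"
# rule are taken from the documentation above.
class Pipeline:
    def __init__(self, name: str, schedule: str = "Basic Schedule"):
        self.name = name
        self.schedule = schedule
        self.state = "Active"

    def pause(self) -> None:
        # The docs note that on-demand ingestion can't be paused.
        if self.schedule == "On Demand":
            raise ValueError("on-demand ingestion can't be paused")
        self.state = "Paused"

    def resume(self) -> None:
        self.state = "Active"

p = Pipeline("sample-pipeline")
p.pause()
print(p.state)  # Paused
```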
source/administration/view-datalake-pipelines.txt

Lines changed: 18 additions & 13 deletions
@@ -42,33 +42,38 @@ Procedure
    * - Data Size
      - Size of data for each dataset.
 
-   * - :guilabel:`Last Updated`
-     - Date and time when the pipeline ran to extract data for
+   * - :guilabel:`Last Run Time`
+     - Date and time when the pipeline ran to ingest data for
        each dataset.
 
    * - :guilabel:`Status`
      - Status of the pipeline. Value can be one of the following
        for a pipeline:
 
       - ``Active`` - indicates that the pipeline is active
-      - ``Paused`` - indicates that data extraction for the
+      - ``Paused`` - indicates that data ingestion for the
         pipeline is paused
 
    * - :guilabel:`Frequency`
-     - Frequency at which cluster data is extracted and stored
+     - Frequency at which cluster data is ingested and stored
        for querying.
 
    * - :guilabel:`Actions`
      - Actions you can take for each pipeline. You can click
        one of the following:
 
-      - :icon:`trash` to delete a pipeline. You can't undo this
-        action. If you delete a pipeline, {+adl+} deletes the
-        datasets, including the data, and removes the datasets
-        from the {+fdi+}\s where they are referenced. If you
-        delete a dataset inside a pipeline, {+adl+} removes the
-        dataset from the {+fdi+} storage configuration where the
-        dataset is referenced.
-      - :icon:`pencil` to edit the data extraction schedule for
+      - :icon-fa4:`paused` to pause data ingestion and
+        :icon-fa5:`arrow-right` to resume data ingestion. You
+        can't pause on-demand ingestion of data.
+      - :icon:`pencil` to edit the data ingestion schedule for
        the pipeline.
-      - :icon-fa4:`fa4-ellipsis-v` to pause data extraction.
+      - :icon-fa4:`fa4-ellipsis-v` to do the following:
+
+        - Delete a pipeline. You can't undo this action. If you
+          delete a pipeline, {+adl+} deletes the datasets,
+          including the data, and removes the datasets from the
+          {+fdi+}\s where they are referenced. If you delete a
+          dataset inside a pipeline, {+adl+} removes the dataset
+          from the {+fdi+} storage configuration where the
+          dataset is referenced.
+        - Trigger an on-demand pipeline run.

source/index.txt

Lines changed: 4 additions & 4 deletions
@@ -36,7 +36,7 @@ Supported Types of Data Source
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 {+adl+} supports collection snapshots from |service| clusters as a data
-source for extracted data. {+adl+} automatically extracts data from the
+source for extracted data. {+adl+} automatically ingests data from the
 snapshots, and partitions and stores data in an analytics-optimized
 format.
 
@@ -61,16 +61,16 @@ You can use {+adl+} to:
 .. include:: /includes/list-table-supported-aws-regions.rst
 
 {+adl+} automatically selects the region closest to your |service|
-cluster for storing extracted data.
+cluster for storing ingested data.
 
 Billing
 -------
 
 You incur {+adl+} charges per GB per month based on the |aws| region
-where the extracted data is stored. You incur {+adl+} costs for the
+where the ingested data is stored. You incur {+adl+} costs for the
 following items:
 
-- Extraction of data from your data source
+- Ingestion of data from your data source
 - Storage on the cloud object storage
 
 Extraction Costs

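The billing model in this hunk (charges per GB per month, keyed by AWS region) can be illustrated with a toy calculation. The rates below are invented placeholders, not MongoDB prices; they only show the shape of the computation.

```python
# Illustrative only: per-GB-per-month billing keyed by AWS region.
# These rates are made-up placeholders, not real MongoDB Atlas pricing.
RATES_PER_GB_MONTH = {
    "us-east-1": 0.02,
    "eu-west-1": 0.025,
}

def monthly_storage_cost(region: str, gb_stored: float) -> float:
    """Estimate a month of storage cost for data stored in one region."""
    return round(RATES_PER_GB_MONTH[region] * gb_stored, 2)

print(monthly_storage_cost("us-east-1", 500))  # 10.0
```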
source/manage-adl-dataset-pipeline.txt

Lines changed: 6 additions & 2 deletions
@@ -14,7 +14,9 @@ Manage {+adl+} Pipeline
 
 You can perform the following actions on your {+dl+} pipeline:
 
-- Pause and resume extraction of snapshot data from the |service|
+- View and edit your {+dl+} pipelines.
+- Manually trigger ingestion of data from your snapshot.
+- Pause and resume ingestion of snapshot data from the |service|
   cluster to {+adl+} datasets. To learn more, see
   :ref:`pause-resume-data-ingestion`.
 - Delete your {+dl+} pipeline at any time. To learn more, see
@@ -25,5 +27,7 @@ You can perform the following actions on your {+dl+} pipeline:
    :hidden:
 
    View Data Lake Pipelines </administration/view-datalake-pipelines>
-   Pause and Resume Data Extraction </administration/pause-resume-data-extraction>
+   Edit a Data Lake Pipeline </administration/edit-data-pipeline>
+   Ingest Data On Demand </administration/ingest-data-on-demand>
+   Pause and Resume Data Ingestion </administration/pause-resume-data-extraction>
   Delete Data Lake Pipeline </administration/delete-datalake-pipeline>

source/tutorial/add-dataset-pipeline.txt

Lines changed: 56 additions & 23 deletions
@@ -35,6 +35,7 @@ Procedure
 ---------
 
 .. procedure::
+   :style: normal
 
    .. step:: Navigate to {+adl+} in the |service| UI.
 
@@ -84,34 +85,67 @@ Procedure
 
      #. Click :guilabel:`Continue`.
 
-   .. step:: Specify an extraction schedule for your cluster data.
+   .. step:: Specify an ingestion schedule for your cluster data.
 
-      You can specify how frequently your cluster data is extracted for
-      querying. Each snapshot represents your data at that point in
+      You can specify how frequently your cluster data is extracted
+      from your |service| Backup Snapshots and ingested into {+dl+}
+      Datasets. Each snapshot represents your data at that point in
       time, which is stored in a workload isolated, analytic storage.
       You can query any snapshot data in the {+dl+} datasets.
 
-      You must choose from the following schedules the
-      :guilabel:`Snapshot Schedule` that is similar to your backup
-      schedule:
+      You can choose :guilabel:`Basic Schedule` or :guilabel:`On
+      Demand`.
 
-      - Every day
-      - Every Saturday
-      - Last day of the month
+      .. tabs::
 
-      For example, if you select ``Every day``, you must have a
-      ``Daily`` backup schedule configured in your policy. Or, if you
-      want to select a schedule of once a week, you must have a
-      ``Weekly`` backup schedule configured in your policy. To learn
-      more, see :atlas:`Backup Scheduling </backup/cloud-backup/overview/#backup-scheduling--retention--and-on-demand-backup-snapshots>`.
+         .. tab:: Basic Schedule
+            :tabid: basic
 
-      .. example::
+            :guilabel:`Basic Schedule` lets you define the frequency
+            for automatically ingesting data from available snapshots.
+            You must choose from the following schedules. Choose the
+            :guilabel:`Snapshot Schedule` that is similar to your
+            backup schedule:
+
+            - Every day
+            - Every Saturday
+            - Last day of the month
+
+            For example, if you select ``Every day``, you must have a
+            ``Daily`` backup schedule configured in your policy. Or, if
+            you want to select a schedule of once a week, you must have
+            a ``Weekly`` backup schedule configured in your policy. To
+            learn more, see :atlas:`Backup Scheduling </backup/cloud-backup/overview/#backup-scheduling--retention--and-on-demand-backup-snapshots>`.
+
+            .. example::
+
+               For this tutorial, select :guilabel:`Daily` from the
+               :guilabel:`Snapshot Schedule` dropdown if you don't have
+               a backup schedule yet. If you have a backup schedule,
+               the available options are based on the schedule you have
+               set for your backup schedule.
+
+         .. tab:: On Demand
+            :tabid: ondemand
+
+            :guilabel:`On Demand` lets you manually trigger ingestion
+            of data from available snapshots whenever you want.
+
+            .. example::
 
-         For this tutorial, select :guilabel:`Daily` from the
-         :guilabel:`Snapshot Schedule` dropdown if you don't have a
-         backup schedule yet. If you have a backup schedule, the
-         available options are based on the schedule you have set for
-         your backup schedule.
+               For this tutorial, if you select :guilabel:`On Demand`,
+               you must manually trigger the ingestion of data from
+               the snapshot after creating the pipeline. To learn more,
+               see :ref:`ingest-on-demand`.
+
+   .. step:: Select the |aws| region for storing your extracted data.
+
+      {+adl+} provides optimized storage in the following |aws| regions:
+
+      .. include:: /includes/list-table-supported-aws-regions.rst
+
+      By default, {+adl+} automatically selects the region closest to
+      your |service| cluster for storing extracted data.
 
    .. step:: Specify fields in your collection to create partitions.
 
@@ -141,11 +175,10 @@ Procedure
       {+dl+} dataset, {+adf+} optimizes performance for queries on
      the following fields:
 
-      - the ``year`` field,
-      - the ``title`` field, and
+      - the ``year`` field, and
       - the ``year`` field and the ``title`` field.
 
-      {+adf+} can also supports a query on the ``title`` field only.
+      {+adf+} can also support a query on the ``title`` field only.
       However, in this case, {+adf+} wouldn't be as efficient in
      supporting the query as it would be if the query were on the
      ``title`` field only. Performance is optimized in order; if a

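The partition guidance at the end of this hunk says queries on the ``year`` field, and on ``year`` plus ``title``, are optimized, while ``title`` alone is not. A small pruning model, an assumption-level sketch rather than Atlas internals, shows why ordered partition fields behave like a key prefix:

```python
# Assumption-level sketch of prefix-based partition pruning; not Atlas
# internals. Partitions are keyed by (year, title), in that order.
partitions = [
    ("1999", "The Matrix"),
    ("1999", "Magnolia"),
    ("2010", "Inception"),
]

def prune(parts, year=None, title=None):
    """Return the partitions a query must scan, pruning on key prefixes."""
    out = parts
    if year is not None:
        out = [p for p in out if p[0] == year]
        if title is not None:
            out = [p for p in out if p[1] == title]
    return out

print(len(prune(partitions, year="1999")))       # 2: pruned by prefix
print(len(prune(partitions, title="Inception"))) # 3: no prefix, full scan
```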
source/tutorial/adl-run-sample-queries.txt

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@ examples shown in the procedures:
 
 - :ref:`adl-add-pipeline` for the ``sample_mflix.movies``
   collection
+- (For :guilabel:`On Demand` schedule only) Manually trigger
+  :ref:`Ingestion of data <ingest-on-demand>` from your snapshot
 - :ref:`adl-create-federated-db` for the {+dl+} dataset that is a
   snapshot of data in the ``sample_mflix.movies`` collection
 - :ref:`adl-connect-federated-db-instance` to run the queries
