Improvements to snapshot read documentation (#395)

jeff-allen-mongo · jeff-allen-mongo · commit 0618694c89c6 · 2022-01-24T20:10:43.000-05:00
* (DOCSP-20511-DOCSP-20512): Updates for snapshot reads * Add long-running queries page * WIP * Add section on snapshot window configuration * rename file * word tweak * shorten query output * Change ToC depth * Word tweak and formatting fix * Apply suggestions from code review * Updates per review * Word tweak
diff --git a/source/reference/read-concern-snapshot.txt b/source/reference/read-concern-snapshot.txt
@@ -11,6 +11,12 @@ Read Concern ``"snapshot"``
 .. meta::
    :description: read concern, snapshot read concern, read isolation, transactions, multi-document transactions
    :keywords: read concern, snapshot read concern, read isolation, transactions, multi-document transactions
+   
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 1
+   :class: singlecol
 
 .. versionadded:: 4.0
 .. versionchanged:: 5.0
@@ -33,7 +39,7 @@ read operations outside of multi-document transactions.
 
 Outside of multi-document transactions, read concern
 :readconcern:`"snapshot"` is available on primaries and secondaries for
-the following read operations :
+the following read operations:
 
 - :dbcommand:`find`
 - :dbcommand:`aggregate`
@@ -67,7 +73,7 @@ Read Concern and ``atClusterTime``
 
 .. versionadded:: 5.0
 
-Outside of multi-document transaction, reads with read concern
+Outside of multi-document transactions, reads with read concern
 :readconcern:`"snapshot"` support the optional parameter
 ``atClusterTime``. The parameter ``atClusterTime`` allows you to specify
 the timestamp for the read. To satisfy a read request with a specified
@@ -103,15 +109,25 @@ the client.
 Outside of transactions, :readconcern:`"snapshot"` reads are
 guaranteed to read from majority-committed data.
 
-.. note:
-
-   - If you perform a read operation with :readconcern:`"snapshot"`
-     against a delayed replica set member, the returned majority-committed
-     data could be stale.
-
-   - It is not possible to specify ``atClusterTime`` for
-     :readconcern:`"snapshot"` inside of :ref:`causally consistent
-     sessions <sessions>`.
+``atClusterTime`` Considerations and Behavior
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- The allowed values for ``atClusterTime`` depend on the
+  :parameter:`minSnapshotHistoryWindowInSeconds` parameter.
+  ``minSnapshotHistoryWindowInSeconds`` is the minimum time window in
+  seconds for which the storage engine keeps the snapshot history. If
+  you specify an :ref:`atClusterTime <atClusterTime>` value older than
+  the oldest snapshot retained according to
+  ``minSnapshotHistoryWindowInSeconds``, :binary:`~bin.mongod` returns
+  an error.
+
+- If you perform a read operation with :readconcern:`"snapshot"`
+  against a delayed replica set member, the returned majority-committed
+  data could be stale.
+
+- It is not possible to specify ``atClusterTime`` for
+  :readconcern:`"snapshot"` inside of :ref:`causally consistent
+  sessions <sessions>`.
 
 Read Concern on Capped Collections
 ----------------------------------
diff --git a/source/tutorial/long-running-queries.txt b/source/tutorial/long-running-queries.txt
@@ -0,0 +1,173 @@
+.. _tutorial-long-running-queries:
+
+============================
+Perform Long-Running Queries
+============================
+
+.. default-domain:: mongodb
+
+.. contents:: On this page
+   :local:
+   :backlinks: none
+   :depth: 1
+   :class: singlecol
+
+When MongoDB performs long-running queries using the default read
+concern of :readconcern:`"local"`, the snapshot time from which data is
+returned is subject to change. The snapshot time may move forward as the
+query runs, and the query may return unexpected or inconsistent results.
+
+To guarantee the snapshot time and work from a single point in time
+across a cluster, use read concern :readconcern:`"snapshot"`. Read
+concern :readconcern:`"snapshot"` allows you to specify the
+:ref:`atClusterTime <atClusterTime>` parameter, indicating the exact
+point in time from which to return your data.
+
+Example
+-------
+
+Create a ``movies`` collection with these documents:
+
+.. code-block:: javascript
+
+   db.movies.insertMany( [
+      {
+         title: 'Titanic',
+         year: 1997,
+         genres: [ 'Drama', 'Romance' ],
+         rated: 'PG-13',
+         languages: [ 'English', 'French', 'German', 'Swedish', 'Italian', 'Russian' ],
+         released: ISODate("1997-12-19T00:00:00.000Z"),
+         cast: [ 'Leonardo DiCaprio', 'Kate Winslet', 'Billy Zane', 'Kathy Bates' ],
+         directors: [ 'James Cameron' ]
+      },
+      {
+         title: 'The Dark Knight',
+         year: 2008,
+         genres: [ 'Action', 'Crime', 'Drama' ],
+         rated: 'PG-13',
+         languages: [ 'English', 'Mandarin' ],
+         released: ISODate("2008-07-18T00:00:00.000Z"),
+         cast: [ 'Christian Bale', 'Heath Ledger', 'Aaron Eckhart', 'Michael Caine' ],
+         directors: [ 'Christopher Nolan' ]
+      },
+      {
+         title: 'Spirited Away',
+         year: 2001,
+         genres: [ 'Animation', 'Adventure', 'Family' ],
+         rated: 'PG',
+         languages: [ 'Japanese' ],
+         released: ISODate("2003-03-28T00:00:00.000Z"),
+         cast: [ 'Rumi Hiiragi', 'Miyu Irino', 'Mari Natsuki', 'Takashi Naitè' ],
+         directors: [ 'Hayao Miyazaki' ]
+      },
+      {
+         title: 'Casablanca',
+         genres: [ 'Drama', 'Romance', 'War' ],
+         rated: 'PG',
+         cast: [ 'Humphrey Bogart', 'Ingrid Bergman', 'Paul Henreid', 'Claude Rains' ],
+         languages: [ 'English', 'French', 'German', 'Italian' ],
+         released: ISODate("1943-01-23T00:00:00.000Z"),
+         directors: [ 'Michael Curtiz' ],
+         lastupdated: '2015-09-04 00:22:54.600000000',
+         year: 1942
+      }
+   ] )
+
+The following command: 
+
+- Performs a :dbcommand:`find` operation on the ``movies`` collection
+  with read concern :readconcern:`"snapshot"`.
+  
+- Specifies a filter to only return documents where the ``directors`` field
+  contains ``Christopher Nolan``.
+  
+- Specifies that the operation should read data from the snapshot at
+  cluster time ``Timestamp(1643066050, 1)``.
+
+.. important::
+
+   To run this example, you must set the :ref:`atClusterTime
+   <atClusterTime>` value to a timestamp that occurs after the time at
+   which you inserted the data.
+
+.. code-block:: javascript
+
+   db.runCommand( {
+       find: "movies",
+       filter: { directors: "Christopher Nolan" },
+       readConcern: {
+           level: "snapshot",
+           atClusterTime: Timestamp(1643066050, 1)
+       },
+   } )
+
+Example output:
+
+.. code-block:: javascript
+
+   {
+     cursor: {
+       firstBatch: [
+         {
+           _id: ObjectId("61ef27ac707c25130e7611f8"),
+           title: 'The Dark Knight',
+           year: 2008,
+           genres: [ 'Action', 'Crime', 'Drama' ],
+           rated: 'PG-13',
+           languages: [ 'English', 'Mandarin' ],
+           released: ISODate("2008-07-18T00:00:00.000Z"),
+           cast: [
+             'Christian Bale',
+             'Heath Ledger',
+             'Aaron Eckhart',
+             'Michael Caine'
+           ],
+           directors: [ 'Christopher Nolan' ]
+         }
+       ],
+       id: Long("0"),
+       ns: 'test.movies',
+       atClusterTime: Timestamp({ t: 1643066050, i: 1 })
+     },
+     ok: 1,
+     '$clusterTime': {
+       clusterTime: Timestamp({ t: 1643066060, i: 1 }),
+       signature: {
+         hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
+         keyId: Long("0")
+       }
+     },
+     operationTime: Timestamp({ t: 1643066060, i: 1 })
+   }
+
+Consideration
+~~~~~~~~~~~~~
+
+Due to the small size of the data set used in this example, there is not
+a large benefit to using read concern :readconcern:`"snapshot"`. The
+query should run fast enough such that all data is returned at the
+same cluster time without needing to specify read concern
+:readconcern:`"snapshot"`.
+
+The purpose of the preceding example is to demonstrate the syntax and expected
+output for read concern :readconcern:`"snapshot"`.
+
+Configure Snapshot Retention
+----------------------------
+
+By default, the WiredTiger storage engine retains snapshot history for
+300 seconds. This means that you can specify an :ref:`atClusterTime
+<atClusterTime>` value for your query that is up to 300 seconds before
+the time at which you run the query.
+
+If your query runs for longer than 300 seconds, you may wish to increase
+the snapshot retention period. To specify how long WiredTiger keeps the
+snapshot history, modify the
+:parameter:`minSnapshotHistoryWindowInSeconds` parameter.
+
+Increasing the value of :parameter:`minSnapshotHistoryWindowInSeconds`
+increases disk usage because the server must maintain the history of
+older modified values within the specified time window. The amount of
+disk space used depends on your workload, with higher volume workloads
+requiring more disk space.
diff --git a/source/tutorial/query-documents.txt b/source/tutorial/query-documents.txt
@@ -596,4 +596,5 @@ level of isolation for their reads. For more information, see
    /tutorial/query-array-of-documents
    /tutorial/project-fields-from-query-results
    /tutorial/query-for-null-fields
+   /tutorial/long-running-queries
    /tutorial/iterate-a-cursor/