From 9e12c1a6a225407df3565c6785c47552aa8f2884 Mon Sep 17 00:00:00 2001 From: Naomi Pentrel <5212232+npentrel@users.noreply.github.com> Date: Wed, 24 Mar 2021 12:34:22 +0100 Subject: [PATCH] DOCSP-15254 clarify agg pipeline 100mb limit --- source/core/aggregation-pipeline-limits.txt | 19 ++++---- source/includes/fact-agg-memory-limit.rst | 45 +++++++++++++------ source/reference/command/aggregate.txt | 14 +++--- .../method/db.collection.aggregate.txt | 29 +++++++----- .../reference/operator/aggregation/bucket.txt | 18 ++++++++ .../operator/aggregation/bucketAuto.txt | 17 +++++++ .../reference/operator/aggregation/group.txt | 17 +++---- .../reference/operator/aggregation/sort.txt | 29 +++++++----- .../operator/aggregation/sortByCount.txt | 20 ++++++++- 9 files changed, 145 insertions(+), 63 deletions(-) diff --git a/source/core/aggregation-pipeline-limits.txt b/source/core/aggregation-pipeline-limits.txt index 38d22b64257..258a560b01c 100644 --- a/source/core/aggregation-pipeline-limits.txt +++ b/source/core/aggregation-pipeline-limits.txt @@ -21,16 +21,15 @@ Result Size Restrictions MongoDB 3.6 removes the option for the :dbcommand:`aggregate` command to return its results as a single document. -The :dbcommand:`aggregate` command can return -either a cursor or store the results in a collection. When returning a -cursor or storing the results in a collection, each document in the -result set is subject to the :limit:`BSON Document Size` limit, -currently 16 megabytes; if any single document that exceeds the -:limit:`BSON Document Size` limit, the command will produce an error. -The limit only applies to the returned documents; during the pipeline -processing, the documents may exceed this size. The -:method:`db.collection.aggregate()` method returns a cursor by default. - +The :dbcommand:`aggregate` command can either return a cursor or store +the results in a collection. When returning a cursor or storing the +results in a collection, each document in the result set is subject to +the :limit:`BSON Document Size` limit, currently 16 megabytes; if any +single document exceeds the :limit:`BSON Document Size` limit, the +command produces an error. The limit only applies to the returned +documents; during the pipeline processing, the documents may exceed this +size. The :method:`db.collection.aggregate()` method returns a cursor by +default. .. _agg-memory-restrictions: diff --git a/source/includes/fact-agg-memory-limit.rst b/source/includes/fact-agg-memory-limit.rst index 9908db74e55..53419552387 100644 --- a/source/includes/fact-agg-memory-limit.rst +++ b/source/includes/fact-agg-memory-limit.rst @@ -1,23 +1,40 @@ -.. For any pipeline stage that has a memory limit, the operation - will produce an error if exceeds its memory limit. Currently, only - $sort and $group have a limit. - .. FYI -- 2.5.3 introduced the limit to $group and changed the limit for $sort from 10% to 100 MB. -Pipeline stages have a limit of 100 megabytes of RAM. If a stage -exceeds this limit, MongoDB will produce an error. To allow for the -handling of large datasets, use the ``allowDiskUse`` option to enable -aggregation pipeline stages to write data to temporary files. +Each individual pipeline stage has a limit of 100 megabytes of RAM. By +default, if a stage exceeds this limit, MongoDB produces an error. For +some pipeline stages you can allow pipeline processing to take up more +space by using the :ref:`allowDiskUse ` +option to enable aggregation pipeline stages to write data to temporary +files. -.. 
versionchanged:: 3.4
+Examples of stages that can spill to disk when :ref:`allowDiskUse
+` is ``true`` are:
 
-.. include:: /includes/fact-graphlookup-memory-restrictions.rst
+- :pipeline:`$bucket`
+- :pipeline:`$bucketAuto`
+- :pipeline:`$group`
+- :pipeline:`$sort` when the sort operation is not supported by an
+  index
+- :pipeline:`$sortByCount`
 
-.. include:: /includes/extracts/4.2-changes-usedDisk.rst
+.. note::
+
+   Pipeline stages operate on streams of documents: each pipeline
+   stage takes in documents, processes them, and then outputs the
+   resulting documents.
 
-.. seealso::
+   Some stages can't output any documents until they have processed all
+   incoming documents. These pipeline stages must keep their stage
+   output in RAM until all incoming documents are processed. As a
+   result, these pipeline stages may require more space than the 100 MB
+   limit.
 
-   - :ref:`sort-memory-limit`
-   - :ref:`group-memory-limit`
+If the results of one of your :pipeline:`$sort` pipeline stages exceed
+the limit, consider :ref:`adding a $limit stage `.
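+
+For example, the following pipeline, sketched against a hypothetical
+``stocks`` collection, lets the optimizer coalesce the ``$limit`` into
+the ``$sort`` so that the sort only has to track its top five
+documents in memory:
+
+.. code-block:: javascript
+
+   db.stocks.aggregate( [
+      { $sort : { price : -1 } },
+      { $limit : 5 }
+   ] )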
+
+.. versionchanged:: 3.4
+
+   .. include:: /includes/fact-graphlookup-memory-restrictions.rst
+
+.. include:: /includes/extracts/4.2-changes-usedDisk.rst
diff --git a/source/reference/command/aggregate.txt b/source/reference/command/aggregate.txt
index 3cd8b7846b9..22009b7173d 100644
--- a/source/reference/command/aggregate.txt
+++ b/source/reference/command/aggregate.txt
@@ -407,15 +407,16 @@ to ``true`` to return information about the aggregation operation.
 Aggregate Data using External Sort
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Aggregation pipeline stages have :ref:`maximum memory use limit
-`. To handle large datasets, set
-``allowDiskUse`` option to ``true`` to enable writing data to
-temporary files, as in the following example:
+Each individual pipeline stage has :ref:`a limit of 100 megabytes of RAM
+`. By default, if a stage exceeds this limit,
+MongoDB produces an error. To allow pipeline processing to take up
+more space, set the :ref:`allowDiskUse `
+option to ``true`` to enable writing data to temporary files, as in the
+following example:
 
 .. code-block:: javascript
 
    db.stocks.aggregate( [
-      { $project : { cusip: 1, date: 1, price: 1, _id: 0 } },
       { $sort : { cusip : 1, date: 1 } }
    ],
    { allowDiskUse: true }
@@ -425,7 +426,8 @@ temporary files, as in the following example:
 
 .. seealso::
 
-   :method:`db.collection.aggregate()`
+   - :method:`db.collection.aggregate()`
+   - :doc:`/core/aggregation-pipeline-limits`
 
 Aggregate Data Specifying Batch Size
diff --git a/source/reference/method/db.collection.aggregate.txt b/source/reference/method/db.collection.aggregate.txt
index 186f9a14b74..fee3f1df269 100644
--- a/source/reference/method/db.collection.aggregate.txt
+++ b/source/reference/method/db.collection.aggregate.txt
@@ -395,25 +395,30 @@ You can view more verbose explain output by passing the
 Perform Large Sort Operation with External Sort
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Aggregation pipeline stages have :ref:`maximum memory use limit
-`. To handle large datasets, set
-``allowDiskUse`` option to ``true`` to enable writing data to
-temporary files, as in the following example:
+Each individual pipeline stage has :ref:`a limit of 100 megabytes of RAM
+`. By default, if a stage exceeds this limit,
+MongoDB produces an error. To allow pipeline processing to take up
+more space, set the :ref:`allowDiskUse `
+option to ``true`` to enable writing data to temporary files, as in the
+following example:
 
 .. code-block:: javascript
 
    var results = db.stocks.aggregate(
-     [
-       { $project : { cusip: 1, date: 1, price: 1, _id: 0 } },
-       { $sort : { cusip : 1, date: 1 } }
-     ],
-     {
-       allowDiskUse: true
-     }
-   )
+      [
+         { $sort : { cusip : 1, date: 1 } }
+      ],
+      {
+         allowDiskUse: true
+      }
+   )
 
 .. include:: /includes/extracts/4.2-changes-usedDisk.rst
 
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
+
 .. _example-aggregate-method-initial-batch-size:
 
 Specify an Initial Batch Size
diff --git a/source/reference/operator/aggregation/bucket.txt b/source/reference/operator/aggregation/bucket.txt
index 08e0f4ec468..473039e8442 100644
--- a/source/reference/operator/aggregation/bucket.txt
+++ b/source/reference/operator/aggregation/bucket.txt
@@ -27,6 +27,24 @@ Definition
    :pipeline:`$bucket` only produces output documents for buckets
    that contain at least one input document.
 
+Considerations
+--------------
+
+.. _bucket-memory-limit:
+
+``$bucket`` and Memory Restrictions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The :pipeline:`$bucket` stage has a limit of 100 megabytes of RAM. By
+default, if the stage exceeds this limit, :pipeline:`$bucket` returns an
+error. To allow more space for stage processing, use the
+:ref:`allowDiskUse ` option to enable
+aggregation pipeline stages to write data to temporary files.
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
+
 Syntax
 ------
diff --git a/source/reference/operator/aggregation/bucketAuto.txt b/source/reference/operator/aggregation/bucketAuto.txt
index e13c9c92d72..2b4d99b6818 100644
--- a/source/reference/operator/aggregation/bucketAuto.txt
+++ b/source/reference/operator/aggregation/bucketAuto.txt
@@ -160,6 +160,23 @@ Definition
 
 
+Considerations
+--------------
+
+.. _bucketauto-memory-limit:
+
+``$bucketAuto`` and Memory Restrictions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The :pipeline:`$bucketAuto` stage has a limit of 100 megabytes of RAM.
+By default, if the stage exceeds this limit, :pipeline:`$bucketAuto`
+returns an error. To allow more space for stage processing, use the
+:ref:`allowDiskUse ` option to enable
+aggregation pipeline stages to write data to temporary files.
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
 
 Behavior
 --------
diff --git a/source/reference/operator/aggregation/group.txt b/source/reference/operator/aggregation/group.txt
index fd1f79b54cb..e8bffda70de 100644
--- a/source/reference/operator/aggregation/group.txt
+++ b/source/reference/operator/aggregation/group.txt
@@ -74,17 +74,18 @@ operators:
 
 .. _group-memory-limit:
 
-``$group`` Operator and Memory
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+``$group`` and Memory Restrictions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The :pipeline:`$group` stage has a limit of 100 megabytes of RAM. By
 default, if the stage exceeds this limit, :pipeline:`$group` returns an
-error. To allow for the handling of large datasets, set the
-:method:`allowDiskUse ` option to
-``true``. This flag enables :pipeline:`$group` operations to write to
-temporary files. For more information, see the
-:method:`db.collection.aggregate()` method and the
-:dbcommand:`aggregate` command.
+error. To allow more space for stage processing, use the
+:ref:`allowDiskUse ` option to enable
+aggregation pipeline stages to write data to temporary files.
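+
+For example, the following sketch, which assumes a hypothetical
+``sales`` collection, sets ``allowDiskUse`` so that a large grouping
+can spill to temporary files instead of failing at the limit:
+
+.. code-block:: javascript
+
+   db.sales.aggregate(
+      [
+         { $group : { _id : "$cust_id", total : { $sum : "$amount" } } }
+      ],
+      { allowDiskUse : true }
+   )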
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
 
 .. _group-pipeline-optimization:
diff --git a/source/reference/operator/aggregation/sort.txt b/source/reference/operator/aggregation/sort.txt
index ed6c63124d9..2ee33bb4140 100644
--- a/source/reference/operator/aggregation/sort.txt
+++ b/source/reference/operator/aggregation/sort.txt
@@ -202,23 +202,28 @@ documents. See :expression:`$meta` for more information.
 ``$sort`` and Memory Restrictions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The :pipeline:`$sort` stage has a limit of 100 megabytes of RAM. By
-default, if the stage exceeds this limit, :pipeline:`$sort` will
-produce an error. To allow for the handling of large datasets, set the
-``allowDiskUse`` option to ``true`` to enable :pipeline:`$sort`
-operations to write to temporary files. See the ``allowDiskUse``
-option in :method:`db.collection.aggregate()` method and the
-:dbcommand:`aggregate` command for details.
+The :pipeline:`$sort` stage has a limit of 100 megabytes of RAM for
+in-memory sorts. By default, if the stage exceeds this limit,
+:pipeline:`$sort` produces an error. To allow pipeline processing to
+take up more space, use the :ref:`allowDiskUse
+` option to enable aggregation pipeline
+stages to write data to temporary files.
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
 
 ``$sort`` Operator and Performance
 ----------------------------------
 
-:pipeline:`$sort` operator can take advantage of an index as long as it
-is not preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
-:pipeline:`$group` stage.
+The :pipeline:`$sort` operator can take advantage of an index if it's
+used in the first stage of a pipeline or if it's only preceded by a
+:pipeline:`$match` stage.
 
-.. todo:: if a sort precedes the first $group in a sharded system,
-   all documents must go to the mongos for sorting.
+When you use :pipeline:`$sort` on a sharded cluster, each shard sorts
+its result documents using an index where available. Then the
+:binary:`~bin.mongos` or one of the shards performs a streamed merge
+sort.
 
 .. seealso::
diff --git a/source/reference/operator/aggregation/sortByCount.txt b/source/reference/operator/aggregation/sortByCount.txt
index 51de2a661a5..5717cbe555e 100644
--- a/source/reference/operator/aggregation/sortByCount.txt
+++ b/source/reference/operator/aggregation/sortByCount.txt
@@ -78,9 +78,27 @@ Definition
 
       :doc:`/reference/bson-type-comparison-order/`
 
+Considerations
+--------------
+
+.. _sortbycount-memory-limit:
+
+``$sortByCount`` and Memory Restrictions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The :pipeline:`$sortByCount` stage has a limit of 100 megabytes of RAM.
+By default, if the stage exceeds this limit, :pipeline:`$sortByCount`
+returns an error. To allow more space for stage processing, use the
+:ref:`allowDiskUse ` option to enable
+aggregation pipeline stages to write data to temporary files.
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
+
 Behavior
 --------
- 
+
 The :pipeline:`$sortByCount` stage is equivalent to the following
 :pipeline:`$group` + :pipeline:`$sort` sequence:
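 
 .. code-block:: javascript
 
    { $group: { _id: <expression>, count: { $sum: 1 } } },
    { $sort: { count: -1 } }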