From 9e12c1a6a225407df3565c6785c47552aa8f2884 Mon Sep 17 00:00:00 2001 From: Naomi Pentrel <5212232+npentrel@users.noreply.github.com> Date: Wed, 24 Mar 2021 12:34:22 +0100 Subject: [PATCH] DOCSP-15254 clarify agg pipeline 100mb limit --- source/core/aggregation-pipeline-limits.txt | 19 ++++---- source/includes/fact-agg-memory-limit.rst | 45 +++++++++++++------ source/reference/command/aggregate.txt | 14 +++--- .../method/db.collection.aggregate.txt | 29 +++++++----- .../reference/operator/aggregation/bucket.txt | 18 ++++++++ .../operator/aggregation/bucketAuto.txt | 17 +++++++ .../reference/operator/aggregation/group.txt | 17 +++---- .../reference/operator/aggregation/sort.txt | 29 +++++++----- .../operator/aggregation/sortByCount.txt | 20 ++++++++- 9 files changed, 145 insertions(+), 63 deletions(-) diff --git a/source/core/aggregation-pipeline-limits.txt b/source/core/aggregation-pipeline-limits.txt index 38d22b64257..258a560b01c 100644 --- a/source/core/aggregation-pipeline-limits.txt +++ b/source/core/aggregation-pipeline-limits.txt @@ -21,16 +21,15 @@ Result Size Restrictions MongoDB 3.6 removes the option for the :dbcommand:`aggregate` command to return its results as a single document. -The :dbcommand:`aggregate` command can return -either a cursor or store the results in a collection. When returning a -cursor or storing the results in a collection, each document in the -result set is subject to the :limit:`BSON Document Size` limit, -currently 16 megabytes; if any single document that exceeds the -:limit:`BSON Document Size` limit, the command will produce an error. -The limit only applies to the returned documents; during the pipeline -processing, the documents may exceed this size. The -:method:`db.collection.aggregate()` method returns a cursor by default. - +The :dbcommand:`aggregate` command can either return a cursor or store +the results in a collection. When returning a cursor or storing the +results in a collection, each document in the result set is subject to +the :limit:`BSON Document Size` limit, currently 16 megabytes; if any +single document exceeds the :limit:`BSON Document Size` limit, the +command produces an error. The limit only applies to the returned +documents; during the pipeline processing, the documents may exceed this +size. The :method:`db.collection.aggregate()` method returns a cursor by +default. .. _agg-memory-restrictions: diff --git a/source/includes/fact-agg-memory-limit.rst b/source/includes/fact-agg-memory-limit.rst index 9908db74e55..53419552387 100644 --- a/source/includes/fact-agg-memory-limit.rst +++ b/source/includes/fact-agg-memory-limit.rst @@ -1,23 +1,40 @@ -.. For any pipeline stage that has a memory limit, the operation - will produce an error if exceeds its memory limit. Currently, only - $sort and $group have a limit. - .. FYI -- 2.5.3 introduced the limit to $group and changed the limit for $sort from 10% to 100 MB. -Pipeline stages have a limit of 100 megabytes of RAM. If a stage -exceeds this limit, MongoDB will produce an error. To allow for the -handling of large datasets, use the ``allowDiskUse`` option to enable -aggregation pipeline stages to write data to temporary files. +Each individual pipeline stage has a limit of 100 megabytes of RAM. By +default, if a stage exceeds this limit, MongoDB produces an error. For +some pipeline stages you can allow pipeline processing to take up more +space by using the :ref:`allowDiskUse ` +option to enable aggregation pipeline stages to write data to temporary +files. -.. 
versionchanged:: 3.4
+Examples of stages that can spill to disk when :ref:`allowDiskUse
+` is ``true`` are:
 
-.. include:: /includes/fact-graphlookup-memory-restrictions.rst
+- :pipeline:`$bucket`
+- :pipeline:`$bucketAuto`
+- :pipeline:`$group`
+- :pipeline:`$sort` when the sort operation is not supported by an
+  index
+- :pipeline:`$sortByCount`
 
-.. include:: /includes/extracts/4.2-changes-usedDisk.rst
+.. note::
+
+   Pipeline stages operate on streams of documents: each pipeline
+   stage takes in documents, processes them, and then outputs the
+   resulting documents.
 
-.. seealso::
+   Some stages can't output any documents until they have processed all
+   incoming documents. These pipeline stages must keep their stage
+   output in RAM until all incoming documents are processed. As a
+   result, these pipeline stages may require more space than the 100 MB
+   limit.
 
-   - :ref:`sort-memory-limit`
-   - :ref:`group-memory-limit`
+If the results of one of your :pipeline:`$sort` pipeline stages exceed
+the limit, consider :ref:`adding a $limit stage `.
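+
+For example, the following pipeline, sketched against a hypothetical
+``stocks`` collection, lets the optimizer coalesce the ``$limit`` into
+the ``$sort`` so that the sort only has to track its top five
+documents in memory:
+
+.. code-block:: javascript
+
+   db.stocks.aggregate( [
+      { $sort : { price : -1 } },
+      { $limit : 5 }
+   ] )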
+
+.. versionchanged:: 3.4
+
+   .. include:: /includes/fact-graphlookup-memory-restrictions.rst
+
+.. include:: /includes/extracts/4.2-changes-usedDisk.rst
diff --git a/source/reference/command/aggregate.txt b/source/reference/command/aggregate.txt
index 3cd8b7846b9..22009b7173d 100644
--- a/source/reference/command/aggregate.txt
+++ b/source/reference/command/aggregate.txt
@@ -407,15 +407,16 @@ to ``true`` to return information about the aggregation operation.
 Aggregate Data using External Sort
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Aggregation pipeline stages have :ref:`maximum memory use limit
-`. To handle large datasets, set
-``allowDiskUse`` option to ``true`` to enable writing data to
-temporary files, as in the following example:
+Each individual pipeline stage has :ref:`a limit of 100 megabytes of RAM
+`. By default, if a stage exceeds this limit,
+MongoDB produces an error. To allow pipeline processing to take up
+more space, set the :ref:`allowDiskUse `
+option to ``true`` to enable writing data to temporary files, as in the
+following example:
 
 .. code-block:: javascript
 
    db.stocks.aggregate( [
-      { $project : { cusip: 1, date: 1, price: 1, _id: 0 } },
       { $sort : { cusip : 1, date: 1 } }
    ],
    { allowDiskUse: true }
@@ -425,7 +426,8 @@ temporary files, as in the following example:
 
 .. seealso::
 
-   :method:`db.collection.aggregate()`
+   - :method:`db.collection.aggregate()`
+   - :doc:`/core/aggregation-pipeline-limits`
 
 Aggregate Data Specifying Batch Size
diff --git a/source/reference/method/db.collection.aggregate.txt b/source/reference/method/db.collection.aggregate.txt
index 186f9a14b74..fee3f1df269 100644
--- a/source/reference/method/db.collection.aggregate.txt
+++ b/source/reference/method/db.collection.aggregate.txt
@@ -395,25 +395,30 @@ You can view more verbose explain output by passing the
 Perform Large Sort Operation with External Sort
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Aggregation pipeline stages have :ref:`maximum memory use limit
-`. To handle large datasets, set
-``allowDiskUse`` option to ``true`` to enable writing data to
-temporary files, as in the following example:
+Each individual pipeline stage has :ref:`a limit of 100 megabytes of RAM
+`. By default, if a stage exceeds this limit,
+MongoDB produces an error. To allow pipeline processing to take up
+more space, set the :ref:`allowDiskUse `
+option to ``true`` to enable writing data to temporary files, as in the
+following example:
 
 .. code-block:: javascript
 
    var results = db.stocks.aggregate(
-     [
-       { $project : { cusip: 1, date: 1, price: 1, _id: 0 } },
-       { $sort : { cusip : 1, date: 1 } }
-     ],
-     {
-       allowDiskUse: true
-     }
-   )
+      [
+         { $sort : { cusip : 1, date: 1 } }
+      ],
+      {
+         allowDiskUse: true
+      }
+   )
 
 .. include:: /includes/extracts/4.2-changes-usedDisk.rst
 
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
+
 .. _example-aggregate-method-initial-batch-size:
 
 Specify an Initial Batch Size
diff --git a/source/reference/operator/aggregation/bucket.txt b/source/reference/operator/aggregation/bucket.txt
index 08e0f4ec468..473039e8442 100644
--- a/source/reference/operator/aggregation/bucket.txt
+++ b/source/reference/operator/aggregation/bucket.txt
@@ -27,6 +27,24 @@ Definition
    :pipeline:`$bucket` only produces output documents for buckets
    that contain at least one input document.
 
+Considerations
+--------------
+
+.. _bucket-memory-limit:
+
+``$bucket`` and Memory Restrictions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The :pipeline:`$bucket` stage has a limit of 100 megabytes of RAM. By
+default, if the stage exceeds this limit, :pipeline:`$bucket` returns an
+error. To allow more space for stage processing, use the
+:ref:`allowDiskUse ` option to enable
+aggregation pipeline stages to write data to temporary files.
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
+
 Syntax
 ------
diff --git a/source/reference/operator/aggregation/bucketAuto.txt b/source/reference/operator/aggregation/bucketAuto.txt
index e13c9c92d72..2b4d99b6818 100644
--- a/source/reference/operator/aggregation/bucketAuto.txt
+++ b/source/reference/operator/aggregation/bucketAuto.txt
@@ -160,6 +160,23 @@ Definition
 
 
+Considerations
+--------------
+
+.. _bucketauto-memory-limit:
+
+``$bucketAuto`` and Memory Restrictions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The :pipeline:`$bucketAuto` stage has a limit of 100 megabytes of RAM.
+By default, if the stage exceeds this limit, :pipeline:`$bucketAuto`
+returns an error. To allow more space for stage processing, use the
+:ref:`allowDiskUse ` option to enable
+aggregation pipeline stages to write data to temporary files.
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
 
 Behavior
 --------
diff --git a/source/reference/operator/aggregation/group.txt b/source/reference/operator/aggregation/group.txt
index fd1f79b54cb..e8bffda70de 100644
--- a/source/reference/operator/aggregation/group.txt
+++ b/source/reference/operator/aggregation/group.txt
@@ -74,17 +74,18 @@ operators:
 
 .. _group-memory-limit:
 
-``$group`` Operator and Memory
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+``$group`` and Memory Restrictions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The :pipeline:`$group` stage has a limit of 100 megabytes of RAM. By
 default, if the stage exceeds this limit, :pipeline:`$group` returns an
-error. To allow for the handling of large datasets, set the
-:method:`allowDiskUse ` option to
-``true``. This flag enables :pipeline:`$group` operations to write to
-temporary files. For more information, see the
-:method:`db.collection.aggregate()` method and the
-:dbcommand:`aggregate` command.
+error. To allow more space for stage processing, use the
+:ref:`allowDiskUse ` option to enable
+aggregation pipeline stages to write data to temporary files.
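+
+For example, the following sketch, which assumes a hypothetical
+``sales`` collection, sets ``allowDiskUse`` so that a large grouping
+can spill to temporary files instead of failing at the limit:
+
+.. code-block:: javascript
+
+   db.sales.aggregate(
+      [
+         { $group : { _id : "$cust_id", total : { $sum : "$amount" } } }
+      ],
+      { allowDiskUse : true }
+   )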
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
 
 .. _group-pipeline-optimization:
diff --git a/source/reference/operator/aggregation/sort.txt b/source/reference/operator/aggregation/sort.txt
index ed6c63124d9..2ee33bb4140 100644
--- a/source/reference/operator/aggregation/sort.txt
+++ b/source/reference/operator/aggregation/sort.txt
@@ -202,23 +202,28 @@ documents. See :expression:`$meta` for more information.
 ``$sort`` and Memory Restrictions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The :pipeline:`$sort` stage has a limit of 100 megabytes of RAM. By
-default, if the stage exceeds this limit, :pipeline:`$sort` will
-produce an error. To allow for the handling of large datasets, set the
-``allowDiskUse`` option to ``true`` to enable :pipeline:`$sort`
-operations to write to temporary files. See the ``allowDiskUse``
-option in :method:`db.collection.aggregate()` method and the
-:dbcommand:`aggregate` command for details.
+The :pipeline:`$sort` stage has a limit of 100 megabytes of RAM for
+in-memory sorts. By default, if the stage exceeds this limit,
+:pipeline:`$sort` produces an error. To allow pipeline processing to
+take up more space, use the :ref:`allowDiskUse
+` option to enable aggregation pipeline
+stages to write data to temporary files.
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
 
 ``$sort`` Operator and Performance
 ----------------------------------
 
-:pipeline:`$sort` operator can take advantage of an index as long as it
-is not preceded by a :pipeline:`$project`, :pipeline:`$unwind`, or
-:pipeline:`$group` stage.
+The :pipeline:`$sort` operator can take advantage of an index if it's
+used in the first stage of a pipeline or if it's only preceded by a
+:pipeline:`$match` stage.
 
-.. todo:: if a sort precedes the first $group in a sharded system,
-   all documents must go to the mongos for sorting.
+When you use :pipeline:`$sort` on a sharded cluster, each shard sorts
+its result documents using an index where available. Then the
+:binary:`~bin.mongos` or one of the shards performs a streamed merge
+sort.
 
 .. seealso::
diff --git a/source/reference/operator/aggregation/sortByCount.txt b/source/reference/operator/aggregation/sortByCount.txt
index 51de2a661a5..5717cbe555e 100644
--- a/source/reference/operator/aggregation/sortByCount.txt
+++ b/source/reference/operator/aggregation/sortByCount.txt
@@ -78,9 +78,27 @@ Definition
 
       :doc:`/reference/bson-type-comparison-order/`
 
+Considerations
+--------------
+
+.. _sortbycount-memory-limit:
+
+``$sortByCount`` and Memory Restrictions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The :pipeline:`$sortByCount` stage has a limit of 100 megabytes of RAM.
+By default, if the stage exceeds this limit, :pipeline:`$sortByCount`
+returns an error. To allow more space for stage processing, use the
+:ref:`allowDiskUse ` option to enable
+aggregation pipeline stages to write data to temporary files.
+
+.. seealso::
+
+   :doc:`/core/aggregation-pipeline-limits`
+
 Behavior
 --------
- 
+
 The :pipeline:`$sortByCount` stage is equivalent to the following
 :pipeline:`$group` + :pipeline:`$sort` sequence:
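 
 .. code-block:: javascript
 
    { $group: { _id: <expression>, count: { $sum: 1 } } },
    { $sort: { count: -1 } }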