From 04975c222c4435d2a15808c84874ab7a00081e59 Mon Sep 17 00:00:00 2001 From: kay Date: Wed, 21 Nov 2012 16:00:10 -0500 Subject: [PATCH 1/3] DOCS-686 port mapReduce page --- source/aggregation.txt | 1 + source/applications.txt | 1 + source/applications/map-reduce.txt | 288 ++++++++++++++++++ source/includes/examples-map-reduce.rst | 207 +++++++++++++ source/includes/parameters-map-reduce.rst | 253 +++++++++++++++ source/reference/command/mapReduce.txt | 186 ++++------- .../method/db.collection.mapReduce.txt | 218 +++---------- 7 files changed, 850 insertions(+), 304 deletions(-) create mode 100644 source/applications/map-reduce.txt create mode 100644 source/includes/examples-map-reduce.rst create mode 100644 source/includes/parameters-map-reduce.rst diff --git a/source/aggregation.txt b/source/aggregation.txt index f530f9278a5..94d93b595f2 100644 --- a/source/aggregation.txt +++ b/source/aggregation.txt @@ -25,4 +25,5 @@ The following is the outline of the aggregation documentation: applications/aggregation tutorial/aggregation-examples reference/aggregation + applications/map-reduce diff --git a/source/applications.txt b/source/applications.txt index 519a8958f99..3e24af2abb6 100644 --- a/source/applications.txt +++ b/source/applications.txt @@ -39,6 +39,7 @@ The following documents outline basic application development topics: - :doc:`/applications/replication` - :doc:`/applications/indexes` - :doc:`/applications/aggregation` + - :doc:`/applications/map-reduce` .. _application-patterns: diff --git a/source/applications/map-reduce.txt b/source/applications/map-reduce.txt new file mode 100644 index 00000000000..04b0b2109cb --- /dev/null +++ b/source/applications/map-reduce.txt @@ -0,0 +1,288 @@ +========== +Map-Reduce +========== + +.. default-domain:: mongodb + +Map-reduce operations can handle complex aggregation +tasks. 
[#simple-aggregation-use-framework]_ To perform map-reduce operations, +MongoDB provides the :dbcommand:`mapReduce` command and, in the +:program:`mongo` shell, the wrapper :method:`db.collection.mapReduce()` +method. + +This overview will cover: + +- :ref:`map-reduce-method` + +- :ref:`map-reduce-examples` + +- :ref:`map-reduce-incremental` + +- :ref:`map-reduce-sharded-cluster` + +- :ref:`map-reduce-additional-references` + +.. _map-reduce-method: + +mapReduce() +----------- + +.. include:: /reference/method/db.collection.mapReduce.txt + :start-after: mongodb + :end-before: mapReduce-syntax-end + +.. _map-reduce-examples: + +Map-Reduce Examples +------------------- +.. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-examples-begin + :end-before: map-reduce-sum-price-wrapper-end + +.. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-sum-price-cmd-end + :end-before: map-reduce-item-counts-avg-wrapper-end + +.. _map-reduce-incremental: + +Incremental Map-Reduce +---------------------- + +If the map-reduce dataset is constantly growing, then rather than +performing the map-reduce operation over the entire dataset each time +you want to run map-reduce, you may want to perform an incremental +map-reduce. + +To perform incremental map-reduce: + +#. Run a map-reduce job over the current collection and output the + result to a separate collection. + +#. When you have more data to process, run a subsequent map-reduce job + with: + + - the ``query`` parameter that specifies conditions that match + *only* the new documents. + + - the ``out`` parameter that specifies the ``reduce`` action to + merge the new results into the existing output collection. + +Consider the following example where you schedule a map-reduce +operation on a ``sessions`` collection to run at the end of each day. + +**Data Setup** + +The ``sessions`` collection contains documents that log users' sessions +each day and can be simulated as follows: + +.. 
code-block:: javascript + + db.sessions.save( { userid: "a", ts: ISODate('2011-11-03 14:17:00'), length: 95 } ); + db.sessions.save( { userid: "b", ts: ISODate('2011-11-03 14:23:00'), length: 110 } ); + db.sessions.save( { userid: "c", ts: ISODate('2011-11-03 15:02:00'), length: 120 } ); + db.sessions.save( { userid: "d", ts: ISODate('2011-11-03 16:45:00'), length: 45 } ); + + db.sessions.save( { userid: "a", ts: ISODate('2011-11-04 11:05:00'), length: 105 } ); + db.sessions.save( { userid: "b", ts: ISODate('2011-11-04 13:14:00'), length: 120 } ); + db.sessions.save( { userid: "c", ts: ISODate('2011-11-04 17:00:00'), length: 130 } ); + db.sessions.save( { userid: "d", ts: ISODate('2011-11-04 15:37:00'), length: 65 } ); + +**Initial Map-Reduce of Current Collection** + +#. Define the ``map`` function that maps the ``userid`` to an + object that contains the fields ``userid``, ``total_time``, ``count``, + and ``avg_time``: + + .. code-block:: javascript + + var mapFunction = function() { + var key = this.userid; + var value = { + userid: this.userid, + total_time: this.length, + count: 1, + avg_time: 0 + }; + + emit( key, value ); + }; + +#. Define the corresponding ``reduce`` function with two arguments + ``key`` and ``values`` to calculate the total time and the count. + The ``key`` corresponds to the ``userid``, and ``values`` is an + array whose elements correspond to the individual objects mapped to the + ``userid`` in the ``mapFunction``. + + .. code-block:: javascript + + var reduceFunction = function(key, values) { + + var reducedObject = { + userid: key, + total_time: 0, + count: 0, + avg_time: 0 + }; + + values.forEach( function(value) { + reducedObject.total_time += value.total_time; + reducedObject.count += value.count; + } + ); + return reducedObject; + }; + +#. Define the ``finalize`` function with two arguments ``key`` and + ``reducedValue``. The function updates the ``avg_time`` field of the + ``reducedValue`` document and returns the modified document. + + .. 
code-block:: javascript + + var finalizeFunction = function (key, reducedValue) { + + if (reducedValue.count > 0) + reducedValue.avg_time = reducedValue.total_time / reducedValue.count; + + return reducedValue; + }; + +#. Perform map-reduce on the ``sessions`` collection using the + ``mapFunction``, the ``reduceFunction``, and the + ``finalizeFunction`` functions. Output the results to a collection + ``session_stat``. If the ``session_stat`` collection already exists, + the operation will replace the contents: + + .. code-block:: javascript + + db.runCommand( + { + mapreduce: "sessions", + map: mapFunction, + reduce: reduceFunction, + out: { replace: "session_stat" }, + finalize: finalizeFunction + } + ); + +**Subsequent Incremental Map-Reduce** + +Assume that the next day the ``sessions`` collection grows by the following documents: + + .. code-block:: javascript + + db.sessions.save( { userid: "a", ts: ISODate('2011-11-05 14:17:00'), length: 100 } ); + db.sessions.save( { userid: "b", ts: ISODate('2011-11-05 14:23:00'), length: 115 } ); + db.sessions.save( { userid: "c", ts: ISODate('2011-11-05 15:02:00'), length: 125 } ); + db.sessions.save( { userid: "d", ts: ISODate('2011-11-05 16:45:00'), length: 55 } ); + +5. At the end of the day, perform incremental map-reduce on the + ``sessions`` collection but use the ``query`` field to select only the + new documents. Output the results to the collection ``session_stat``, + but ``reduce`` the contents with the results of the incremental + map-reduce: + + .. code-block:: javascript + + db.runCommand( { + mapreduce: "sessions", + map: mapFunction, + reduce: reduceFunction, + query: { ts: { $gt: ISODate('2011-11-05 00:00:00') } }, + out: { reduce: "session_stat" }, + finalize: finalizeFunction + } + ); + +.. _map-reduce-temporay-collection: + +Temporary Collection +-------------------- + +The map-reduce operation uses a temporary collection during processing. 
+At completion, the temporary collection will be renamed to the +permanent name atomically. Thus, one can perform a map-reduce operation +periodically with the same target collection name without worrying +about a temporary state of incomplete data. This is very useful when +generating statistical output collections on a regular basis. + +.. _map-reduce-sharded-cluster: + +Sharded Cluster +--------------- + +Sharded Input +~~~~~~~~~~~~~ + +If the input collection is sharded, :program:`mongos` will +automatically dispatch the map-reduce job to each shard to be executed +in parallel. There is no special option required. :program:`mongos` +will wait for jobs on all shards to finish. + +Sharded Output +~~~~~~~~~~~~~~ + +By default the output collection will not be sharded. The process is: + +- :program:`mongos` dispatches a map-reduce finish job to the shard + that will store the target collection. + +- The target shard will pull results from all other shards, run a final + reduce/finalize, and write to the output. + +- If using the ``sharded`` option in the ``out`` parameter, the output will be + sharded using ``_id`` as the shard key. + +.. versionchanged:: 2.2 + +- If the output collection does not exist, the collection is created + and sharded on the ``_id`` field. Even if empty, its initial chunks + are created based on the result of the first step of the map-reduce + operation. + +- :program:`mongos` dispatches, in parallel, a map-reduce finish job + to every shard that owns a chunk. + +- Each shard will pull the results it owns from all other shards, run a + final reduce/finalize, and write to the output collection. + +.. note:: + + - During additional map-reduce jobs, chunk splitting will be done as needed. + + - Balancing of chunks for the output collection is automatically + prevented during post-processing to avoid concurrency issues. 
+ +Prior to version 2.1: + +- :program:`mongos` retrieves the results from each shard, doing a + merge sort to order the results, and performs a reduce/finalize as + needed. :program:`mongos` then writes the result to the output + collection in sharded mode. + +- Only a small amount of memory is required even for large datasets. + +- Shard chunks do not get automatically split and migrated during + insertion. Manual intervention is required until the chunks are + granular and balanced. + +.. warning:: + + Sharded output for mapreduce has been overhauled in v2.2. Its use in + earlier versions is not recommended. + +.. _map-reduce-additional-references: + +Additional References +--------------------- + +.. seealso:: + + - :wiki:`Map-Reduce Concurrency + ` + + - `MapReduce, Geospatial Indexes, and Other Cool Features `_ - Kristina Chodorow at MongoSF (April 2010) + + - :wiki:`Troubleshooting MapReduce` + +.. [#simple-aggregation-use-framework] For many simple aggregation tasks, see the + :doc:`aggregation framework `. diff --git a/source/includes/examples-map-reduce.rst b/source/includes/examples-map-reduce.rst new file mode 100644 index 00000000000..edcf5605086 --- /dev/null +++ b/source/includes/examples-map-reduce.rst @@ -0,0 +1,207 @@ +Map-Reduce Examples +------------------- + +.. map-reduce-examples-begin + +Consider the following map-reduce operations on a collection ``orders`` +that contains documents of the following prototype: + +.. code-block:: javascript + + { + _id: ObjectId("50a8240b927d5d8b5891743c"), + cust_id: "abc123", + ord_date: new Date("Oct 04, 2012"), + status: 'A', + price: 250, + items: [ { sku: "mmm", qty: 5, price: 2.5 }, + { sku: "nnn", qty: 5, price: 2.5 } ] + } + +.. map-reduce-document-prototype-end + +Sum the Price Per Customer Id +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
map-reduce-sum-price + +Perform a map-reduce operation on the ``orders`` collection to group by +the ``cust_id``, and for each ``cust_id``, calculate the sum of the +``price`` values: + +#. Define the ``map`` function to process each document in the + map-reduce process: + + - In the function, ``this`` refers to the document currently being + processed. + + - The function maps the ``price`` to the ``cust_id`` for each + document and emits the ``cust_id`` and ``price`` pair. + + .. code-block:: javascript + + var mapFunction1 = function() { + emit(this.cust_id, this.price); + }; + +#. Define the corresponding ``reduce`` function with two arguments + ``keyCustId`` and ``valuesPrices``: + + - The ``valuesPrices`` is an array whose elements are the ``price`` + values mapped to the particular ``keyCustId`` by the ``map`` + function. + + - The function reduces the ``valuesPrices`` array to the + sum of its elements. + + .. code-block:: javascript + + var reduceFunction1 = function(keyCustId, valuesPrices) { + return Array.sum(valuesPrices); + }; + +#. Perform map-reduce on all documents in the ``orders`` collection + using the ``mapFunction1`` function and the ``reduceFunction1`` + function. Output the results to a collection ``map_reduce_example``. + If the ``map_reduce_example`` collection already exists, the + operation will replace the contents with the results of this + map-reduce operation: + + .. map-reduce-sum-price-wrapper-begin + .. code-block:: javascript + + db.orders.mapReduce( + mapFunction1, + reduceFunction1, + { out: "map_reduce_example" } + ) + + .. map-reduce-sum-price-wrapper-end + .. map-reduce-sum-price-cmd-begin + .. code-block:: javascript + + db.runCommand( + { + mapreduce: 'orders', + map: mapFunction1, + reduce: reduceFunction1, + out: 'map_reduce_example' + } + ) + .. 
map-reduce-sum-price-cmd-end + +Calculate the Number of Orders, Total Quantity, and Average Quantity Per Item +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. map-reduce-item-counts + +Perform a map-reduce operation on the ``orders`` collection to group by +the item sku, and for each sku, calculate the number of orders and the +total quantity ordered. Finally, calculate the average quantity per +order for each sku. Process only the documents with ``ord_date`` +greater than ``01/01/2012`` for the map-reduce: + +#. Define the ``map`` function to process each document in the + map-reduce process: + + - In the function, ``this`` refers to the document currently being + processed. + + - For each item, the function associates the ``sku`` with a new + object ``value`` that contains the ``count`` of ``1`` and the + item ``qty`` for the order and emits the ``sku`` and ``value`` pair. + + .. code-block:: javascript + + var mapFunction2 = function() { + for (var idx = 0; idx < this.items.length; idx++) { + var key = this.items[idx].sku; + var value = { + count: 1, + qty: this.items[idx].qty + }; + emit(key, value); + } + }; + +#. Define the corresponding ``reduce`` function with two arguments + ``keySKU`` and ``valuesCountObjects``: + + - ``valuesCountObjects`` is an array whose elements are the objects + mapped to the particular ``keySKU`` by the ``map`` function. + + - The function reduces the ``valuesCountObjects`` array to a single + object ``reducedValue`` that also contains the ``count`` and the + ``qty`` fields. + + - In ``reducedValue``, the ``count`` field contains the sum of the + ``count`` fields from the individual array elements, and the + ``qty`` field contains the sum of the ``qty`` fields from the + individual array elements. + + .. 
code-block:: javascript + + var reduceFunction2 = function(keySKU, valuesCountObjects) { + var reducedValue = { count: 0, qty: 0 }; + + for (var idx = 0; idx < valuesCountObjects.length; idx++) { + reducedValue.count += valuesCountObjects[idx].count; + reducedValue.qty += valuesCountObjects[idx].qty; + } + + return reducedValue; + }; + +#. Define the ``finalize`` function with two arguments ``key`` and + ``reducedValue``. The function modifies the ``reducedValue`` object + to add another field ``average`` and returns the modified object. + + .. code-block:: javascript + + var finalizeFunction2 = function (key, reducedValue) { + + reducedValue.average = reducedValue.qty/reducedValue.count; + + return reducedValue; + }; + +#. Perform map-reduce on the ``orders`` collection using the + ``mapFunction2``, the ``reduceFunction2``, and the + ``finalizeFunction2`` functions. Use the ``query`` field to select + only those documents with ``ord_date`` greater than ``new + Date('01/01/2012')``. Output the results to a collection + ``map_reduce_example``. If the ``map_reduce_example`` collection + already exists, the operation will merge the existing contents with + the results of this map-reduce operation: + + .. map-reduce-item-counts-avg-end + .. map-reduce-item-counts-avg-wrapper-begin + .. code-block:: javascript + + db.orders.mapReduce( mapFunction2, + reduceFunction2, + { + out: { merge: "map_reduce_example" }, + query: { ord_date: { $gt: new Date('01/01/2012') } }, + finalize: finalizeFunction2 + } + ) + + .. map-reduce-item-counts-avg-wrapper-end + .. map-reduce-item-counts-avg-cmd-begin + .. code-block:: javascript + + db.runCommand( + { + mapreduce: 'orders', + map: mapFunction2, + reduce: reduceFunction2, + finalize: finalizeFunction2, + out: { merge: "map_reduce_example" }, + query: { ord_date: { $gt: new Date('01/01/2012') } } + } + ) + + .. map-reduce-item-counts-avg-cmd-end + +.. 
map-reduce-examples-end diff --git a/source/includes/parameters-map-reduce.rst b/source/includes/parameters-map-reduce.rst new file mode 100644 index 00000000000..741f69c402f --- /dev/null +++ b/source/includes/parameters-map-reduce.rst @@ -0,0 +1,253 @@ +:param map: + + A JavaScript function that associates or "maps" a value with a + key. + + The ``map`` function has the following prototype: + + .. code-block:: javascript + + function() { + ... + emit(key, value); + } + + The ``map`` function is applied to each document that is + selected for the map-reduce operation. All the emitted ``key`` + and ``value`` pairs from the "map" step will be grouped by + ``key`` and passed to the ``reduce`` function. + + .. note:: + + - Each document processed is referenced by ``this`` within + the function. + + - The ``map`` function should *not* access the database, + even to perform read operations. + + - The ``map`` function should *not* affect the outside + system. + + - The ``emit(key,value)`` function associates the ``key`` + with a ``value``. + + - Each "emit" is limited to half the MongoDB :limit:`BSON + document size`. + + - There is no limit to the number of times you may call the + ``emit`` function per document. + + - The ``map`` function can access the variables defined in + the ``scope`` parameter if the ``scope`` parameter is + defined. + +:param reduce: + + A JavaScript function that "reduces" to a single object all the + ``values`` associated with a particular ``key``. + + The ``reduce`` function has the following prototype: + + .. code-block:: javascript + + function(key, values) { + ... + return result; + } + + The ``reduce`` function accepts ``key`` and ``values`` + arguments. The elements of the ``values`` array are the + individual ``value`` objects emitted by the ``map`` function, + grouped by the ``key``. + + .. note:: + + - The ``reduce`` function should *not* access the database, + even to perform read operations. + + - The ``reduce`` function should *not* affect the outside + system. 
+ + - Because it is possible to invoke the ``reduce`` function + more than once for the same key, the *type* of the return + object must be **identical** to the type of the ``value`` + emitted by the ``map`` function. + + - The ``reduce`` function can access the variables defined + in the ``scope`` parameter if the ``scope`` parameter + is defined for the map-reduce operation. + +:param out: + + Specifies the location of the result of the map-reduce operation. + + .. versionadded:: 1.8 + + You can specify the following options for the ``out`` parameter: + + - **Output to a collection**. This option is not available on + secondary members of replica sets. + + .. code-block:: javascript + + { out: <collectionName> } + + - **Output to a collection and specify ``<action>``** if the + output collection already exists. This option is not available + on secondary members of replica sets. + + .. code-block:: none + + { out: { <action>: <collectionName>[, db: <dbName>][, sharded: <boolean>][, nonAtomic: <boolean>] } } + + - ``<action>``: Specify one of the following actions: + + - ``replace`` + + .. code-block:: none + + { out: { replace: <collectionName> } } + + Replace the contents of the ``<collectionName>`` if the + collection with the ``<collectionName>`` exists. + + - ``merge`` + + .. code-block:: none + + { out: { merge: <collectionName> } } + + Merge the new result with the existing result if the + output collection already exists. If an existing document + has the same key as the new result, *overwrite* that + existing document. + + - ``reduce`` + + .. code-block:: none + + { out: { reduce: <collectionName> } } + + Merge the new result with the existing result if the + output collection already exists. If an existing document + has the same key as the new result, apply the ``reduce`` + function to both the new and the existing documents and + overwrite the existing document with the result. + + - ``db``: + + Optional. The name of the database to which the + map-reduce operation writes its output. By default + this will be the same database as the input collection. + + - ``sharded``: + + Optional. 
If ``true`` *and* the output database is + enabled for sharding, the map-reduce operation will + shard the output collection using the ``_id`` field as + the shard key. + + - ``nonAtomic``: + + .. versionadded:: 2.1 + + Optional. Specify output operation as non-atomic. If + ``true``, the post-processing step will not execute + inside of a database lock so that partial results will + be visible during processing. ``nonAtomic`` is valid + *only* for ``merge`` and ``reduce`` output operations + where post-processing may be a long-running operation. + + - **Output inline**. Perform the map-reduce operation in memory + and return the result. This option is the only available + option for ``out`` on secondary members of replica sets. + + .. code-block:: javascript + + { out: { inline: 1 } } + + The result must fit within the :ref:`maximum size of a BSON + document `. + +:param query: + + Optional. Specifies the selection criteria using :doc:`query + operators ` for determining the documents + input to the ``map`` function. + +:param sort: + + Optional. Sorts the *input* documents. This option is useful for + optimization. For example, specify the sort key to be the same + as the emit key so that there are fewer reduce operations. + +:param limit: + + Optional. Specifies a maximum number of documents to return from + the collection. + +:param finalize: + + Optional. A JavaScript function that follows the ``reduce`` + method and modifies the output. It has the following prototype: + + .. code-block:: javascript + + function(key, reducedValue) { + ... + return modifiedObject; + } + + The ``finalize`` function receives as its arguments a ``key`` + value and the ``reducedValue`` from the ``reduce`` function. + + .. note:: + + - The function should *not* access the database, even to + perform read operations. + + - The function should *not* affect the outside system. 
+ + - If the ``scope`` parameter is defined for the map-reduce + operation, the ``finalize`` function can access the variables + defined in the ``scope`` parameter. + +:param document scope: + + Optional. Specifies global variables that are accessible in the + ``map``, ``reduce`` and the ``finalize`` functions. + +:param Boolean jsMode: + + .. versionadded:: 2.0 + + Optional. Specifies whether to convert intermediate data into + BSON format between the execution of the ``map`` and ``reduce`` + functions. + + If ``false``: + + - Internally, the JavaScript objects emitted during ``map`` + function execution are converted to BSON objects. These BSON + objects are then converted back to JavaScript objects when + calling the ``reduce`` function. + + - The map-reduce operation places the intermediate BSON objects + in temporary, on-disk storage. This allows the map-reduce + operation to execute over arbitrarily large datasets. + + If ``true``: + + - Internally, the JavaScript objects emitted during ``map`` + function execution remain as JavaScript objects. There is no need to + convert the objects for the ``reduce`` function, which + can result in faster execution. + + - Can only work for result sets with fewer than 500,000 distinct + ``key`` arguments to the mapper's ``emit()`` function. + + The ``jsMode`` option defaults to ``false``. + +:param Boolean verbose: + + Optional. Provides statistics on job execution times. diff --git a/source/reference/command/mapReduce.txt b/source/reference/command/mapReduce.txt index ecbdbdba18b..921754a72ee 100644 --- a/source/reference/command/mapReduce.txt +++ b/source/reference/command/mapReduce.txt @@ -7,140 +7,58 @@ mapReduce .. dbcommand:: mapReduce The :dbcommand:`mapReduce` command allows you to run - map-reduce-style aggregations over a collection. - - :option map: A JavaScript function that performs the "map" step of - the map-reduce operation. This function references the - current input document and calls the - ``emit(key,value)`` method that supplies values to - the reduce function. 
Map functions may call - ``emit()``, once, more than once, or not at all - depending on the type of aggregation. - - :option reduce: A JavaScript function that performs the "reduce" - step of the MapReduce operation. The reduce - function receives an array of emitted values from - the map function, and returns a single - value. Because it's possible to invoke the reduce - function more than once for the same key, the - structure of the object returned by function must - be identical to the structure of the emitted - function. - - :option out: Specifies the location of the out of the reduce stage - of the operation. Specify a string to write the output - of the Map/Reduce job to a collection with that - name. The map-reduce operation will replace the - content of the specified collection in the current - database by default. See below for additional options. - - :option query: Optional. A query object, like the query used by the - :method:`db.collection.find()` method. Use this to - filter to limit the number of documents - enter the map phase of the aggregation. - - :option sort: Optional. Sorts the input objects using this key. This - option is useful for optimizing the - job. Common uses include sorting by the emit - key so that there are fewer reduces. - - :option limit: Optional. Species a maximum number of objects to - return from the collection. - - :option finalize: Optional. Specifies an optional "finalize" - function to run on a result, following - the reduce stage, to modify or control - the output of the :dbcommand:`mapReduce` - operation. - - :option scope: Optional. Place a :term:`document` as the contents of - this field, to place fields into the global - javascript scope. - - :option Boolean jsMode: Optional. The ``jsMode`` option defaults to - ``false``. - - :option Boolean verbose: Optional. The ``verbose`` option provides - statistics on job execution times. 
- - :dbcommand:`mapReduce` only require ``map`` and ``reduce`` options, - all other fields are optional. You must write all ``map`` and - ``reduce`` functions in JavaScript. - - The ``out`` field of the :dbcommand:`mapReduce`, provides a - number of additional configuration options that you may use to - control how MongoDB returns data from the map-reduce job. Consider - the following 4 output possibilities. - - .. versionadded: 1.8 - - :param replace: Optional. Specify a collection name (e.g. ``{ out: - { replace: collectionName } }``) where the output - of the map-reduce overwrites the contents of the - collection specified (i.e. ``collectionName``) if - there is any data in that collection. This is the - default behavior if you only specify a collection - name in the ``out`` field. - - :param merge: Optional. Specify a collection name (e.g. ``{ out: { - merge: collectionName } }``) where the - map-reduce operation writes output to an - existing collection - (i.e. ``collectionName``,) and only - overwrites existing documents when a new - document has the same key as an "old" - document in this collection. - - :param reduce: Optional. This operation behaves as the ``merge`` - option above, except that when an existing - document has the same key as a new - document, ``reduce`` function from the - map reduce job will run on both values and - MongoDB writes the result of this function - to the new collection. The specification - takes the form of ``{ out: { reduce: - collectionName } }``, where - ``collectionName`` is the name of the - results collection. - - :param inline: Optional. Indicate the inline option (i.e. ``{ out: - { inline: 1 } }``) to perform the map - reduce job in ram and return the results at - the end of the function. This option is - only possible when the entire result set - will fit within the :ref:`maximum size of a - BSON document `. - When performing map-reduce jobs on - secondary members of replica sets, this is - the only available option. 
- - :param db: Optional. The name of the database that you want the - map-reduce operation to write its output. By default - this will be the same database as the input collection. - - :param sharded: Optional. If ``true``, *and* the output mode writes to a - collection, and the output database has sharding - enabled, the map-reduce operation will shard the - results collection according to the ``_id`` field. - - :param nonAtomic: - - .. versionadded:: 2.1 + :term:`map-reduce` aggregations over a collection. + + .. code-block:: javascript + + db.runCommand( + { + mapreduce: '', + map: , + reduce: , + out: , + query: , + sort: , + limit: , + finalize: , + scope: , + jsMode: , + verbose: + + } + ) + + In addition to specifying the ```` over which to perform + the ``mapreduce`` command, the command accepts the following: - Optional. Specify output operation as non-atomic such that - the output behaves like a normal ``multi`` :method:`update() - `. If ``true``, the post processing step - will not execute inside of a database lock so that partial - results will be visible during processing . ``nonAtomic`` is - valid only for ``merge`` and ``reduce`` output operations - where post-processing may be a long-running operation. - - .. seealso:: ":method:`mapReduce()`" and ":term:`map-reduce`." - - Also, the ":wiki:`MapReduce` page, provides a greater overview - of MongoDB's map-reduce functionality. Consider the - ":wiki:`Simple application `" support for basic - aggregation operations and ":doc:`/applications/aggregation`" - for a more flexible approach to data aggregation in MongoDB. + .. include:: /includes/parameters-map-reduce.rst + + .. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-examples-begin + :end-before: map-reduce-document-prototype-end + + - .. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-sum-price + :end-before: map-reduce-sum-price-wrapper-begin + + .. 
include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-sum-price-cmd-begin + :end-before: map-reduce-sum-price-cmd-end + + - .. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-item-counts + :end-before: map-reduce-item-counts-avg-end + + .. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-item-counts-avg-cmd-begin + :end-before: map-reduce-item-counts-avg-cmd-end + + .. seealso:: :method:`mapReduce()` and :term:`map-reduce`. + + The :doc:`Map-Reduce ` + provides a greater overview of MongoDB's map-reduce + functionality. Consider the :wiki:`Simple application + ` support for basic aggregation operations as well + as :doc:`/applications/aggregation`. .. slave-ok diff --git a/source/reference/method/db.collection.mapReduce.txt b/source/reference/method/db.collection.mapReduce.txt index 3ec2981e876..25d0c41dfff 100644 --- a/source/reference/method/db.collection.mapReduce.txt +++ b/source/reference/method/db.collection.mapReduce.txt @@ -4,176 +4,54 @@ db.collection.mapReduce() .. default-domain:: mongodb -.. method:: db.collection.mapReduce(map,reduce,out,[query],[sort],[limit],[finalize],[scope],[jsMode],[verbose]) - - The :method:`db.collection.mapReduce()` provides a wrapper around the - :dbcommand:`mapReduce` :term:`database command`. Always call the - :method:`db.collection.mapReduce()` method on a collection. The following - argument list specifies a :term:`document` with 3 required and - 8 optional fields: - - :param map: A JavaScript function that performs the "map" step of - the MapReduce operation. This function references the - current input document and calls the - ``emit(key,value)`` method to supply the value - argument to the reduce function, grouped by the key - argument. Map functions may call ``emit()``, once, more - than once, or not at all depending on the type of - aggregation. - - :param reduce: A JavaScript function that performs the "reduce" - step of the MapReduce operation. 
The reduce function - receives a key value and an array of emitted values - from the map function, and returns a single - value. Because it's possible to invoke the reduce - function more than once for the same key, the - structure of the object returned by function must be - identical to the structure of the emitted function. - - :param out: Specifies the location of the out of the reduce stage - of the operation. Specify a string to write the output - of the map-reduce job to a collection with that - name. The map-reduce operation will replace the content - of the specified collection in the current database by - default. See below for additional options. - - :param document query: Optional. A query object, like the query used by the - :method:`db.collection.find()` method. Use this to specify - which documents should enter the map phase - of the aggregation. - - :param sort: Optional. Sorts the input objects using this key. This - option is useful for optimizing the job. Common uses - include sorting by the emit key so that there are - fewer reduces. - - :param limit: Optional. Specifies a maximum number of objects to - return from the collection. - - :param finalize: Optional. Specifies an optional "finalize" function - to run on a result, following the reduce - stage, to modify or control the output of - the :method:`db.collection.mapReduce()` operation. - - :param scope: Optional. Place a :term:`document` as the contents of - this field, to place fields into the global - javascript scope for the execution of the - map-reduce command. - - - :param Boolean jsMode: Optional. Specifies whether to convert - intermediate data into BSON format between - the mapping and reducing steps. - - If false, map-reduce execution internally - converts the values emitted during the map - function from JavaScript objects into BSON - objects, and so must convert those BSON - objects into JavaScript objects when calling - the reduce function. 
When this argument is - false, :method:`db.collection.mapReduce()` - places the :term:`BSON` objects used for - intermediate values in temporary, on-disk - storage, allowing the map-reduce job to - execute over arbitrarily large data sets. - - If true, map-reduce execution retains the - values emitted by the map function and - returned as JavaScript objects, and so does - not need to do extra conversion work to call - the reduce function. When this argument is - true, the map-reduce job can execute faster, - but can only work for result sets with less - than 500K distinct key arguments to the - mapper's emit function. - - The ``jsMode`` option defaults to - true. - - .. versionadded: 2.0 - - :param Boolean verbose: Optional. The ``verbose`` option provides - statistics on job execution times. - - The ``out`` field of the :method:`db.collection.mapReduce()`, provides a - number of additional configuration options that you may use to - control how MongoDB returns data from the map-reduce job. Consider - the following 4 output possibilities. - - .. versionadded: 1.8 - - :param replace: Optional. Specify a collection name (e.g. ``{ out: - { replace: collectionName } }``) where the output - of the map-reduce overwrites the contents of the - collection specified (i.e. ``collectionName``) if - there is any data in that collection. This is the - default behavior if you only specify a collection - name in the ``out`` field. - - :param merge: Optional. Specify a collection name (e.g. ``{ out: { - merge: collectionName } }``) where the - map-reduce operation writes output to an - existing collection - (i.e. ``collectionName``,) and only - overwrites existing documents in the - collection when a new document has the same - key as a document that existed before the - map-reduce operation began. - - :param reduce: Optional. 
This operation behaves like the ``merge`` - option above, except that when an existing - document has the same key as a new - document, ``reduce`` function from the - map reduce job will run on both values and - MongoDB will write the result of this function - to the new collection. The specification - takes the form of ``{ out: { reduce: - collectionName } }``, where - ``collectionName`` is the name of the - results collection. - - :param inline: Optional. Indicate the inline option (i.e. ``{ out: - { inline: 1 } }``) to perform the map - reduce job in memory and return the results - at the end of the function. This option is - only possible when the entire result set - will fit within the :ref:`maximum size of a - BSON document `. - When performing map-reduce jobs on - secondary members of replica sets, this is - the only available ``out`` option. - - :param db: Optional. The name of the database that you want the - map-reduce operation to write its output. By default - this will be the same database as the input collection. - - :param sharded: Optional. If ``true``, *and* the output mode writes - to a collection, and the output database has - sharding enabled, the map-reduce operation will - shard the results collection according to the - ``_id`` field. - - :param nonAtomic: - - .. versionadded:: 2.1 - - Optional. Specify output operation as non-atomic such that - the output behaves like a normal ``multi`` :method:`update() - `. If ``true``, the post processing step - will not execute inside of a database lock so that partial - results will be visible during processing . ``nonAtomic`` is - valid only for ``merge`` and ``reduce`` output operations - where post-processing may be a long-running operation. - - .. seealso:: :term:`map-reduce`, provides a greater overview - of MongoDB's map-reduce functionality. 
- - Also consider ":doc:`/applications/aggregation`" for a more - flexible approach to data aggregation in MongoDB, and the - ":wiki:`Aggregation`" wiki page for an over view of aggregation - in MongoDB. - +.. method:: db.collection.mapReduce() + + The :method:`db.collection.mapReduce()` method provides a wrapper + around the :dbcommand:`mapReduce` command. + + .. code-block:: javascript + + db.collection.mapReduce( + , + , + { + , + , + , + , + , + , + , + + } + ) + + :method:`db.collection.mapReduce()` takes the following parameters: + + .. include:: /includes/parameters-map-reduce.rst + + .. mapReduce-syntax-end + + .. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-examples-begin + :end-before: map-reduce-document-prototype-end + + - .. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-sum-price + :end-before: map-reduce-sum-price-wrapper-end + + - .. include:: /includes/examples-map-reduce.rst + :start-after: map-reduce-item-counts + :end-before: map-reduce-item-counts-avg-wrapper-end + + .. seealso:: :term:`map-reduce` and :dbcommand:`mapReduce` + + The :doc:`Map-Reduce ` page + provides a greater overview of MongoDB's map-reduce + functionality, while the :doc:`/applications/aggregation` + provides an overview of the aggregation framework. + .. Consider .. STUB ":doc:`/applications/simple-aggregation` for simple aggregation - .. operations and ":doc:`/applications/aggregation`" for a more flexible + .. operations and :doc:`/applications/aggregation`" for a more flexible .. approach to data aggregation in MongoDB. 
From 822c172b08e92d1f5379248269eadd1121f07194 Mon Sep 17 00:00:00 2001 From: kay Date: Thu, 29 Nov 2012 12:49:16 -0500 Subject: [PATCH 2/3] DOCS-686 incorporate comments from antoine and add the troubleshoot page --- source/applications/map-reduce.txt | 110 ++++---- source/includes/examples-map-reduce.rst | 4 + source/includes/parameters-map-reduce.rst | 50 +++- source/tutorial/troubleshoot-map-reduce.txt | 283 ++++++++++++++++++++ 4 files changed, 382 insertions(+), 65 deletions(-) create mode 100644 source/tutorial/troubleshoot-map-reduce.txt diff --git a/source/applications/map-reduce.txt b/source/applications/map-reduce.txt index 04b0b2109cb..3ad95921f45 100644 --- a/source/applications/map-reduce.txt +++ b/source/applications/map-reduce.txt @@ -4,37 +4,47 @@ Map-Reduce .. default-domain:: mongodb -Map-reduce operations can handle complex aggregation -tasks. [#simple-aggregation-use-framework]_ To perform map-reduce operations, -MongoDB provides the :dbcommand:`mapReduce` command and, in the -:program:`mongo` shell, the wrapper :method:`db.collection.mapReduce()` -method. +Map-reduce operations can handle complex aggregation tasks. To perform +map-reduce operations, MongoDB provides the :dbcommand:`mapReduce` +command and, in the :program:`mongo` shell, the +:method:`db.collection.mapReduce()` wrapper method. -This overview will cover: +.. contents:: This overview will cover: + :backlinks: none + :local: + :depth: 1 -- :ref:`map-reduce-method` +For many simple aggregation tasks, see the :doc:`aggregation framework +`. -- :ref:`map-reduce-examples` - -- :ref:`map-reduce-incremental` - -- :ref:`map-reduce-sharded-cluster` +.. _map-reduce-examples: -- :ref:`map-reduce-additional-references` +Map-Reduce Examples +------------------- -.. _map-reduce-method: +This section provides some map-reduce examples in the :program:`mongo` +shell using the :method:`db.collection.mapReduce()` method: -mapReduce() ------------ +.. code-block:: javascript -.. 
include:: /reference/method/db.collection.mapReduce.txt - :start-after: mongodb - :end-before: mapReduce-syntax-end + db.collection.mapReduce( + , + , + { + , + , + , + , + , + , + , + + } + ) + +For more information on the parameters, see the +:method:`db.collection.mapReduce()` reference page . -.. _map-reduce-examples: - -Map-Reduce Examples -------------------- .. include:: /includes/examples-map-reduce.rst :start-after: map-reduce-examples-begin :end-before: map-reduce-sum-price-wrapper-end @@ -140,10 +150,10 @@ each day and can be simulated as follows: var finalizeFunction = function (key, reducedValue) { - if (reducedValue.count > 0) - reducedValue.avg_time = reducedValue.total_time / reducedValue.count; + if (reducedValue.count > 0) + reducedValue.avg_time = reducedValue.total_time / reducedValue.count; - return reducedValue; + return reducedValue; }; #. Perform map-reduce on the ``session`` collection using the @@ -154,15 +164,13 @@ each day and can be simulated as follows: .. code-block:: javascript - db.runCommand( - { - mapreduce: "sessions", - map: mapFunction, - reduce:reduceFunction, - out: { reduce: "session_stat" }, - finalize: finalizeFunction - } - ); + db.sessions.mapReduce( mapFunction, + reduceFunction, + { + out: { reduce: "session_stat" }, + finalize: finalizeFunction + } + ) **Subsequent Incremental Map-Reduce** @@ -170,10 +178,10 @@ Assume the next day, the ``sessions`` collection grows by the following document .. 
code-block:: javascript - db.session.save( { userid: "a", ts: ISODate('2011-11-05 14:17:00'), length: 100 } ); - db.session.save( { userid: "b", ts: ISODate('2011-11-05 14:23:00'), length: 115 } ); - db.session.save( { userid: "c", ts: ISODate('2011-11-05 15:02:00'), length: 125 } ); - db.session.save( { userid: "d", ts: ISODate('2011-11-05 16:45:00'), length: 55 } ); + db.sessions.save( { userid: "a", ts: ISODate('2011-11-05 14:17:00'), length: 100 } ); + db.sessions.save( { userid: "b", ts: ISODate('2011-11-05 14:23:00'), length: 115 } ); + db.sessions.save( { userid: "c", ts: ISODate('2011-11-05 15:02:00'), length: 125 } ); + db.sessions.save( { userid: "d", ts: ISODate('2011-11-05 16:45:00'), length: 55 } ); 5. At the end of the day, perform incremental map-reduce on the ``sessions`` collection but use the ``query`` field to select only the @@ -183,15 +191,14 @@ Assume the next day, the ``sessions`` collection grows by the following document .. code-block:: javascript - db.runCommand( { - mapreduce: "sessions", - map: mapFunction, - reduce:reduceFunction, - query: { ts: { $gt: ISODate('2011-11-05 00:00:00') } }, - out: { reduce: "session_stat" }, - finalize:finalizeFunction - } - ); + db.sessions.mapReduce( mapFunction, + reduceFunction, + { + query: { ts: { $gt: ISODate('2011-11-05 00:00:00') } }, + out: { reduce: "session_stat" }, + finalize: finalizeFunction + } + ); .. _map-reduce-temporay-collection: @@ -277,12 +284,9 @@ Additional References .. seealso:: + - :doc:`/tutorial/troubleshoot-map-reduce` + - :wiki:`Map-Reduce Concurrency ` - `MapReduce, Geospatial Indexes, and Other Cool Features `_ - Kristina Chodorow at MongoSF (April 2010) - - - :wiki:`Troubleshooting MapReduce` - -.. [#simple-aggregation-use-framework] For many simple aggregation tasks, see the - :doc:`aggregation framework `. 
diff --git a/source/includes/examples-map-reduce.rst b/source/includes/examples-map-reduce.rst index edcf5605086..a0a4c2a645a 100644 --- a/source/includes/examples-map-reduce.rst +++ b/source/includes/examples-map-reduce.rst @@ -29,6 +29,8 @@ Perform map-reduce operation on the ``orders`` collection to group by the ``cust_id``, and for each ``cust_id``, calculate the sum of the ``price`` for each ``cust_id``: + .. map-reduce-map-function-begin + #. Define the ```` function to process each document in the map-reduce process: @@ -44,6 +46,8 @@ the ``cust_id``, and for each ``cust_id``, calculate the sum of the emit(this.cust_id, this.price); }; + .. map-reduce-map-function-end + #. Define the corresponding ```` function with two arguments ``keyCustId`` and ``valuesPrices``: diff --git a/source/includes/parameters-map-reduce.rst b/source/includes/parameters-map-reduce.rst index 741f69c402f..db06bc4e4a5 100644 --- a/source/includes/parameters-map-reduce.rst +++ b/source/includes/parameters-map-reduce.rst @@ -31,8 +31,8 @@ - The ``emit(key,value)`` function associates the ``key`` with a ``value``. - - Each "emit" is limited to half the MongoDB :limit:`BSON - document size`. + - Each "emit" is limited to half the MongoDB :ref:`maximum + BSON document size `. - There is no limit to the number of times you may call the ``emit`` function per document. @@ -68,11 +68,32 @@ - The ```` function should *not* affect the outside system. + - The ```` function is *idempotent*; i.e. the + following behavior holds: + + .. code-block:: javascript + + reduce( key, [ reduce(key, valuesArray) ] ) == reduce ( key, valuesArray ) + + Additionally, the order of the elements in the + ``valuesArray`` does not affect the result: + + .. 
code-block:: javascript + + reduce ( key, [ A, B ] ) == reduce ( key, [ B, A ] ) + - Because it is possible to invoke the ```` function more than once for the same key, the *type* of the return object must be **identical** to the type of the ``value`` - emitted by the ```` function. - + emitted by the ```` function to ensure that: + + .. code-block:: javascript + + reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce (key, [ C, A, B ] ) + + - The ```` function is **not** invoked for a key + that has only a single value. + - The ```` function can access the variables defined in the ```` parameter if the ```` parameter is defined for the map-reduce operation. @@ -151,12 +172,15 @@ .. versionadded:: 2.1 - Optional. Specify output operation as non-atomic. If - ``true``, the post processing step will not execute - inside of a database lock so that partial results will - be visible during processing . ``nonAtomic`` is valid - *only* for ``merge`` and ``reduce`` output operations - where post-processing may be a long-running operation. + Optional. Specifies the output operation as non-atomic; + valid *only* for ``merge`` and ``reduce`` output modes. + Post-processing for ``merge`` and ``reduce`` output + modes may take a long time (e.g. minutes). During this + time, the entire database is locked for both reads and + writes. If ``nonAtomic`` is ``true``, the + post-processing step does not lock the + database; however, partial results will be visible as + they are processed. - **Output inline**. Perform the map-reduce operation in memory and return the result. This option is the only available @@ -246,8 +270,10 @@ - Can only work for result sets with less than 500,000 distinct ``key`` arguments to the mapper's ``emit()`` function. - The ```` defaults to true. + The ```` defaults to false. :param Boolean verbose: - Optional. Provides statistics on job execution times. + Optional. 
Specifies whether to include the ``timing`` + information in the result. The ```` + defaults to ``true`` to include the ``timing`` information. diff --git a/source/tutorial/troubleshoot-map-reduce.txt b/source/tutorial/troubleshoot-map-reduce.txt new file mode 100644 index 00000000000..21b8fa6cea3 --- /dev/null +++ b/source/tutorial/troubleshoot-map-reduce.txt @@ -0,0 +1,283 @@ +======================= +Troubleshoot Map-Reduce +======================= + +.. default-domain:: mongodb + +You can troubleshoot the ``map`` function and the ``reduce`` function +in the :program:`mongo` shell. + +.. _troubleshoot-map-function: + +Troubleshoot the Map Function +----------------------------- + +You can verify the ``key`` and ``value`` pairs emitted by the ``map`` +function by writing your own ``emit`` function. + +Consider a collection ``orders`` that contains documents of the +following prototype: + +.. code-block:: javascript + + { + _id: ObjectId("50a8240b927d5d8b5891743c"), + cust_id: "abc123", + ord_date: new Date("Oct 04, 2012"), + status: 'A', + price: 250, + items: [ { sku: "mmm", qty: 5, price: 2.5 }, + { sku: "nnn", qty: 5, price: 2.5 } ] + } + +#. Define the ``map`` function that maps the ``price`` to the + ``cust_id`` for each document and emits the ``cust_id`` and ``price`` + pair: + + .. code-block:: javascript + + var map = function() { + emit(this.cust_id, this.price); + }; + +#. Define the ``emit`` function to print the key and value: + + .. code-block:: javascript + + var emit = function(key, value) { + print("emit"); + print("key: " + key + " value: " + tojson(value)); + } + +#. Invoke the ``map`` function with a single document from the ``orders`` + collection: + + .. code-block:: javascript + + var myDoc = db.orders.findOne( { _id: ObjectId("50a8240b927d5d8b5891743c") } ); + map.apply(myDoc); + +#. Verify the key and value pair is as you expected. + + .. code-block:: javascript + + emit + key: abc123 value: 250 + +#. 
Invoke the ``map`` function with multiple documents from the ``orders`` + collection: + + .. code-block:: javascript + + var myCursor = db.orders.find( { cust_id: "abc123" } ); + + while (myCursor.hasNext()) { + var doc = myCursor.next(); + print ("document _id= " + tojson(doc._id)); + map.apply(doc); + print(); + } + +#. Verify the key and value pairs are as you expected. + +.. _troubleshoot-reduce-function: + +Troubleshoot the Reduce Function +-------------------------------- + +Test Type +~~~~~~~~~ + +You can test that the ``reduce`` function returns a value that is the +same type as the value emitted from the ``map`` function. + +#. Define a ``reduceFunction1`` function that takes the arguments + ``keyCustId`` and ``valuesPrices``. ``valuesPrices`` is an array of + integers: + + .. code-block:: javascript + + var reduceFunction1 = function(keyCustId, valuesPrices) { + return Array.sum(valuesPrices); + }; + +#. Define a sample array of integers: + + .. code-block:: javascript + + var myTestValues = [ 5, 5, 10 ]; + +#. Invoke the ``reduceFunction1`` with ``myTestValues``: + + .. code-block:: javascript + + reduceFunction1('myKey', myTestValues); + +#. Verify the ``reduceFunction1`` returned an integer: + + .. code-block:: javascript + + 20 + +#. Define a ``reduceFunction2`` function that takes the arguments + ``keySKU`` and ``valuesCountObjects``. ``valuesCountObjects`` is an array of + documents that contain two fields ``count`` and ``qty``: + + .. code-block:: javascript + + var reduceFunction2 = function(keySKU, valuesCountObjects) { + var reducedValue = { count: 0, qty: 0 }; + + for (var idx = 0; idx < valuesCountObjects.length; idx++) { + reducedValue.count += valuesCountObjects[idx].count; + reducedValue.qty += valuesCountObjects[idx].qty; + } + + return reducedValue; + }; + +#. Define a sample array of documents: + + .. code-block:: javascript + + var myTestObjects = [ + { count: 1, qty: 5 }, + { count: 2, qty: 10 }, + { count: 3, qty: 15 } + ]; + +#. 
Invoke the ``reduceFunction2`` with ``myTestObjects``: + + .. code-block:: javascript + + reduceFunction2('myKey', myTestObjects); + +#. Verify the ``reduceFunction2`` returned a document with exactly the + ``count`` and the ``qty`` fields: + + .. code-block:: javascript + + { "count" : 6, "qty" : 30 } + +Test Insensitivity to the Order of the Values Elements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``reduce`` function takes a ``key`` and a ``values`` array as its +arguments. You can test that the result of the ``reduce`` function does +not depend on the order of the elements in the ``values`` array. + +#. Define a sample ``values1`` array and a sample ``values2`` array + that only differ in the order of the array elements: + + .. code-block:: javascript + + var values1 = [ + { count: 1, qty: 5 }, + { count: 2, qty: 10 }, + { count: 3, qty: 15 } + ]; + + var values2 = [ + { count: 3, qty: 15 }, + { count: 1, qty: 5 }, + { count: 2, qty: 10 } + ]; + +#. Define a ``reduceFunction2`` function that takes the arguments + ``keySKU`` and ``valuesCountObjects``. ``valuesCountObjects`` is an array of + documents that contain two fields ``count`` and ``qty``: + + .. code-block:: javascript + + var reduceFunction2 = function(keySKU, valuesCountObjects) { + var reducedValue = { count: 0, qty: 0 }; + + for (var idx = 0; idx < valuesCountObjects.length; idx++) { + reducedValue.count += valuesCountObjects[idx].count; + reducedValue.qty += valuesCountObjects[idx].qty; + } + + return reducedValue; + }; + +#. Invoke the ``reduceFunction2`` first with ``values1`` and then with + ``values2``: + + .. code-block:: javascript + + reduceFunction2('myKey', values1); + reduceFunction2('myKey', values2); + +#. Verify the ``reduceFunction2`` returned the same result: + + .. 
code-block:: javascript + + { "count" : 6, "qty" : 30 } + +Test Idempotence +~~~~~~~~~~~~~~~~ + +Because the ``reduce`` function may be called multiple times for the +same key, the ``reduce`` function must return a value that is the same type +as the value emitted from the ``map`` function. You can test that the +``reduce`` function may be invoked on "reduced" values and the result +should equal the same as if all the values were passed in one call. + +#. Define a ``reduceFunction2`` function that takes the arguments + ``keySKU`` and ``valuesCountObjects``. ``valuesCountObjects`` is an array of + documents that contain two fields ``count`` and ``qty``: + + .. code-block:: javascript + + var reduceFunction2 = function(keySKU, valuesCountObjects) { + var reducedValue = { count: 0, qty: 0 }; + + for (var idx = 0; idx < valuesCountObjects.length; idx++) { + reducedValue.count += valuesCountObjects[idx].count; + reducedValue.qty += valuesCountObjects[idx].qty; + } + + return reducedValue; + }; + +#. Define a sample key: + + .. code-block:: javascript + + var myKey = 'myKey'; + +#. Define a sample ``valuesIdempotent`` array that contains an element that is a + call to the ``reduceFunction2`` function: + + .. code-block:: javascript + + var valuesIdempotent = [ + { count: 1, qty: 5 }, + { count: 2, qty: 10 }, + reduceFunction2(myKey, [ { count:3, qty: 15 } ] ) + ]; + +#. Define a sample ``values1`` array that combines the values passed to +``reduceFunction2``: + + .. code-block:: javascript + + var values1 = [ + { count: 1, qty: 5 }, + { count: 2, qty: 10 }, + { count: 3, qty: 15 } + ]; + +#. Invoke the ``reduceFunction2`` first with ``myKey`` and +``valuesIdempotent`` and then with ``myKey`` and ``values1``: + + .. code-block:: javascript + + reduceFunction2(myKey, valuesIdempotent); + reduceFunction2(myKey, values1); + +#. Verify the ``reduceFunction2`` returned the same result: + + .. 
code-block:: javascript + + { "count" : 6, "qty" : 30 } From 3f3317b8f3628570dcf45c72b4ab4f5ea1dbfd43 Mon Sep 17 00:00:00 2001 From: kay Date: Thu, 29 Nov 2012 16:27:07 -0500 Subject: [PATCH 3/3] DOCS-686 mapreduce --- source/applications.txt | 1 + source/applications/map-reduce.txt | 7 +-- source/includes/examples-map-reduce.rst | 45 +++------------- source/reference/command/mapReduce.txt | 53 ++++++++++++------- .../method/db.collection.mapReduce.txt | 29 +++++----- source/tutorial.txt | 1 + source/tutorial/troubleshoot-map-reduce.txt | 9 ++-- 7 files changed, 62 insertions(+), 83 deletions(-) diff --git a/source/applications.txt b/source/applications.txt index 3e24af2abb6..e075f0d1f71 100644 --- a/source/applications.txt +++ b/source/applications.txt @@ -51,5 +51,6 @@ The following documents provide patterns for developing application features: .. toctree:: :maxdepth: 2 + tutorial/troubleshoot-map-reduce tutorial/perform-two-phase-commits tutorial/expire-data diff --git a/source/applications/map-reduce.txt b/source/applications/map-reduce.txt index 3ad95921f45..64401d95476 100644 --- a/source/applications/map-reduce.txt +++ b/source/applications/map-reduce.txt @@ -46,12 +46,7 @@ For more information on the parameters, see the :method:`db.collection.mapReduce()` reference page . .. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-examples-begin - :end-before: map-reduce-sum-price-wrapper-end - -.. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-sum-price-cmd-end - :end-before: map-reduce-item-counts-avg-wrapper-end + :start-after: map-reduce-document-prototype-begin .. _map-reduce-incremental: diff --git a/source/includes/examples-map-reduce.rst b/source/includes/examples-map-reduce.rst index a0a4c2a645a..b8eeb44cf7f 100644 --- a/source/includes/examples-map-reduce.rst +++ b/source/includes/examples-map-reduce.rst @@ -1,7 +1,8 @@ Map-Reduce Examples ------------------- -.. map-reduce-examples-begin +.. 
map-reduce-document-examples-begin +.. map-reduce-document-prototype-begin Consider the following map-reduce operations on a collection ``orders`` that contains documents of the following prototype: @@ -19,11 +20,11 @@ that contains documents of the following prototype: } .. map-reduce-document-prototype-end - + Sum the Price Per Customer Id ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. map-reduce-sum-price +.. map-reduce-sum-price-begin Perform map-reduce operation on the ``orders`` collection to group by the ``cust_id``, and for each ``cust_id``, calculate the sum of the @@ -71,7 +72,6 @@ the ``cust_id``, and for each ``cust_id``, calculate the sum of the operation will replace the contents with the results of this map-reduce operation: - .. map-reduce-sum-price-wrapper-begin .. code-block:: javascript db.orders.mapReduce( @@ -80,24 +80,12 @@ the ``cust_id``, and for each ``cust_id``, calculate the sum of the { out: "map_reduce_example" } ) - .. map-reduce-sum-price-wrapper-end - .. map-reduce-sum-price-cmd-begin - .. code-block:: javascript - - db.runCommand( - { - mapreduce: 'orders', - map: mapFunction1, - reduce: reduceFunction1, - out: 'map_reduce_example' - } - ) - .. map-reduce-sum-price-cmd-end +.. map-reduce-sum-price-end Calculate the Number of Orders, Total Quantity, and Average Quantity Per Item ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. map-reduce-item-counts +.. map-reduce-counts-begin Perform map-reduce operation on the ``orders`` collection to group by the item sku, and for each sku, calculate the number of orders and the @@ -178,8 +166,6 @@ greater than ``01/01/2012`` for the map-reduce: already exists, the operation will merge the existing contents with the results of this map-reduce operation: - .. map-reduce-item-counts-avg-end - .. map-reduce-item-counts-avg-wrapper-begin .. code-block:: javascript db.orders.mapReduce( mapFunction2, @@ -191,21 +177,4 @@ greater than ``01/01/2012`` for the map-reduce: } ) - .. 
map-reduce-item-counts-avg-wrapper-end - .. map-reduce-item-counts-avg-cmd-begin - .. code-block:: javascript - - db.runCommand( - { - mapreduce: 'orders', - map: mapFunction2, - reduce: reduceFunction2, - finalize: finalizeFunction2, - out: { merge: "map_reduce_example" }, - query: { ord_date: { $gt: new Date('01/01/2012') } }, - } - ) - - .. map-reduce-item-counts-avg-cmd-end - -.. map-reduce-examples-end +.. map-reduce-counts-end diff --git a/source/reference/command/mapReduce.txt b/source/reference/command/mapReduce.txt index 921754a72ee..6622cfdd8af 100644 --- a/source/reference/command/mapReduce.txt +++ b/source/reference/command/mapReduce.txt @@ -30,35 +30,50 @@ mapReduce In addition to specifying the ```` over which to perform the ``mapreduce`` command, the command accepts the following: - + .. include:: /includes/parameters-map-reduce.rst + A sample map-reduce operation using the :dbcommand:`mapReduce` + command may have the following prototype: + + .. code-block:: javascript + + var mapFunction = function() { ... }; + var reduceFunction = function(key, values) { ... }; + + db.runCommand( + { + mapreduce: 'orders', + map: mapFunction, + reduce: reduceFunction, + out: { merge: 'map_reduce_results' }, + query: { ord_date: { $gt: new Date('01/01/2012') } } + } + ) + + In the :program:`mongo` shell, the :method:`db.collection.mapReduce()` + method is a wrapper around the :dbcommand:`mapReduce` command. The + following examples use the :method:`db.collection.mapReduce()` method: + .. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-examples-begin + :start-after: map-reduce-document-prototype-begin :end-before: map-reduce-document-prototype-end - .. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-sum-price - :end-before: map-reduce-sum-price-wrapper-begin + :start-after: map-reduce-sum-price-begin + :end-before: map-reduce-sum-price-end - .. 
include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-sum-price-cmd-begin - :end-before: map-reduce-sum-price-cmd-end - - .. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-item-counts - :end-before: map-reduce-item-counts-avg-end + :start-after: map-reduce-counts-begin + :end-before: map-reduce-counts-end + + For more information and examples, see the :doc:`Map-Reduce + ` page. - .. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-item-counts-avg-cmd-begin - :end-before: map-reduce-item-counts-avg-cmd-end + .. seealso:: - .. seealso:: :method:`mapReduce()` and :term:`map-reduce`. + - :term:`map-reduce` and :method:`db.collection.mapReduce()` - The :doc:`Map-Reduce ` - provides a greater overview of MongoDB's map-reduce - functionality. Consider the :wiki:`Simple application - ` support for basic aggregation operations as well - as :doc:`/applications/aggregation`. + - :doc:`/applications/aggregation` .. slave-ok diff --git a/source/reference/method/db.collection.mapReduce.txt b/source/reference/method/db.collection.mapReduce.txt index 25d0c41dfff..b5719038d7c 100644 --- a/source/reference/method/db.collection.mapReduce.txt +++ b/source/reference/method/db.collection.mapReduce.txt @@ -30,28 +30,23 @@ db.collection.mapReduce() .. include:: /includes/parameters-map-reduce.rst - .. mapReduce-syntax-end - .. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-examples-begin + :start-after: map-reduce-document-prototype-begin :end-before: map-reduce-document-prototype-end - .. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-sum-price - :end-before: map-reduce-sum-price-wrapper-end + :start-after: map-reduce-sum-price-begin + :end-before: map-reduce-sum-price-end - .. include:: /includes/examples-map-reduce.rst - :start-after: map-reduce-item-counts - :end-before: map-reduce-item-counts-avg-wrapper-end - - .. 
seealso:: :term:`map-reduce` and :dbcommand:`mapReduce` + :start-after: map-reduce-counts-begin + :end-before: map-reduce-counts-end - The :doc:`Map-Reduce ` page - provides a greater overview of MongoDB's map-reduce - functionality, while the :doc:`/applications/aggregation` - provides an overview of the aggregation framework. + For more information and examples, see the :doc:`Map-Reduce + ` page. - .. Consider - .. STUB ":doc:`/applications/simple-aggregation` for simple aggregation - .. operations and :doc:`/applications/aggregation`" for a more flexible - .. approach to data aggregation in MongoDB. + .. seealso:: + + - :term:`map-reduce` and :dbcommand:`mapReduce` + + - :doc:`/applications/aggregation` diff --git a/source/tutorial.txt b/source/tutorial.txt index 6a32a29bb27..6e60ae7939b 100644 --- a/source/tutorial.txt +++ b/source/tutorial.txt @@ -51,6 +51,7 @@ Application Development .. toctree:: :maxdepth: 1 + tutorial/troubleshoot-map-reduce tutorial/write-a-tumblelog-application-with-django-mongodb-engine tutorial/write-a-tumblelog-application-with-flask-mongoengine diff --git a/source/tutorial/troubleshoot-map-reduce.txt b/source/tutorial/troubleshoot-map-reduce.txt index 21b8fa6cea3..16941add01a 100644 --- a/source/tutorial/troubleshoot-map-reduce.txt +++ b/source/tutorial/troubleshoot-map-reduce.txt @@ -4,8 +4,11 @@ Troubleshoot Map-Reduce .. default-domain:: mongodb +The :doc:`/applications/map-reduce` operation requires both the ``map`` +function and the ``reduce`` function. + You can troubleshoot the ``map`` function and the ``reduce`` function -in the :program:`mongo` shell. +in the :program:`mongo` shell. .. _troubleshoot-map-function: @@ -258,7 +261,7 @@ should equal the same as if all the values were passed in one call. ]; #. Define a sample ``values1`` array that combines the values passed to -``reduceFunction2``: + ``reduceFunction2``: .. 
code-block:: javascript @@ -269,7 +272,7 @@ should equal the same as if all the values were passed in one call. ]; #. Invoke the ``reduceFunction2`` first with ``myKey`` and -``valuesIdempotent`` and then with ``myKey`` and ``values1``: + ``valuesIdempotent`` and then with ``myKey`` and ``values1``: .. code-block:: javascript