531 changes: 62 additions & 469 deletions source/core/create.txt


573 changes: 81 additions & 492 deletions source/core/document.txt


88 changes: 88 additions & 0 deletions source/core/read-operations-architecture-considerations.txt
@@ -0,0 +1,88 @@
.. index:: read operation; architecture
.. _read-operations-architecture:

============
Architecture
============

.. default-domain:: mongodb

.. index:: read operation; connection pooling
.. index:: connection pooling; read operations
.. _read-operations-connection-pooling:

.. might make sense to break this in half and move it to
replication/sharding and cross reference here?

Read Operations from Sharded Clusters
-------------------------------------

:term:`Sharded clusters <sharded cluster>` allow you to partition a
data set among a cluster of :program:`mongod` instances in a way that
is nearly transparent to the application. See the :doc:`/sharding`
section of this manual for additional information about these
deployments.

For a sharded cluster, you issue all operations to one of the
:program:`mongos` instances associated with the
cluster. :program:`mongos` instances route operations to the
:program:`mongod` in the cluster and behave like :program:`mongod`
instances to the application. Read operations to a sharded collection
in a sharded cluster are largely the same as operations to a
:term:`replica set` or a :term:`standalone` instance. See the section
on :ref:`Read Operations in Sharded Clusters
<sharding-read-operations>` for more information.

In sharded deployments, the :program:`mongos` instance routes
the queries from the clients to the :program:`mongod` instances that
hold the data, using the cluster metadata stored in the :ref:`config
database <sharding-config-server>`.

For sharded collections, if queries do not include the :ref:`shard key
<sharding-shard-key>`, the :program:`mongos` must direct the query to
all shards in the cluster. These *scatter gather* queries can be
inefficient, particularly on larger clusters, and are infeasible for
routine operations.

For more information on read operations in sharded clusters, consider
the following resources:

- :ref:`An Introduction to Shard Keys <sharding-shard-key>`
- :ref:`Shard Key Internals and Operations <sharding-internals-shard-keys>`
- :ref:`Querying Sharded Clusters <sharding-internals-querying>`
- :doc:`/core/sharded-cluster-query-router`
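
As a hypothetical illustration of the routing behavior described
above, assume a ``records`` collection sharded on a ``zipcode`` field
(both names are examples, not part of the original text). The
following shell queries contrast a targeted query with a scatter
gather query:

.. code-block:: javascript

   // This query includes the shard key, so mongos can route it to
   // only the shards that hold matching chunks:
   db.records.find( { zipcode: "10036" } )

   // This query omits the shard key, so mongos must scatter it to
   // every shard and gather the results:
   db.records.find( { name: "j. smith" } )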

Read Operations from Replica Sets
---------------------------------

:term:`Replica sets <replica set>` use :term:`read preferences <read
preference>` to determine where and how to route read operations to
members of the replica set. By default, MongoDB always reads data from
a replica set's :term:`primary`. You can modify that behavior by
changing the :ref:`read preference mode
<replica-set-read-preference-modes>`.

You can configure the :ref:`read preference mode
<replica-set-read-preference-modes>` on a per-connection or
per-operation basis to allow reads from :term:`secondaries
<secondary>` to:

- reduce latency in multi-data-center deployments,

- improve read throughput by distributing high read-volumes (relative
to write volume),

- perform backup operations, and/or

- allow reads during :ref:`failover <replica-set-failover>`
  situations.

Read operations from secondary members of replica sets are not
guaranteed to reflect the current state of the primary, and the state
of secondaries will trail the primary by some amount of time. Often,
applications do not require this kind of strict consistency, but
application developers should always consider the needs of their
application before setting read preference.

For more information on read preference or on the read preference
modes, see :doc:`/core/read-preference` and
:ref:`replica-set-read-preference-modes`.
261 changes: 261 additions & 0 deletions source/core/read-operations-cursors.txt
@@ -0,0 +1,261 @@
.. _read-operations-cursors:

=======
Cursors
=======

.. default-domain:: mongodb

.. TODO restructure introduction and remove most of the list.

The :method:`find() <db.collection.find()>` method returns a
:term:`cursor` to the results; however, in the :program:`mongo` shell,
if the returned cursor is not assigned to a variable, then the cursor
is automatically iterated up to 20 times [#set-shell-batch-size]_ to print
up to the first 20 documents that match the query, as in the following
example:

.. code-block:: javascript

db.inventory.find( { type: 'food' } );

When you assign the :method:`find() <db.collection.find()>` to a
variable:

- you can call the cursor variable in the shell to iterate up to 20
times [#set-shell-batch-size]_ and print the matching documents, as in
the following example:

.. code-block:: javascript

var myCursor = db.inventory.find( { type: 'food' } );

myCursor

- you can use the cursor method :method:`next() <cursor.next()>` to
access the documents, as in the following example:

.. code-block:: javascript

var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor.hasNext() ? myCursor.next() : null;

if (myDocument) {
var myItem = myDocument.item;
print(tojson(myItem));
}

As an alternative print operation, consider the ``printjson()``
helper method to replace ``print(tojson())``:

.. code-block:: javascript

if (myDocument) {
var myItem = myDocument.item;
printjson(myItem);
}

- you can use the cursor method :method:`forEach() <cursor.forEach()>`
to iterate the cursor and access the documents, as in the following
example:

.. code-block:: javascript

var myCursor = db.inventory.find( { type: 'food' } );

myCursor.forEach(printjson);

See :ref:`JavaScript cursor methods <js-query-cursor-methods>` and your
:doc:`driver </applications/drivers>` documentation for more
information on cursor methods.

.. [#set-shell-batch-size] You can use the ``DBQuery.shellBatchSize``
   property to change the number of iterations from the default value
   ``20``. See :ref:`mongo-shell-executing-queries` for more
   information.
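
For example, to have the shell print 10 documents per iteration
instead of the default 20:

.. code-block:: javascript

   // Change the shell's per-iteration batch size:
   DBQuery.shellBatchSize = 10;

   // Subsequent unassigned queries print up to 10 documents at a time:
   db.inventory.find( { type: 'food' } );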

Iterator Index
--------------

In the :program:`mongo` shell, you can use the
:method:`~cursor.toArray()` method to iterate the cursor and return
the documents in an array, as in the following:

.. code-block:: javascript

var myCursor = db.inventory.find( { type: 'food' } );
var documentArray = myCursor.toArray();
var myDocument = documentArray[3];

The :method:`~cursor.toArray()` method loads into RAM all
documents returned by the cursor; the :method:`~cursor.toArray()`
method exhausts the cursor.

Additionally, some :doc:`drivers </applications/drivers>` provide
access to the documents by using an index on the cursor (i.e.
``cursor[index]``). This is a shortcut for first calling the
:method:`~cursor.toArray()` method and then using an index
on the resulting array.

Consider the following example:

.. code-block:: javascript

var myCursor = db.inventory.find( { type: 'food' } );
var myDocument = myCursor[3];

The ``myCursor[3]`` is equivalent to the following example:

.. code-block:: javascript

   myCursor.toArray()[3];

.. _cursor-behaviors:

Cursor Behaviors
----------------

Consider the following behaviors related to cursors:

- By default, the server will automatically close the cursor after 10
  minutes of inactivity or if the client has exhausted the cursor. To
override this behavior, you can specify the ``noTimeout``
:meta-driver:`wire protocol flag </legacy/mongodb-wire-protocol>` in
your query; however, you should either close the cursor manually or
exhaust the cursor. In the :program:`mongo` shell, you can set the
``noTimeout`` flag:

.. code-block:: javascript

var myCursor = db.inventory.find().addOption(DBQuery.Option.noTimeout);

See your :doc:`driver </applications/drivers>` documentation for
information on setting the ``noTimeout`` flag. See
:ref:`cursor-flags` for a complete list of available cursor flags.

- Because the cursor is not isolated during its lifetime, intervening
write operations may result in a cursor that returns a single
document [#single-document-def]_ more than once. To handle this
situation, see the information on :ref:`snapshot mode
<faq-developers-isolate-cursors>`.

- The MongoDB server returns the query results in batches:

- For most queries, the *first* batch returns 101 documents or just
enough documents to exceed 1 megabyte. Subsequent batch size is 4
megabytes. To override the default size of the batch, see
:method:`~cursor.batchSize()` and :method:`~cursor.limit()`.

- For queries that include a sort operation *without* an index, the
server must load all the documents in memory to perform the sort
and will return all documents in the first batch.

- Batch size will not exceed the :ref:`maximum BSON document size
<limit-bson-document-size>`.

- As you iterate through the cursor and reach the end of the returned
batch, if there are more results, :method:`cursor.next()` will
perform a :data:`getmore operation <currentOp.op>` to retrieve the
next batch.

To see how many documents remain in the batch as you iterate the
cursor, you can use the :method:`~cursor.objsLeftInBatch()` method,
as in the following example:

.. code-block:: javascript

var myCursor = db.inventory.find();

var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;

myCursor.objsLeftInBatch();

- You can use the command :dbcommand:`cursorInfo` to retrieve the
following information on cursors:

- total number of open cursors

- size of the client cursors in current use

- number of timed out cursors since the last server restart

Consider the following example:

.. code-block:: javascript

db.runCommand( { cursorInfo: 1 } )

The result from the command returns the following document:

.. code-block:: javascript

{
"totalOpen" : <number>,
"clientCursors_size" : <number>,
"timedOut" : <number>,
"ok" : 1
}

.. [#single-document-def] A single document relative to the value of
   the ``_id`` field. A cursor cannot return the same document more
   than once *if* the document has not changed.
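
The batching behaviors above can be sketched in the shell by setting
an explicit batch size and inspecting the cursor as you iterate:

.. code-block:: javascript

   // Return results in batches of at most 10 documents:
   var myCursor = db.inventory.find( { type: 'food' } ).batchSize(10);

   // Fetches the first batch from the server:
   var myFirstDocument = myCursor.hasNext() ? myCursor.next() : null;

   // Reports how many documents remain in the current batch:
   myCursor.objsLeftInBatch();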

.. _cursor-flags:

Cursor Flags
------------

The :program:`mongo` shell provides the following cursor flags:

- ``DBQuery.Option.tailable``
- ``DBQuery.Option.slaveOk``
- ``DBQuery.Option.oplogReplay``
- ``DBQuery.Option.noTimeout``
- ``DBQuery.Option.awaitData``
- ``DBQuery.Option.exhaust``
- ``DBQuery.Option.partial``
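
As a sketch, you can combine flags by chaining
:method:`~cursor.addOption()` calls. For example, the following
creates a tailable cursor that also blocks briefly waiting for new
data, assuming ``messages`` is a capped collection (the collection
name is an assumption for illustration):

.. code-block:: javascript

   // Assumes ``messages`` is a capped collection; tailable cursors
   // are only valid on capped collections.
   var tailableCursor = db.messages.find()
                          .addOption(DBQuery.Option.tailable)
                          .addOption(DBQuery.Option.awaitData);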

.. _read-operations-aggregation:

Aggregation
-----------

.. versionchanged:: 2.2

MongoDB can perform some basic data aggregation operations on results
before returning data to the application. These operations are not
queries; they use :term:`database commands <database command>` rather
than queries, and they do not return a cursor. However, they still
require MongoDB to read data.

Running aggregation operations on the database side can be more
efficient than running them in the application layer and can reduce
the amount of data MongoDB needs to send to the application. These
aggregation operations include basic grouping, counting, and even
processing data using a map-reduce framework. Additionally, MongoDB
2.2 provides a complete aggregation framework for richer aggregation
operations.

The aggregation framework provides a pipeline-like model: documents
enter from a collection and pass through a sequence of :ref:`pipeline
operators <aggregation-pipeline-operator-reference>` that manipulate
and transform them until they are output at the end. The aggregation
framework is accessible via the :dbcommand:`aggregate` command or the
:method:`db.collection.aggregate()` helper in the :program:`mongo`
shell.

For more information on the aggregation framework see
:doc:`/aggregation`.
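
For example, assuming the ``inventory`` collection used in the cursor
examples above, a short pipeline might select the ``food`` documents
and count them by type (a sketch; field values are assumptions for
illustration):

.. code-block:: javascript

   db.inventory.aggregate( [
      // Select only documents where type is 'food':
      { $match: { type: 'food' } },
      // Group the matches by type and count them:
      { $group: { _id: '$type', total: { $sum: 1 } } }
   ] )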

Additionally, MongoDB provides a number of commands for more basic
data aggregation operations:

- :dbcommand:`count` (:method:`~cursor.count()`)

- :dbcommand:`distinct` (:method:`db.collection.distinct()`)

- :dbcommand:`group` (:method:`db.collection.group()`)

- :dbcommand:`mapReduce`. (Also consider
:method:`~db.collection.mapReduce()` and
:doc:`/core/map-reduce`.)
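
For instance, the first two of these operations might look like the
following in the shell, again assuming the ``inventory`` collection
from the earlier examples:

.. code-block:: javascript

   // Count the documents that match a query:
   db.inventory.count( { type: 'food' } )

   // Return the distinct values of a field:
   db.inventory.distinct( 'type' )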