From 2eac86c12b1c53852bd40fc451e2429802ebdf33 Mon Sep 17 00:00:00 2001 From: Sam Kleinman Date: Wed, 18 Jul 2012 20:32:35 -0400 Subject: [PATCH 1/3] DOCS-330: editing and resolving todos raised by technical review --- draft/administration/indexes.txt | 214 +++++++----------- draft/applications/indexes.txt | 139 +++++++----- draft/core/geospatial-indexes.txt | 1 + draft/core/indexes.txt | 75 +++--- .../note-build-indexes-on-replica-sets.rst | 4 + 5 files changed, 216 insertions(+), 217 deletions(-) create mode 100644 source/includes/note-build-indexes-on-replica-sets.rst diff --git a/draft/administration/indexes.txt b/draft/administration/indexes.txt index dc49536795a..8cdb8c32df4 100644 --- a/draft/administration/indexes.txt +++ b/draft/administration/indexes.txt @@ -38,9 +38,7 @@ of the ``people`` collection: .. code-block:: javascript - db.people.ensureIndex( { phone-number: 1 } ) - -TODO: you need ""s around phone-number, otherwise it's invalid JS (phone minus number). + db.people.ensureIndex( { "phone-number": 1 } ) To create a :ref:`compound index `, use an operation that resembles the following prototype: @@ -57,21 +55,18 @@ collection: db.products.ensureIndex( { item: 1, category: 1, price: 1 } ) -.. note:: - - To build indexes for a :term:`replica set`, before version 2.2, - see :ref:`index-building-replica-sets`. +Some drivers may specify indexes using ``NumberLong(1)`` rather than +``1`` as the specification. This does not have any effect on the +resulting index. -TODO: I don't think anything changed about replica set index builds for 2.2... +.. include:: /includes/note-build-indexes-on-replica-sets.rst .. [#ensure] As the name suggests, :func:`ensureIndex() ` only creates an index if an index of the same specification does not already exist. -Sparse -`````` - -TODO: Sparse? Maybe "Types of Indexes->Sparse"? 
+Sparse Indexes +`````````````` To create a :ref:`sparse index ` on a field, use an operation that resembles the following prototype: @@ -91,16 +86,13 @@ without the ``twitter_name`` field. .. note:: - MongoDB cannot create sparse compound indexes. + Sparse indexes can affect the results returned by the query, + particularly with respect to sorts on fields *not* included in the + index. See the :ref:`sparse index ` section for + more information. -TODO: is this true? I thought that it could. - -TODO: Is there more doc on spare indexes somewhere? Seems like this is missing -some info like getting different results back when the index is used, null -counts as existing, etc. - -Unique -`````` +Unique Indexes +`````````````` To create a :ref:`unique index `, consider the following prototype: @@ -109,21 +101,17 @@ following prototype: db.collection.ensureIndex( { a: 1 }, { unique: true } ) -For example, you may want to create a unique index on the ``tax-id:`` +For example, you may want to create a unique index on the ``"tax-id":`` of the ``accounts`` collection to prevent storing multiple account records for the same legal entity: .. code-block:: javascript - db.accounts.ensureIndex( { tax-id: 1 }, { unique: true } ) - -TODO: tax-id should be in ""s. + db.accounts.ensureIndex( { "tax-id": 1 }, { unique: true } ) The :ref:`_id index ` is a unique index. In some -situations you may want to use the ``_id`` field for these primary -data rather than using a unique index on another field. - -TODO: "for these primary data"? +situations you may consider using the ``_id`` field itself for this kind +of data rather than using a unique index on another field. In many situations you will want to combine the ``unique`` constraint with the ``sparse`` option. When MongoDB indexes a field, if a @@ -155,11 +143,9 @@ as in the following example: .. code-block:: javascript - db.accounts.dropIndex( { tax-id: 1 } ) + db.accounts.dropIndex( { "tax-id": 1 } ) -TODO: ""s! 
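The interaction between the ``unique`` constraint and the ``sparse`` option described above can be sketched as a toy model in plain JavaScript. This is purely illustrative: the helper name is invented, and it models only the described behavior, not MongoDB's internals or any driver API.

```javascript
// Toy model of a { unique: true, sparse: true } index: documents that
// omit the indexed field are never added to the index, so any number of
// them may coexist, while duplicate values in indexed documents are
// rejected. Illustrative only -- not MongoDB code.
function makeUniqueSparseIndex(field) {
  const seen = new Set();
  return function insert(doc) {
    if (!(field in doc)) {
      return true; // sparse: document is not indexed, so no conflict
    }
    if (seen.has(doc[field])) {
      return false; // unique: duplicate value rejected
    }
    seen.add(doc[field]);
    return true;
  };
}

const insert = makeUniqueSparseIndex("tax-id");
console.log(insert({ "tax-id": 100 }));     // true
console.log(insert({ "tax-id": 100 }));     // false (duplicate)
console.log(insert({ name: "no tax id" })); // true (field absent, skipped)
console.log(insert({ name: "also none" })); // true (field absent, skipped)
```

Without ``sparse``, a unique index treats a missing field as a single indexed value, so only one such document could exist; the sketch shows why combining the two options avoids that restriction.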
- -This will remove the index on the ``tax-id`` field in the ``accounts`` +This will remove the index on the ``"tax-id"`` field in the ``accounts`` collection. The shell provides the following document after completing the operation: @@ -216,16 +202,7 @@ This shell helper provides a wrapper around the :dbcommand:`reIndex` ` may have a different or additional interface for this operation. -.. note:: - - To rebuild indexes for a :term:`replica set`, before version 2.2, - see :ref:`index-rebuilding-replica-sets`. - -TODO: again, this probably isn't different in 2.2 - -TODO: one thing that I would appreciate you mentioning is that some drivers may -create indexes like {a : NumberLong(1)} _which is fine_ and doesn't break -anything so stop complaining about it. +.. include:: /includes/note-build-indexes-on-replica-sets.rst Special Creation Options ~~~~~~~~~~~~~~~~~~~~~~~~ @@ -235,7 +212,8 @@ Special Creation Options TTL collections use a special ``expire`` index option. See :doc:`/tutorial/expire-data` for more information. -TODO: Are 2d indexes getting a mention? +.. TODO: insert link here to the geospatial index documents when + they're published. Background `````````` @@ -248,26 +226,17 @@ prototype invocation of :func:`db.collection.ensureIndex()`: db.collection.ensureIndex( { a: 1 }, { background: true } ) -TODO: what does it mean to build an index in the background? You might want to -mention: -* performance implications -* that this type of index build can be killed -* that this blocks the connection you sent the ensureindex on, but ops from - other connections can proceed in -* that indexes are created on the foreground on secondaries in 2.0, - which blocks replication & slave reads. In 2.2, it does not block reads (but - still blocks repl). +Consider the section on :ref:`background index construction +` for more information about these indexes +and their implications. 
Drop Duplicates ``````````````` To force the creation of a :ref:`unique index ` -index - -TODO: " on a collection with duplicate values in the field to be indexed " - -you can use the ``dropDups`` option. This will force MongoDB to -create a *unique* index by deleting documents with duplicate values +on a collection with duplicate values in the field you are +indexing, you can use the ``dropDups`` option. This will force MongoDB +to create a *unique* index by deleting documents with duplicate values when building the index. Consider the following prototype invocation of :func:`db.collection.ensureIndex()`: @@ -280,82 +249,65 @@ See the full documentation of :ref:`duplicate dropping .. warning:: - Specifying ``{ dropDups: true }`` will delete data from your + Specifying ``{ dropDups: true }`` may delete data from your database. Use with extreme caution. -TODO: I'd say it "may" delete data from your DB, not like it's going to go all -Shermanesque on your data. - .. _index-building-replica-sets: Building Indexes on Replica Sets -------------------------------- -.. versionchanged:: 2.2 - Index rebuilding operations on :term:`secondary` members of - :term:`replica sets ` now run as normal background - index operations. Run :func:`ensureIndex() - ` normally with the ``{ background: - true }`` option for replica sets. Alternatively, you may always use - the following operation to isolate and control the impact of - indexing building operations on a set as a whole. -TODO: I think there needs to be a huge mention that this still blocks -replication, so the procedure below is recommended. .. admonition:: For Version 1.8 and 2.0 - - :ref:`Background index creation operations - ` became *foreground* indexing - operations on :term:`secondary` members of replica sets. These - foreground operations will block all replication on the - secondaries, -TODO: and don't allow any reads to go through. 
+:ref:`Background index creation operations +` became *foreground* indexing operations +on :term:`secondary` members of replica sets. These foreground +operations will block all replication on the secondaries, and don't +allow any reads. As a result in most cases use the following procedure +to build indexes on secondaries. - and can impact performance of the entire set. To build - indexes with minimal impact on a replica set, use the following - procedure for all non-trivial index builds: +Procedure +~~~~~~~~~ - #. Stop the :program:`mongod` process on one secondary. Restart the - :program:`mongod` process *without* the :option:`--replSet ` - option. This instance is now in "standalone" mode. +#. Stop the :program:`mongod` process on one secondary. Restart the + :program:`mongod` process *without* the :option:`--replSet ` + option and running on a different port. [#different-port]_ This + instance is now in "standalone" mode. -TODO: generally we recommend running it on a different port, too, so that apps -& other servers in the set don't try to contact it. +#. Create the new index or rebuild the index on this :program:`mongod` + instance. - #. Create the new index or rebuild the index on this :program:`mongod` - instance. +#. Restart the :program:`mongod` instance with the + :option:`--replSet ` option. Allow replication + to catch up on this member. - #. Restart the :program:`mongod` instance with the - :option:`--replSet ` option. Allow replication - to catch up on this member. +#. Replete this operation on all of the remaining secondaries. - #. Replete this operation on all of the remaining secondaries. +#. Run :func:`rs.stepDown()` on the :term:`primary` member of the + set, and then repeat this procedure on the former primary. - #. Run :func:`rs.stepDown()` on the :term:`primary` member of the - set, and then run this procedure on the former primary. - - .. warning:: +.. 
warning:: - Ensure that your :ref:`oplog` is large enough to permit the - indexing or re-indexing operation to complete without falling - too far behind to catch up. See the ":ref:`replica-set-oplog-sizing`" - documentation for additional information. + Ensure that your :ref:`oplog` is large enough to permit the + indexing or re-indexing operation to complete without falling + too far behind to catch up. See the ":ref:`replica-set-oplog-sizing`" + documentation for additional information. - .. note:: +.. note:: - This procedure *does* block indexing on one member of the - replica set at a time. However, the foreground indexing - operation is more efficient than the background index operation, - and will only affect one secondary at a time rather than *all* - secondaries at the same time. + This procedure *does* block indexing on one member of the + replica set at a time. However, the foreground indexing + operation is more efficient than the background index operation, + and will only affect one secondary at a time rather than *all* + secondaries at the same time. -For the best results, always create indexes *before* you begin -inserting data into a collection. +.. [#different-port] By running the :program:`mongod` on a different + port, you ensure that the other members of the replica set and all + clients will not contact the member while you are building the + index. -TODO: well, sort of. That'll build the indexes fast, but make the inserts -slower. Overall, it's faster to insert data, then build indexes. +.. _indexes-measuring-use: Measuring Index Use ------------------- @@ -374,12 +326,7 @@ following tools: - :func:`cursor.hint()` Append the :func:`hint() ` to any cursor (e.g. - query) with the name - -TODO: this isn't "the name of an index." I'd say just "with the index." The -name of an index is a string like "zipcode_1". 
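The distinction drawn above between an index *specification* (a document such as ``{ zipcode: 1 }``) and an index *name* (a string such as ``"zipcode_1"``) can be made concrete with a small sketch of how the default name is derived from the specification. The helper below is invented for illustration and is not part of the shell or any driver.

```javascript
// Sketch: MongoDB's default index name joins each field with its sort
// direction, e.g. { zipcode: 1 } -> "zipcode_1" and
// { a: 1, b: -1 } -> "a_1_b_-1". hint() accepts the specification
// document itself; this helper only shows where names like "zipcode_1"
// come from.
function defaultIndexName(specification) {
  return Object.keys(specification)
    .map(function (field) {
      return field + "_" + specification[field];
    })
    .join("_");
}

console.log(defaultIndexName({ zipcode: 1 }));           // "zipcode_1"
console.log(defaultIndexName({ item: 1, category: 1 })); // "item_1_category_1"
```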
- - of an index as the argument to *force* MongoDB + query) with the index as the argument to *force* MongoDB to use a specific index to fulfill the query. Consider the following example: @@ -387,12 +334,15 @@ name of an index is a string like "zipcode_1". db.people.find( { name: "John Doe", zipcode: { $gt: 63000 } } ).hint( { zipcode: 1 } ) - You can use :func:`hint() ` and :func:`explain() ` in conjunction with each other to compare the - effectiveness of a specific index. + effectiveness of a specific index. Specify the ``$natural`` operator + to the :func:`hint() ` method to prevent MongoDB from + using *any* index: + + .. code-block:: javascript -TODO: mention $natural to force no index usage? + db.people.find( { name: "John Doe", zipcode: { $gt: 63000 } } ).hint( { $natural: 1 } ) - :status:`indexCounters` @@ -400,5 +350,17 @@ TODO: mention $natural to force no index usage? :dbcommand:`serverStatus` for insight into database-wide index utilization. -TODO: I'd like to see this also cover how to track how far an index build has -gotten and how to kill an index build. +Monitoring and Controlling Index Building +----------------------------------------- + +.. TODO insert links to the values in the inprog array following the + completion of DOCS-162 + +To see the status of the indexing processes, you can use the +:func:`db.currentOP()` method in the :program:`mongo` shell. The value +of the ``query`` field and the ``msg`` field will indicate if the +operation is an index build. The ``msg`` field also indicates the +percent of the build that is complete. + +If you need to terminate an ongoing index build, you can use the +:func:`db.killOp()` method in the :program:`mongo` shell. diff --git a/draft/applications/indexes.txt b/draft/applications/indexes.txt index f7ca3c60165..108660782c0 100644 --- a/draft/applications/indexes.txt +++ b/draft/applications/indexes.txt @@ -21,6 +21,9 @@ applications with MongoDB. Strategies ---------- +.. _covered-queries: +.. 
_indexes-covered-queries: + Use Covered Queries ~~~~~~~~~~~~~~~~~~~ @@ -30,16 +33,15 @@ database. To use a covered index you must: - ensure that the index includes all of the fields in the result. + This means that the :term:`projection`, must explicitly exclude the + ``_id`` field from the result set, unless the index includes + ``_id``. + - if any of the indexed fields in any of the documents in the collection includes an array, then the index becomes a :ref:`multi-key index ` index, and cannot support a covered query. -- in the :term:`projection`, explicitly exclude the ``_id`` field from - the result set, unless the index includes ``_id``. - -TODO: the third point seems like part of the first point. - Use the :func:`explain() ` to test the query. If MongoDB was able to use a covered index, then the value of the ``indexOnly`` field will be ``true``. @@ -51,14 +53,8 @@ disk, and indexes are smaller than the documents they catalog. Sort Using Indexes ~~~~~~~~~~~~~~~~~~ -While the :dbcommand:`sort` database command - -TODO: sort database command? Is "database command" being used in a different -sense here? - - and the :func:`sort() -` helper support in-memory sort operations without the -use of an index, these operations are: +While the :func:`sort() ` method supports in-memory +sort operations without the use of an index, these operations are: #. Significantly slower than sort operations that use an index. @@ -84,8 +80,10 @@ results. For example: When using compound indexes to support sort operations, the sorted field must be the *last* field in the index. -TODO: not true! In 2.2, you can use, say, the index above for a query on -username, sort by status, too. +.. TODO: not true! In 2.2, you can use, say, the index above for a query on + username, sort by status, too. + +.. is this not true in other version? what changed? Store Indexes in Memory ~~~~~~~~~~~~~~~~~~~~~~~ @@ -132,9 +130,9 @@ deep understanding of: - which indexes the most common queries use. 
-MongoDB can only use *one* index to support any given operation. - -TODO: trickily put. I hope you menion $or elsewhere? +MongoDB can only use *one* index to support any given +operation. However, each clause of an :operator:`$or` query can use +its own index. Selectivity ~~~~~~~~~~~ @@ -142,48 +140,76 @@ Selectivity Selectivity describes the ability of a query to narrow the result set using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated -with fulfilling the query. +with fulfilling the query. There are two aspects of selectivity: -.. example:: +#. Data need to have a high distribution of the values for the indexed + key. - First, consider an index on a field that has three values evenly - distributed across the collection. If MongoDB uses this index for a - query, MongoDB will still need to scan a third of the - :term:`documents ` in the collection to fulfill the rest - of the query. +#. Queries need to limit the number of possible documents using the + indexed field. - Then, consider an index on a field that has many values evenly - distributed across the collection. If your query selects one of - these values using the index, MongoDB will only need to scan a very - small number of documents to fulfill the rest of the query. +.. example:: -TODO: It'd be clearer to use "real" numbers in the second example, too, but I -think you'd have to re-jigger the example to do so. + First, consider an index, ``{ a : 1 }``, on a collection where + ``a`` has three values evenly distributed across the collection: -To ensure optimal performance, use indexes that are maximally -selective relative to your queries. + .. code-block:: javascript -TODO: the example makes selectivity sound like the uniqueness of the index, -which isn't the whole story. Having something like {x:{$gt:3}} that matches 60% -of the collection isn't very selective, even if x has a unique index on it. 
+ { _id: ObjectId(), a: 1, b: "ab" } + { _id: ObjectId(), a: 1, b: "cd" } + { _id: ObjectId(), a: 1, b: "ef" } + { _id: ObjectId(), a: 2, b: "jk" } + { _id: ObjectId(), a: 2, b: "lm" } + { _id: ObjectId(), a: 2, b: "no" } + { _id: ObjectId(), a: 3, b: "pq" } + { _id: ObjectId(), a: 3, b: "rs" } + { _id: ObjectId(), a: 3, b: "tv" } -I think it's important to emphasize that selectivity is whittling down possible -results to as small a % as possible. + If you do a query for ``{ a: 2, b: "no" }`` MongoDB will still need + to scan 3 documents of the :term:`documents ` in the + collection to fulfill the query. Similarly, a query for ``{ a: { + $gt: 1}, b: "tv" }``, would need to scan through 6 documents, + although both queries would return the same result. -TODO: Also, might be worth mentioning that, if you cannot get selectivity low -enough, indexes will actually be slower than table scans. + Then, consider an index on a field that has many values evenly + distributed across the collection: + + .. code-block:: javascript + + { _id: ObjectId(), a: 1, b: "ab" } + { _id: ObjectId(), a: 2, b: "cd" } + { _id: ObjectId(), a: 3, b: "ef" } + { _id: ObjectId(), a: 4, b: "jk" } + { _id: ObjectId(), a: 5, b: "lm" } + { _id: ObjectId(), a: 6, b: "no" } + { _id: ObjectId(), a: 7, b: "pq" } + { _id: ObjectId(), a: 8, b: "rs" } + { _id: ObjectId(), a: 9, b: "tv" } + + Although the index on ``a`` is more selective, in the sense that + queries can use the index more effectively, a query such as ``{ a: + { $gt: 5 }, b: "tv" }`` would still need to scan 4 documents. By + contrast, given a query like ``{ a: 2, b: "cd" }``, MongoDB would + only need to scan one document to fulfill the rest of the + query. The index and query are more selective because the values of + ``a`` are evenly distributed *and* the query can selects a specific + document using the index. + +To ensure optimal performance, use indexes that are maximally +selective relative to your queries. 
At the same time queries need to +be appropriately selective relative to your indexed data. If overall +selectivity is low enough, and MongoDB must read a number of documents +to return results, then some queries may perform faster without +indexes. See the :ref:`indexes-measuring-use` section for more +information on measuring index use. Insert Throughput ~~~~~~~~~~~~~~~~~ .. TODO insert link to /source/core/write-operations when that page is complete. -.. TODO fact check - -MongoDB must update all indexes associated with a collection following -every insert or update operation. - -TODO: or delete, too +MongoDB must update all indexes associated with a collection after +every insert, update, or delete operation. Every index on a collection adds some amount of overhead to these operations. In almost every case, the @@ -191,9 +217,7 @@ performance gains that indexes realize for read operations are worth the insertion penalty; however: - in some cases, an index to support an infrequent query may incur - more insert-related costs, than saved read-time. - -TODO: rm comma: "insert-related costs than saved read-time" + more insert-related costs than saved read-time. - in some situations, if you have many indexes on a collection with a high insert throughput and a number of very similar indexes, you may @@ -201,7 +225,9 @@ TODO: rm comma: "insert-related costs than saved read-time" on some queries if it means consolidating the total number of indexes. -TODO: do you cover what indexes overlap? +.. TODO: do you cover what indexes overlap? + +.. no. I'm not sure the case to which you're referring. Index Size ~~~~~~~~~~ your queries only match a subset of the documents and can use the index to locate those documents, MongoDB can maintain a much smaller :term:`working set`. Ensure that: -- all of your indexes use less space than the documents in the - collection. - -TODO: individually or all together? 
- -- the indexes and a reasonable working set can fit RAM at the same - time. +- the indexes and the working set can fit RAM at the same time. -TODO: a reasonable working set? +- all of your indexes use less space than all of the documents in the + collection. This may not be an issue if all of your queries use + :ref:`covered queries ` or if indexes do not need to + fit into RAM, as in the following situation: .. _indexing-right-handed: diff --git a/draft/core/geospatial-indexes.txt b/draft/core/geospatial-indexes.txt index 824e97ce37f..cfcbf6cbfc9 100644 --- a/draft/core/geospatial-indexes.txt +++ b/draft/core/geospatial-indexes.txt @@ -189,6 +189,7 @@ or latitude. Create this index using following command: db.places.ensureIndex({ loc: "geoHaystack", type: 1} , { bucketSize: 2 } ) + .. TODO clarify what the type argument does or if it's just always required. diff --git a/draft/core/indexes.txt b/draft/core/indexes.txt index 46d017282ea..f8ad7d1d520 100644 --- a/draft/core/indexes.txt +++ b/draft/core/indexes.txt @@ -270,6 +270,7 @@ index to locate the document: db.feedback.find( { "comments.text": "Please expand the olive selection." } ) +.. include:: /includes/note-build-indexes-on-replica-sets.rst .. warning:: @@ -394,7 +395,8 @@ construction: operations can run while creating the index. However, the :program:`mongo` shell session or connection where you are creating the index will block until the index build is complete. Open another - connection or :program:`mongo` instance to continue using the database. + connection or :program:`mongo` instance to continue issuing commands + to the database. - The background index operation use an incremental approach that is slower than the normal "foreground" index builds. If the index is @@ -403,21 +405,25 @@ construction: .. admonition:: Building Indexes on Secondaries - .. versionchanged:: 2.1.0 - Before 2.1.0, :term:`replica sets ` cannot build - indexes in the background on :term:`secondaries `. 
+ Background index operations on a :term:`replica set` + :term:`primary`, become foreground indexing operations on secondary + members of the set. All indexing operations on secondaries block + replication. - To rebuild large indexes on secondaries before version 2.1.0, - typically the best approach is to restart each secondary in - "standalone" mode and build the index. When the index is rebuilt, - restart as a member of the replica set, allow it to catch up with - the other members of the set, and then rebuild the index on the - next secondary. When all the secondaries have the new index, step - down the primary and build the index on the former primary. + To rebuild large indexes on secondaries the best approach is to + restart one secondary at a time in "standalone" mode and build the + index. When the index is rebuilt, restart as a member of the + replica set, allow it to catch up with the other members of the + set, and then rebuild the index on the next secondary. When all the + secondaries have the new index, step down the primary, restart it + as a standalone, and build the index on the former primary. Remember, the amount of time required to build the index on a secondary node must be within the window of the :term:`oplog`, so - that the secondary can catch up. + that the secondary can catch up with the primary. + + See :ref:`index-building-replica-sets` for more information on + this process. Indexes on secondary members in "recovering" mode are always built in the foreground to allow them to catch up as soon as possible. @@ -503,6 +509,8 @@ indexes to fulfill arbitrary queries. .. see:: :doc:`/tutorial/expire-data` +.. _index-feature-geospatial: + Geospatial Indexes ~~~~~~~~~~~~~~~~~~ @@ -515,31 +523,13 @@ are "near" a given coordinate pair. To create a geospatial index, your :term:`documents ` must have a coordinate pair. 
For maximum compatibility, these coordinate pairs should be in the form of a two element array, such as ``[ x , y -]``, but other representations are acceptable, including: -.. code-block:: javascript - - { loc : [ 50 , 30 ] } - { loc : { x : 50 , y : 30 } } - { loc : { foo : 50 , y : 30 } } - { loc : { lon : 40.739037, lat: 73.992964 } } - -Given the field of ``loc`` in the collection ``places``, you would -create a geospatial index as follows: +]``. Given the field ``loc``, which holds a coordinate pair, in the +collection ``places``, you would create a geospatial index as follows: .. code-block:: javascript db.places.ensureIndex( { loc : "2d" } ) -By default, ``2d`` indexes assume that the coordinates are -latitude/longitude systems, and assume that minimum and maximum bounds -are ``[ -180, 180 ]``. You can specify a different minimum and maximum -values, as follows: -.. code-block:: javascript - - db.places.ensureIndex( { loc : "2d" }, { min: -250 , max: 250 } ) - MongoDB will reject documents that have values in the ``loc`` field beyond the minimum and maximum values. @@ -556,7 +546,26 @@ data. .. TODO insert link to special /core/geospatial.txt documentation on this topic. once that document exists. -.. TODO short mention of geoHaystack indexes here? +Geohaystack Indexes +~~~~~~~~~~~~~~~~~~~ + +.. TODO update links in the following section as needed: + +In addition to conventional :ref:`geospatial indexes +`, MongoDB also provides a bucket-based +geospatial index, called "geospatial haystack indexes." These indexes +support high performance queries for locations within a small area, +when the query must filter along another dimension. + +.. example:: + + If you need to return all documents that have coordinates within 25 + miles of a given point *and* have a type field value of "museum," a + haystack index would provide the best support for these queries. 
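The bucketing idea behind haystack indexes can be sketched in plain JavaScript. This is a deliberate simplification for illustration, not MongoDB's storage format or hashing scheme; the helper name is invented.

```javascript
// Toy sketch of the haystack idea: coordinates are grouped into
// fixed-size buckets, so that a "near this point, with this type" query
// only scans the handful of buckets around the point instead of the
// whole collection. bucketSize mirrors the bucketSize option shown
// elsewhere in the document; the key format here is made up.
function bucketKey(x, y, bucketSize) {
  return Math.floor(x / bucketSize) + ":" + Math.floor(y / bucketSize);
}

console.log(bucketKey(50.2, 30.7, 2)); // "25:15"
console.log(bucketKey(51.9, 31.1, 2)); // "25:15" -- same bucket, scanned together
console.log(bucketKey(60.0, 30.7, 2)); // "30:15" -- different bucket, skipped
```

Tuning the bucket size to the data's distribution is what lets a haystack query confine its work to a small region of 2d space.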
+Haystack indexes allow you to tune your bucket size to the +distribution of your data, so that in general you search only very +small regions of 2d space for a particular kind of document. Index Limitations ----------------- diff --git a/source/includes/note-build-indexes-on-replica-sets.rst b/source/includes/note-build-indexes-on-replica-sets.rst new file mode 100644 index 00000000000..9dc493807df --- /dev/null +++ b/source/includes/note-build-indexes-on-replica-sets.rst @@ -0,0 +1,4 @@ +.. note:: + + To rebuild indexes for a :term:`replica set` see + :ref:`index-rebuilding-replica-sets`. From 2e6a1b8368686074ff5377756239be3c6c35159e Mon Sep 17 00:00:00 2001 From: Sam Kleinman Date: Fri, 20 Jul 2012 14:59:23 -0400 Subject: [PATCH 2/3] DOCS-330 adding examples based on feedback from astaple --- draft/applications/indexes.txt | 109 ++++++++++++++++++++++++++++++--- 1 file changed, 99 insertions(+), 10 deletions(-) diff --git a/draft/applications/indexes.txt b/draft/applications/indexes.txt index 108660782c0..e2f9da43521 100644 --- a/draft/applications/indexes.txt +++ b/draft/applications/indexes.txt @@ -50,6 +50,9 @@ Covered queries are much faster than other queries, for two reasons: indexes are typically stored in RAM *or* located sequentially on disk, and indexes are smaller than the documents they catalog. +.. _index-sort: +.. _sorting-with-indexes: + Sort Using Indexes ~~~~~~~~~~~~~~~~~~ @@ -73,17 +76,55 @@ results. For example: on ":ref:`Ascending and Descending Index Order `." -- MongoDB can use a compound index ``{ status: 1, username: 1 }`` to - return a query on the ``status`` field sorted by the ``username`` - field. +- In general, MongoDB can use a compound index to return sorted + results *if*: + + - the first sorted field is the first field in the index. + + - the last field in the index before the first sorted field is an + equality match in the query. + + Consider the example presented below for an illustration of this + concept. +.. 
example:: + + Given the following index: + + .. code-block:: javascript + + { a: 1, b: 1, c: 1, d: 1 } + + The following query and sort operations will be able to use the + index: + + .. code-block:: javascript + + db.collection.find().sort( { a:1 } ) + db.collection.find().sort( { a:1, b:1 } ) + + db.collection.find( { a:4 } ).sort( { a:1, b:1 } ) + db.collection.find( { b:5 } ).sort( { a:1, b:1 } ) + + db.collection.find( { a:{ $gt:4 } } ).sort( { a:1, b:1 } ) + db.collection.find( { b:{ $gt:5 } } ).sort( { a:1, b:1 } ) + + db.collection.find( { a:5 } ).sort( { a:1, b:1 } ) + db.collection.find( { a:5 } ).sort( { b:1, c:1 } ) + + db.collection.find( { a:5, c:4, b:3 } ).sort( { d:1 } ) -When using compound indexes to support sort operations, the sorted -field must be the *last* field in the index. + db.collection.find( { a:5, b:3, d:{ $gt:4 } } ).sort( { c:1 } ) + db.collection.find( { a:5, b:3, c:{ $lt:2 }, d:{ $gt:4 } } ).sort( { c:1 } ) -.. TODO: not true! In 2.2, you can use, say, the index above for a query on - username, sort by status, too. + However, the following query operations would not be able to sort + data using the index: -.. is this not true in other version? what changed? + .. code-block:: javascript + + db.collection.find().sort( { b:1 } ) + db.collection.find( { b:5 } ).sort( { b:1 } ) + db.collection.find( { b:{ $gt:5 } } ).sort( { a:1, b:1 } ) Store Indexes in Memory ~~~~~~~~~~~~~~~~~~~~~~~ @@ -134,6 +175,8 @@ MongoDB can only use *one* index to support any given operation. However, each clause of an :operator:`$or` query can use its own index. +.. _index-selectivity: + Selectivity ~~~~~~~~~~~ @@ -225,9 +268,55 @@ the insertion penalty; however: on some queries if it means consolidating the total number of indexes. -.. TODO: do you cover what indexes overlap? +- If your indexes and queries are not very selective, the speed + improvements for query operations may not offset the costs of + maintaining an index. 
See the section on :ref:`index selectivity + ` for more information. + +- In some cases a single compound index on two or more fields may + support all of the queries that an index on a single field, or a + smaller compound index, supports. In general, MongoDB can use a compound index + to support the same queries as any of its prefixes. Consider the + following example: + + .. example:: + + The following index on a collection: + + .. code-block:: javascript + + { x: 1, y: 1, z: 1 } + + can support a number of queries, as well as most of the queries + that the following indexes support: + + .. code-block:: javascript + + { x: 1 } + { x: 1, y: 1 } + + There are some situations where the prefix indexes may offer + better query performance, as is the case if ``z`` is a large + array. Also, consider the following index on the same collection: + + .. code-block:: javascript + + { x: 1, z: 1 } + + The ``{ x: 1, y: 1, z: 1 }`` index can support many of the same + queries as the above index; however, ``{ x: 1, z: 1 }`` has an + additional use. Given the following query: + + .. code-block:: javascript + + db.collection.find( { x: 5 } ).sort( { z: 1 } ) + + The ``{ x: 1, z: 1 }`` index will support both the query and the sort + operation, while the ``{ x: 1, y: 1, z: 1 }`` index can only + support the query. See the :ref:`sorting-with-indexes` section for more + information. 
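The prefix rule discussed above can be sketched as a small predicate in plain JavaScript. It checks only the "queried fields form a prefix of the index" rule; MongoDB's real query planner considers far more, and the helper name and field ordering are illustrative assumptions.

```javascript
// A compound index can support an equality query whose fields form a
// prefix of the index's field list. For simplicity this toy predicate
// assumes the query's fields are given in index order; in MongoDB the
// order of fields within a query document does not matter.
function queryUsesIndexPrefix(indexFields, queryFields) {
  return queryFields.every(function (field, i) {
    return indexFields[i] === field;
  });
}

var index = ["x", "y", "z"];
console.log(queryUsesIndexPrefix(index, ["x"]));           // true
console.log(queryUsesIndexPrefix(index, ["x", "y"]));      // true
console.log(queryUsesIndexPrefix(index, ["x", "y", "z"])); // true
console.log(queryUsesIndexPrefix(index, ["y", "z"]));      // false -- not a prefix
```

This is why ``{ x: 1, y: 1, z: 1 }`` can stand in for ``{ x: 1 }`` and ``{ x: 1, y: 1 }``, but not for ``{ x: 1, z: 1 }`` when a sort on ``z`` follows an equality match on ``x`` alone.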
Index Size
~~~~~~~~~~

From 1dd4f9a4b68ece6a2026e063a86db831e386cd8f Mon Sep 17 00:00:00 2001
From: Sam Kleinman
Date: Fri, 20 Jul 2012 16:03:06 -0400
Subject: [PATCH 3/3] DOCS-330 final round of comments on the indexing documents

---
 draft/administration/indexes.txt                   | 14 ++++++--------
 draft/applications/indexes.txt                     |  2 +-
 draft/core/indexes.txt                             | 13 ++++++++-----
 .../note-build-indexes-on-replica-sets.rst         |  2 +-
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/draft/administration/indexes.txt b/draft/administration/indexes.txt
index 8cdb8c32df4..cfa9164733a 100644
--- a/draft/administration/indexes.txt
+++ b/draft/administration/indexes.txt
@@ -261,7 +261,7 @@ Consideration
 ~~~~~~~~~~~~~

 :ref:`Background index creation operations
-` became *foreground* indexing operations
+` become *foreground* indexing operations
 on :term:`secondary` members of replica sets. These foreground
 operations will block all replication on the secondaries, and don't
 allow any reads. As a result, in most cases, use the following procedure
@@ -282,7 +282,7 @@ Procedure

    :option:`--replSet ` option. Allow replication to catch up
    on this member.

-#. Replete this operation on all of the remaining secondaries.
+#. Repeat this operation on all of the remaining secondaries.

 #. Run :func:`rs.stepDown()` on the :term:`primary` member of the
    set, and then repeat this procedure on the former primary.

@@ -296,11 +296,9 @@ Procedure

 .. note::

-   This procedure *does* block indexing on one member of the
-   replica set at a time. However, the foreground indexing
-   operation is more efficient than the background index operation,
-   and will only affect one secondary at a time rather than *all*
-   secondaries at the same time.
+   This procedure *does* take one member out of the replica set at a
+   time. However, it affects only one member of the set at a time
+   rather than *all* secondaries at the same time.

..
[#different-port] By running the :program:`mongod` on a different port, you ensure that the other members of the replica set and all @@ -357,7 +355,7 @@ Monitoring and Controlling Index Building completion of DOCS-162 To see the status of the indexing processes, you can use the -:func:`db.currentOP()` method in the :program:`mongo` shell. The value +:func:`db.currentOp()` method in the :program:`mongo` shell. The value of the ``query`` field and the ``msg`` field will indicate if the operation is an index build. The ``msg`` field also indicates the percent of the build that is complete. diff --git a/draft/applications/indexes.txt b/draft/applications/indexes.txt index e2f9da43521..a999343a505 100644 --- a/draft/applications/indexes.txt +++ b/draft/applications/indexes.txt @@ -118,7 +118,7 @@ results. For example: db.collection.find( { a:5, b:3, c:{ $lt:2 }, d:{ $gt:4 } } ).sort( { c:1 } ) However, the following query operations would not be able to sort - data using the index: + the results using the index: .. code-block:: javascript diff --git a/draft/core/indexes.txt b/draft/core/indexes.txt index f8ad7d1d520..a7269892ab1 100644 --- a/draft/core/indexes.txt +++ b/draft/core/indexes.txt @@ -410,11 +410,11 @@ construction: members of the set. All indexing operations on secondaries block replication. - To rebuild large indexes on secondaries the best approach is to + To build large indexes on secondaries the best approach is to restart one secondary at a time in "standalone" mode and build the - index. When the index is rebuilt, restart as a member of the + index. After building the index, restart as a member of the replica set, allow it to catch up with the other members of the - set, and then rebuild the index on the next secondary. When all the + set, and then build the index on the next secondary. When all the secondaries have the new index, step down the primary, restart it as a standalone, and build the index on the former primary. 
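The ``db.currentOp()`` monitoring described above can be sketched concretely. The ``inprog`` document below is hand-written sample data shaped like shell output, not real server output; in the :program:`mongo` shell the same filter would be applied to ``db.currentOp().inprog`` directly.

```javascript
// Hand-written sample shaped like db.currentOp() output; NOT real
// server output. Index builds report their progress in "msg".
const sample = {
  inprog: [
    { opid: 11, op: "query",  ns: "test.records", msg: "" },
    { opid: 12, op: "insert", ns: "test.system.indexes",
      msg: "index: (2/3) btree bottom up 521288/1024321 50%" },
  ],
};

// Keep only the operations whose msg field marks an index build.
const indexBuilds = sample.inprog.filter(op => /index/.test(op.msg || ""));
console.log(indexBuilds.map(op => op.opid)); // [ 12 ]
```

The ``msg`` field carries the percent-complete figure mentioned above, so the same filter is a natural starting point for scripted monitoring.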
@@ -563,9 +563,12 @@ when the query must filter along another dimension.

 miles of a given point *and* have a type field value of "museum," a
 haystack index would provide the best support for these queries.

-Haystack indices allow you to tune your bucket size to the
+Haystack indexes allow you to tune your bucket size to the
 distribution of your data, so that in general you search only very
-small regions of 2d space for a particular kind of document.
+small regions of 2d space for a particular kind of document. These
+indexes are not suited for finding the closest documents to a
+particular location when the closest documents are far away compared
+to the bucket size.

 Index Limitations
 -----------------

diff --git a/source/includes/note-build-indexes-on-replica-sets.rst b/source/includes/note-build-indexes-on-replica-sets.rst
index 9dc493807df..b7bc5f8431c 100644
--- a/source/includes/note-build-indexes-on-replica-sets.rst
+++ b/source/includes/note-build-indexes-on-replica-sets.rst
@@ -1,4 +1,4 @@
 .. note::

-   To rebuild indexes for a :term:`replica set` see
+   To build or rebuild indexes for a :term:`replica set` see
    :ref:`index-rebuilding-replica-sets`.