From bdc1baf6db8a36ae099e04faa4b097a7c2d253db Mon Sep 17 00:00:00 2001
From: Kristina
Date: Mon, 16 Jul 2012 15:49:15 -0400
Subject: [PATCH] Index comments

---
 draft/administration/indexes.txt | 78 +++++++++++++++++++++++++++++---
 draft/applications/indexes.txt   | 41 ++++++++++++++++-
 2 files changed, 111 insertions(+), 8 deletions(-)

diff --git a/draft/administration/indexes.txt b/draft/administration/indexes.txt
index 025a1d57a44..dc49536795a 100644
--- a/draft/administration/indexes.txt
+++ b/draft/administration/indexes.txt
@@ -40,6 +40,8 @@ of the ``people`` collection:
 
     db.people.ensureIndex( { phone-number: 1 } )
 
+TODO: you need ""s around phone-number, otherwise it's invalid JS (phone minus number).
+
 To create a :ref:`compound index `, use an operation that
 resembles the following prototype:
@@ -60,6 +62,8 @@ collection:
 To build indexes for a :term:`replica set`, before version 2.2, see
 :ref:`index-building-replica-sets`.
 
+TODO: I don't think anything changed about replica set index builds for 2.2...
+
 .. [#ensure] As the name suggests, :func:`ensureIndex() `
    only creates an index if an index of the same specification does
    not already exist.
@@ -67,6 +71,8 @@ collection:
 Sparse
 ``````
 
+TODO: Sparse? Maybe "Types of Indexes->Sparse"?
+
 To create a :ref:`sparse index ` on a field, use an
 operation that resembles the following prototype:
@@ -87,6 +93,12 @@ without the ``twitter_name`` field.
 
 MongoDB cannot create sparse compound indexes.
 
+TODO: is this true? I thought that it could.
+
+TODO: Is there more doc on sparse indexes somewhere? Seems like this is missing
+some info like getting different results back when the index is used, null
+counts as existing, etc.
+
 Unique
 ``````
@@ -105,10 +117,14 @@ records for the same legal entity:
 
     db.accounts.ensureIndex( { tax-id: 1 }, { unique: true } )
 
+TODO: tax-id should be in ""s.
+
 The :ref:`_id index ` is a unique index.
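The quoting problem the ""s TODOs call out is plain JavaScript behavior, so it can be sketched outside the shell entirely; the ``eval`` wrapper below exists only to capture the parse error, and the field names are the ones from the patch:

```javascript
// The unquoted key { phone-number: 1 } parses as (phone - number):
// a subtraction of two undefined identifiers, which is a syntax error
// inside an object literal.
let unquoted;
try {
  unquoted = eval("({ phone-number: 1 })"); // throws before evaluating
} catch (e) {
  unquoted = e instanceof SyntaxError ? "syntax error" : "other error";
}

// Quoting the keys makes them ordinary string property names:
const quoted = { "phone-number": 1, "tax-id": 1 };
console.log(unquoted);            // "syntax error"
console.log(Object.keys(quoted)); // [ 'phone-number', 'tax-id' ]
```

The same applies to every ``{ tax-id: 1 }`` specification flagged below: the shell is a JavaScript interpreter, so hyphenated field names must always be quoted.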
In some situations you may want to use the ``_id`` field for these
primary data rather than using a unique index on another field.
 
+TODO: "for these primary data"?
+
 In many situations you will want to combine the ``unique`` constraint
 with the ``sparse`` option. When MongoDB indexes a field, if a
 document does not have a value for a field, the index entry for that
@@ -141,6 +157,8 @@ as in the following example:
 
     db.accounts.dropIndex( { tax-id: 1 } )
 
+TODO: ""s!
+
 This will remove the index on the ``tax-id`` field in the ``accounts``
 collection. The shell provides the following document after completing
 the operation:
@@ -203,6 +221,12 @@ for this operation.
 To rebuild indexes for a :term:`replica set`, before version 2.2, see
 :ref:`index-rebuilding-replica-sets`.
 
+TODO: again, this probably isn't different in 2.2
+
+TODO: one thing that I would appreciate you mentioning is that some drivers may
+create indexes like {a : NumberLong(1)} _which is fine_ and doesn't break
+anything so stop complaining about it.
+
 Special Creation Options
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -211,6 +235,8 @@ Special Creation Options
 TTL collections use a special ``expire`` index option. See
 :doc:`/tutorial/expire-data` for more information.
 
+TODO: Are 2d indexes getting a mention?
+
 Background
 ``````````
@@ -222,11 +248,25 @@ prototype invocation of :func:`db.collection.ensureIndex()`:
 
     db.collection.ensureIndex( { a: 1 }, { background: true } )
 
+TODO: what does it mean to build an index in the background? You might want to
+mention:
+* performance implications
+* that this type of index build can be killed
+* that this blocks the connection you sent the ensureIndex on, but ops from
+  other connections can proceed
+* that indexes are created in the foreground on secondaries in 2.0,
+  which blocks replication & slave reads. In 2.2, it does not block reads (but
+  still blocks repl).
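The unique/sparse interaction discussed above can be sketched as a toy model in plain JavaScript. This illustrates the indexing rule only (a document missing the field indexes as ``null`` unless the index is sparse); it is not the server's actual B-tree implementation:

```javascript
// Toy model of a unique index: can every document be inserted without
// violating the uniqueness constraint on `field`?
function canInsertAll(docs, field, { sparse }) {
  const seen = new Set();
  for (const doc of docs) {
    const has = Object.prototype.hasOwnProperty.call(doc, field);
    if (!has && sparse) continue;        // sparse: no index entry at all
    const key = has ? doc[field] : null; // missing field indexes as null
    if (seen.has(key)) return false;     // duplicate key: insert rejected
    seen.add(key);
  }
  return true;
}

const docs = [{ a: 1 }, { b: 2 }, { b: 3 }]; // two docs missing "a"
console.log(canInsertAll(docs, "a", { sparse: false })); // false: two nulls
console.log(canInsertAll(docs, "a", { sparse: true }));  // true: nulls skipped
```

This is why the draft suggests combining ``unique`` with ``sparse``: without ``sparse``, only one document per collection may omit the indexed field.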
+
 Drop Duplicates
 ```````````````
 
 To force the creation of a :ref:`unique index `
-index, you can use the ``dropDups`` option. This will force MongoDB to
+index
+
+TODO: " on a collection with duplicate values in the field to be indexed "
+
+you can use the ``dropDups`` option. This will force MongoDB to
 create a *unique* index by deleting documents with duplicate values
 when building the index. Consider the following prototype invocation
 of :func:`db.collection.ensureIndex()`:
@@ -243,12 +283,15 @@ See the full documentation of :ref:`duplicate dropping
 
 Specifying ``{ dropDups: true }`` will delete data from your
 database. Use with extreme caution.
 
+TODO: I'd say it "may" delete data from your DB, not like it's going to go all
+Shermanesque on your data.
+
 .. _index-building-replica-sets:
 
 Building Indexes on Replica Sets
 --------------------------------
 
-.. versionchanged:: 2.2 
+.. versionchanged:: 2.2
    Index rebuilding operations on :term:`secondary` members of
    :term:`replica sets ` now run as normal background
    index operations. Run :func:`ensureIndex() `
    the following operation to isolate and control the impact of indexing
    building operations on a set as a whole.
 
+TODO: I think there needs to be a huge mention that this still blocks
+replication, so the procedure below is recommended.
+
 .. admonition:: For Version 1.8 and 2.0
 
    :ref:`Background index creation operations
   ` became *foreground* indexing operations on
    :term:`secondary` members of replica sets. These foreground
    operations will block all replication on the
-   secondaries, and can impact performance of the entire set. To build
+   secondaries,
+
+TODO: and don't allow any reads to go through.
+
+   and can impact performance of the entire set. To build
    indexes with minimal impact on a replica set, use the
    following procedure for all non-trivial index builds:
 
    #. Stop the :program:`mongod` process on one secondary.
      Restart the
-     :program:`mongod` process *without* the :option:`--replSet `
+     :program:`mongod` process *without* the :option:`--replSet `
      option. This instance is now in "standalone" mode.
 
+TODO: generally we recommend running it on a different port, too, so that apps
+& other servers in the set don't try to contact it.
+
    #. Create the new index or rebuild the index on this
       :program:`mongod` instance.
@@ -287,7 +340,7 @@ Building Indexes on Replica Sets
 
       Ensure that your :ref:`oplog` is large enough to permit the
      indexing or re-indexing operation to complete without falling
-     too far behind to catch up. See the ":ref:`replica-set-oplog-sizing`"
+     too far behind to catch up. See the ":ref:`replica-set-oplog-sizing`"
      documentation for additional information.
 
 .. note::
 
 For the best results, always create indexes *before* you begin
 inserting data into a collection.
 
+TODO: well, sort of. That'll build the indexes fast, but make the inserts
+slower. Overall, it's faster to insert data, then build indexes.
+
 Measuring Index Use
 -------------------
@@ -318,7 +374,12 @@ following tools:
 
 - :func:`cursor.hint()`
 
   Append the :func:`hint() ` to any cursor (e.g.
-  query) with the name of an index as the argument to *force* MongoDB
+  query) with the name
+
+TODO: this isn't "the name of an index." I'd say just "with the index." The
+name of an index is a string like "zipcode_1".
+
+  of an index as the argument to *force* MongoDB
   to use a specific index to fulfill the query. Consider the
   following example:
@@ -331,8 +392,13 @@ following tools:
   ` in conjunction with each other to compare
   the effectiveness of a specific index.
 
+TODO: mention $natural to force no index usage?
+
 - :status:`indexCounters`
 
   Use the :status:`indexCounters` data in the output of
   :dbcommand:`serverStatus` for insight into database-wise index
   utilization.
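On the hint() TODO above: a default index name is derived from the key specification, with each field and its direction joined by underscores, which is how names like "zipcode_1" arise. A small illustrative helper (not a shell built-in) showing the convention:

```javascript
// Illustrative helper: derive the default index name from a key spec,
// joining each field with its direction using underscores.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => field + "_" + direction)
    .join("_");
}

console.log(defaultIndexName({ zipcode: 1 }));          // "zipcode_1"
console.log(defaultIndexName({ status: 1, user: -1 })); // "status_1_user_-1"
```

Either form works with ``hint()``: the key document itself, or the string name an existing index was given.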
+
+TODO: I'd like to see this also cover how to track how far an index build has
+gotten and how to kill an index build.
diff --git a/draft/applications/indexes.txt b/draft/applications/indexes.txt
index 8dadaed0cbf..f7ca3c60165 100644
--- a/draft/applications/indexes.txt
+++ b/draft/applications/indexes.txt
@@ -38,6 +38,8 @@ database. To use a covered index you must:
 
 - in the :term:`projection`, explicitly exclude the ``_id`` field from
   the result set, unless the index includes ``_id``.
 
+TODO: the third point seems like part of the first point.
+
 Use the :func:`explain() ` to test the query. If
 MongoDB was able to use a covered index, then the value of the
 ``indexOnly`` field will be ``true``.
@@ -49,7 +51,12 @@ disk, and indexes are smaller than the documents they catalog.
 Sort Using Indexes
 ~~~~~~~~~~~~~~~~~~
 
-While the :dbcommand:`sort` database command and the :func:`sort()
+While the :dbcommand:`sort` database command
+
+TODO: sort database command? Is "database command" being used in a different
+sense here?
+
+and the :func:`sort()
 ` helper support in-memory sort operations without the
 use of an index, these operations are:
@@ -77,6 +84,9 @@ results. For example:
 
 When using compound indexes to support sort operations, the sorted
 field must be the *last* field in the index.
 
+TODO: not true! In 2.2, you can use, say, the index above for a query on
+username, sort by status, too.
+
 Store Indexes in Memory
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -124,6 +134,8 @@ deep understanding of:
 
 MongoDB can only use *one* index to support any given
 operation.
 
+TODO: trickily put. I hope you mention $or elsewhere?
+
 Selectivity
 ~~~~~~~~~~~
@@ -145,9 +157,22 @@ with fulfilling the query.
 
    these values using the index, MongoDB will only need to scan a very
    small number of documents to fulfill the rest of the query.
 
+TODO: It'd be clearer to use "real" numbers in the second example, too, but I
+think you'd have to re-jigger the example to do so.
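A toy calculation to put a number on the selectivity discussion (plain JavaScript; the 60% figure matches the ``{x:{$gt:3}}`` example raised in the selectivity TODOs): a predicate can be poorly selective even when the field it touches has unique values.

```javascript
// Selectivity here = fraction of the collection a predicate matches.
// x is unique across documents, yet a range predicate on it can still
// match most of the collection.
const collection = Array.from({ length: 10 }, (_, i) => ({ x: i })); // x: 0..9
const matched = collection.filter(d => d.x > 3).length;              // x in 4..9
const selectivity = matched / collection.length;
console.log(matched, selectivity); // 6 0.6 -- matches 60% of the collection
```

Uniqueness bounds the cost of an *equality* lookup at one document; a range predicate's cost is bounded only by the fraction of keys it spans, which is the reviewer's point.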
+
 To ensure optimal performance, use indexes that are maximally
 selective relative to your queries.
 
+TODO: the example makes selectivity sound like the uniqueness of the index,
+which isn't the whole story. Having something like {x:{$gt:3}} that matches 60%
+of the collection isn't very selective, even if x has a unique index on it.
+
+I think it's important to emphasize that selectivity is whittling down possible
+results to as small a % as possible.
+
+TODO: Also, might be worth mentioning that, if you cannot get selectivity low
+enough, indexes will actually be slower than table scans.
+
 Insert Throughput
 ~~~~~~~~~~~~~~~~~
@@ -156,7 +181,11 @@ Insert Throughput
 
 .. TODO fact check
 
 MongoDB must update all indexes associated with a collection following
-every insert or update operation. Every index on a collection adds
+every insert or update operation.
+
+TODO: or delete, too
+
+Every index on a collection adds
 some amount of overhead to these operations. In almost every case, the
 performance gains that indexes realize for read operations are worth
 the insertion penalty; however:
@@ -164,12 +193,16 @@ the insertion penalty; however:
 
 - in some cases, an index to support an infrequent query may incur
   more insert-related costs, than saved read-time.
 
+TODO: rm comma: "insert-related costs than saved read-time"
+
 - in some situations, if you have many indexes on a collection with a
   high insert throughput and a number of very similar indexes, you may
   find better overall results by using a slightly less effective index
   on some queries if it means consolidating the total number of
   indexes.
 
+TODO: do you cover what indexes overlap?
+
 Index Size
 ~~~~~~~~~~
@@ -182,9 +215,13 @@ index to locate those documents, MongoDB can maintain a much smaller
 
 - all of your indexes use less space than the documents in the
   collection.
 
+TODO: individually or all together?
+
 - the indexes and a reasonable working set can fit RAM at the same
   time.
 
+TODO: a reasonable working set?
+
 .. _indexing-right-handed:
 
 Indexes do not have to fit *entirely* into RAM in all cases. If the