From bdc1baf6db8a36ae099e04faa4b097a7c2d253db Mon Sep 17 00:00:00 2001
From: Kristina
Date: Mon, 16 Jul 2012 15:49:15 -0400
Subject: [PATCH] Index comments

---
 draft/administration/indexes.txt | 78 +++++++++++++++++++++++++++++---
 draft/applications/indexes.txt   | 41 ++++++++++++++++-
 2 files changed, 111 insertions(+), 8 deletions(-)

diff --git a/draft/administration/indexes.txt b/draft/administration/indexes.txt
index 025a1d57a44..dc49536795a 100644
--- a/draft/administration/indexes.txt
+++ b/draft/administration/indexes.txt
@@ -40,6 +40,8 @@ of the ``people`` collection:
 
     db.people.ensureIndex( { phone-number: 1 } )
 
+TODO: you need ""s around phone-number, otherwise it's invalid JS (phone minus number).
+
 To create a :ref:`compound index `, use an operation that
 resembles the following prototype:
@@ -60,6 +62,8 @@ collection:
 To build indexes for a :term:`replica set`, before version 2.2, see
 :ref:`index-building-replica-sets`.
 
+TODO: I don't think anything changed about replica set index builds for 2.2...
+
 .. [#ensure] As the name suggests, :func:`ensureIndex() `
    only creates an index if an index of the same specification does
    not already exist.
@@ -67,6 +71,8 @@ collection:
 Sparse
 ``````
 
+TODO: Sparse? Maybe "Types of Indexes->Sparse"?
+
 To create a :ref:`sparse index ` on a field, use an
 operation that resembles the following prototype:
@@ -87,6 +93,12 @@ without the ``twitter_name`` field.
 
 MongoDB cannot create sparse compound indexes.
 
+TODO: is this true? I thought that it could.
+
+TODO: Is there more doc on sparse indexes somewhere? Seems like this is missing
+some info like getting different results back when the index is used, null
+counts as existing, etc.
+
 Unique
 ``````
@@ -105,10 +117,14 @@ records for the same legal entity:
 
     db.accounts.ensureIndex( { tax-id: 1 }, { unique: true } )
 
+TODO: tax-id should be in ""s.
+
 The :ref:`_id index ` is a unique index.
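The quoting problem the ""s TODOs call out is plain JavaScript behavior, so it can be sketched outside the shell entirely; the ``eval`` wrapper below exists only to capture the parse error, and the field names are the ones from the patch:

```javascript
// The unquoted key { phone-number: 1 } parses as (phone - number):
// a subtraction of two undefined identifiers, which is a syntax error
// inside an object literal.
let unquoted;
try {
  unquoted = eval("({ phone-number: 1 })"); // throws before evaluating
} catch (e) {
  unquoted = e instanceof SyntaxError ? "syntax error" : "other error";
}

// Quoting the keys makes them ordinary string property names:
const quoted = { "phone-number": 1, "tax-id": 1 };
console.log(unquoted);            // "syntax error"
console.log(Object.keys(quoted)); // [ 'phone-number', 'tax-id' ]
```

The same applies to every ``{ tax-id: 1 }`` specification flagged below: the shell is a JavaScript interpreter, so hyphenated field names must always be quoted.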
In some situations you may want to use the ``_id`` field for these
primary data rather than using a unique index on another field.
 
+TODO: "for these primary data"?
+
 In many situations you will want to combine the ``unique`` constraint
 with the ``sparse`` option. When MongoDB indexes a field, if a
 document does not have a value for a field, the index entry for that
@@ -141,6 +157,8 @@ as in the following example:
 
     db.accounts.dropIndex( { tax-id: 1 } )
 
+TODO: ""s!
+
 This will remove the index on the ``tax-id`` field in the ``accounts``
 collection. The shell provides the following document after completing
 the operation:
@@ -203,6 +221,12 @@ for this operation.
 To rebuild indexes for a :term:`replica set`, before version 2.2, see
 :ref:`index-rebuilding-replica-sets`.
 
+TODO: again, this probably isn't different in 2.2
+
+TODO: one thing that I would appreciate you mentioning is that some drivers may
+create indexes like {a : NumberLong(1)} _which is fine_ and doesn't break
+anything so stop complaining about it.
+
 Special Creation Options
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -211,6 +235,8 @@ Special Creation Options
 TTL collections use a special ``expire`` index option. See
 :doc:`/tutorial/expire-data` for more information.
 
+TODO: Are 2d indexes getting a mention?
+
 Background
 ``````````
@@ -222,11 +248,25 @@ prototype invocation of :func:`db.collection.ensureIndex()`:
 
     db.collection.ensureIndex( { a: 1 }, { background: true } )
 
+TODO: what does it mean to build an index in the background? You might want to
+mention:
+* performance implications
+* that this type of index build can be killed
+* that this blocks the connection you sent the ensureIndex on, but ops from
+  other connections can proceed
+* that indexes are created in the foreground on secondaries in 2.0,
+  which blocks replication & slave reads. In 2.2, it does not block reads (but
+  still blocks repl).
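The unique/sparse interaction discussed above can be sketched as a toy model in plain JavaScript. This illustrates the indexing rule only (a document missing the field indexes as ``null`` unless the index is sparse); it is not the server's actual B-tree implementation:

```javascript
// Toy model of a unique index: can every document be inserted without
// violating the uniqueness constraint on `field`?
function canInsertAll(docs, field, { sparse }) {
  const seen = new Set();
  for (const doc of docs) {
    const has = Object.prototype.hasOwnProperty.call(doc, field);
    if (!has && sparse) continue;        // sparse: no index entry at all
    const key = has ? doc[field] : null; // missing field indexes as null
    if (seen.has(key)) return false;     // duplicate key: insert rejected
    seen.add(key);
  }
  return true;
}

const docs = [{ a: 1 }, { b: 2 }, { b: 3 }]; // two docs missing "a"
console.log(canInsertAll(docs, "a", { sparse: false })); // false: two nulls
console.log(canInsertAll(docs, "a", { sparse: true }));  // true: nulls skipped
```

This is why the draft suggests combining ``unique`` with ``sparse``: without ``sparse``, only one document per collection may omit the indexed field.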
+
 Drop Duplicates
 ```````````````
 
 To force the creation of a :ref:`unique index `
-index, you can use the ``dropDups`` option. This will force MongoDB to
+index
+
+TODO: " on a collection with duplicate values in the field to be indexed "
+
+you can use the ``dropDups`` option. This will force MongoDB to
 create a *unique* index by deleting documents with duplicate values
 when building the index. Consider the following prototype invocation
 of :func:`db.collection.ensureIndex()`:
@@ -243,12 +283,15 @@ See the full documentation of :ref:`duplicate dropping
 
 Specifying ``{ dropDups: true }`` will delete data from your
 database. Use with extreme caution.
 
+TODO: I'd say it "may" delete data from your DB, not like it's going to go all
+Shermanesque on your data.
+
 .. _index-building-replica-sets:
 
 Building Indexes on Replica Sets
 --------------------------------
 
-.. versionchanged:: 2.2 
+.. versionchanged:: 2.2
    Index rebuilding operations on :term:`secondary` members of
    :term:`replica sets ` now run as normal background
    index operations. Run :func:`ensureIndex() `
    the following operation to isolate and control the impact of indexing
    building operations on a set as a whole.
 
+TODO: I think there needs to be a huge mention that this still blocks
+replication, so the procedure below is recommended.
+
 .. admonition:: For Version 1.8 and 2.0
 
    :ref:`Background index creation operations
   ` became *foreground* indexing operations on
    :term:`secondary` members of replica sets. These foreground
    operations will block all replication on the
-   secondaries, and can impact performance of the entire set. To build
+   secondaries,
+
+TODO: and don't allow any reads to go through.
+
+   and can impact performance of the entire set. To build
    indexes with minimal impact on a replica set, use the
    following procedure for all non-trivial index builds:
 
    #. Stop the :program:`mongod` process on one secondary.
      Restart the
-     :program:`mongod` process *without* the :option:`--replSet `
+     :program:`mongod` process *without* the :option:`--replSet `
      option. This instance is now in "standalone" mode.
 
+TODO: generally we recommend running it on a different port, too, so that apps
+& other servers in the set don't try to contact it.
+
    #. Create the new index or rebuild the index on this
       :program:`mongod` instance.
@@ -287,7 +340,7 @@ Building Indexes on Replica Sets
 
       Ensure that your :ref:`oplog` is large enough to permit the
      indexing or re-indexing operation to complete without falling
-     too far behind to catch up. See the ":ref:`replica-set-oplog-sizing`"
+     too far behind to catch up. See the ":ref:`replica-set-oplog-sizing`"
      documentation for additional information.
 
 .. note::
 
 For the best results, always create indexes *before* you begin
 inserting data into a collection.
 
+TODO: well, sort of. That'll build the indexes fast, but make the inserts
+slower. Overall, it's faster to insert data, then build indexes.
+
 Measuring Index Use
 -------------------
@@ -318,7 +374,12 @@ following tools:
 
 - :func:`cursor.hint()`
 
   Append the :func:`hint() ` to any cursor (e.g.
-  query) with the name of an index as the argument to *force* MongoDB
+  query) with the name
+
+TODO: this isn't "the name of an index." I'd say just "with the index." The
+name of an index is a string like "zipcode_1".
+
+  of an index as the argument to *force* MongoDB
   to use a specific index to fulfill the query. Consider the
   following example:
@@ -331,8 +392,13 @@ following tools:
   ` in conjunction with each other to compare
   the effectiveness of a specific index.
 
+TODO: mention $natural to force no index usage?
+
 - :status:`indexCounters`
 
   Use the :status:`indexCounters` data in the output of
   :dbcommand:`serverStatus` for insight into database-wise index
   utilization.
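On the hint() TODO above: a default index name is derived from the key specification, with each field and its direction joined by underscores, which is how names like "zipcode_1" arise. A small illustrative helper (not a shell built-in) showing the convention:

```javascript
// Illustrative helper: derive the default index name from a key spec,
// joining each field with its direction using underscores.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, direction]) => field + "_" + direction)
    .join("_");
}

console.log(defaultIndexName({ zipcode: 1 }));          // "zipcode_1"
console.log(defaultIndexName({ status: 1, user: -1 })); // "status_1_user_-1"
```

Either form works with ``hint()``: the key document itself, or the string name an existing index was given.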
+
+TODO: I'd like to see this also cover how to track how far an index build has
+gotten and how to kill an index build.
diff --git a/draft/applications/indexes.txt b/draft/applications/indexes.txt
index 8dadaed0cbf..f7ca3c60165 100644
--- a/draft/applications/indexes.txt
+++ b/draft/applications/indexes.txt
@@ -38,6 +38,8 @@ database. To use a covered index you must:
 
 - in the :term:`projection`, explicitly exclude the ``_id`` field from
   the result set, unless the index includes ``_id``.
 
+TODO: the third point seems like part of the first point.
+
 Use the :func:`explain() ` to test the query. If
 MongoDB was able to use a covered index, then the value of the
 ``indexOnly`` field will be ``true``.
@@ -49,7 +51,12 @@ disk, and indexes are smaller than the documents they catalog.
 Sort Using Indexes
 ~~~~~~~~~~~~~~~~~~
 
-While the :dbcommand:`sort` database command and the :func:`sort()
+While the :dbcommand:`sort` database command
+
+TODO: sort database command? Is "database command" being used in a different
+sense here?
+
+and the :func:`sort()
 ` helper support in-memory sort operations without the
 use of an index, these operations are:
@@ -77,6 +84,9 @@ results. For example:
 
 When using compound indexes to support sort operations, the sorted
 field must be the *last* field in the index.
 
+TODO: not true! In 2.2, you can use, say, the index above for a query on
+username, sort by status, too.
+
 Store Indexes in Memory
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -124,6 +134,8 @@ deep understanding of:
 
 MongoDB can only use *one* index to support any given
 operation.
 
+TODO: trickily put. I hope you mention $or elsewhere?
+
 Selectivity
 ~~~~~~~~~~~
@@ -145,9 +157,22 @@ with fulfilling the query.
 
    these values using the index, MongoDB will only need to scan a very
    small number of documents to fulfill the rest of the query.
 
+TODO: It'd be clearer to use "real" numbers in the second example, too, but I
+think you'd have to re-jigger the example to do so.
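A toy calculation to put a number on the selectivity discussion (plain JavaScript; the 60% figure matches the ``{x:{$gt:3}}`` example raised in the selectivity TODOs): a predicate can be poorly selective even when the field it touches has unique values.

```javascript
// Selectivity here = fraction of the collection a predicate matches.
// x is unique across documents, yet a range predicate on it can still
// match most of the collection.
const collection = Array.from({ length: 10 }, (_, i) => ({ x: i })); // x: 0..9
const matched = collection.filter(d => d.x > 3).length;              // x in 4..9
const selectivity = matched / collection.length;
console.log(matched, selectivity); // 6 0.6 -- matches 60% of the collection
```

Uniqueness bounds the cost of an *equality* lookup at one document; a range predicate's cost is bounded only by the fraction of keys it spans, which is the reviewer's point.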
+
 To ensure optimal performance, use indexes that are maximally
 selective relative to your queries.
 
+TODO: the example makes selectivity sound like the uniqueness of the index,
+which isn't the whole story. Having something like {x:{$gt:3}} that matches 60%
+of the collection isn't very selective, even if x has a unique index on it.
+
+I think it's important to emphasize that selectivity is whittling down possible
+results to as small a % as possible.
+
+TODO: Also, might be worth mentioning that, if you cannot get selectivity low
+enough, indexes will actually be slower than table scans.
+
 Insert Throughput
 ~~~~~~~~~~~~~~~~~
@@ -156,7 +181,11 @@ Insert Throughput
 
 .. TODO fact check
 
 MongoDB must update all indexes associated with a collection following
-every insert or update operation. Every index on a collection adds
+every insert or update operation.
+
+TODO: or delete, too
+
+Every index on a collection adds
 some amount of overhead to these operations. In almost every case, the
 performance gains that indexes realize for read operations are worth
 the insertion penalty; however:
@@ -164,12 +193,16 @@ the insertion penalty; however:
 
 - in some cases, an index to support an infrequent query may incur
   more insert-related costs, than saved read-time.
 
+TODO: rm comma: "insert-related costs than saved read-time"
+
 - in some situations, if you have many indexes on a collection with a
   high insert throughput and a number of very similar indexes, you may
   find better overall results by using a slightly less effective index
   on some queries if it means consolidating the total number of
   indexes.
 
+TODO: do you cover what indexes overlap?
+
 Index Size
 ~~~~~~~~~~
@@ -182,9 +215,13 @@ index to locate those documents, MongoDB can maintain a much smaller
 
 - all of your indexes use less space than the documents in the
   collection.
 
+TODO: individually or all together?
+
 - the indexes and a reasonable working set can fit RAM at the same
   time.
 
+TODO: a reasonable working set?
+
 .. _indexing-right-handed:
 
 Indexes do not have to fit *entirely* into RAM in all cases. If the