From f4b2e20944906b8053fa6e73a0b8db23d8e79120 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 12 Sep 2012 15:18:03 -0400 Subject: [PATCH 01/10] DOCS-219 merged index advice wiki page to manual --- source/administration/indexes.txt | 22 ++-- source/applications/indexes.txt | 142 +++++++++++++++------ source/core/indexes.txt | 29 ++--- source/faq.txt | 3 +- source/faq/indexes.txt | 51 ++++++++ source/reference/method/cursor.explain.txt | 17 +++ 6 files changed, 193 insertions(+), 71 deletions(-) create mode 100644 source/faq/indexes.txt diff --git a/source/administration/indexes.txt b/source/administration/indexes.txt index 89134d21a04..01f78c07bfb 100644 --- a/source/administration/indexes.txt +++ b/source/administration/indexes.txt @@ -7,17 +7,13 @@ Indexing Operations Synopsis -------- -Indexes allow MongoDB to process and fulfill queries quickly, by -creating an small and efficient representation of the documents in the -collection. Fundamentally, indexes in MongoDB are operationally -similar to indexes in other database systems. Read the -":doc:`/core/indexes`" documentation for more information on the -fundamentals of indexing in MongoDB, and the -":doc:`/applications/indexes`" documentation for practical strategies -and examples for using indexes in your application. - -This document provides operational guidelines and procedures related -to indexing data in MongoDB collections. +This document provides operational guidelines and procedures for +indexing data in MongoDB collections. For the fundamentals of MongoDB +indexing, see the :doc:`/core/indexes` document. For strategies and +practical approaches, see the :doc:`/applications/indexes` document. + +Indexes allow MongoDB to process and fulfill queries quickly by creating +small and efficient representations of the documents in a collection. Operations ---------- @@ -334,8 +330,8 @@ following tools: Append the :method:`explain() ` method to any cursor (e.g. 
query) to return a document with statistics about the query - process, including the index used, and the number of documents - scanned. + process, including the index used, the number of documents scanned, + and the time the query takes to process in milliseconds. - :method:`cursor.hint()` diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index d4a13a125ab..2e01fd3374b 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -7,20 +7,52 @@ Indexing Strategies Synopsis -------- -Indexes allow MongoDB to process and fulfill queries quickly, by -creating an small and efficient representation of the documents in the -collection. Read the ":doc:`/core/indexes`" documentation for more -information on the fundamentals of indexing in MongoDB, and the -":doc:`/administration/indexes`" documentation for operational -guidelines and examples for building and managing indexes. +This document provides practical approaches and strategies for indexing +in MongoDB. For the fundamentals of MongoDB indexing, see the +:doc:`/core/indexes` document. For operational guidelines and +procedures, see the :doc:`/administration/indexes` document. -This document provides an overview of approaches to indexing with -MongoDB and a selection of strategies that you can use as you develop -applications with MongoDB. +Indexes allow MongoDB to process and fulfill queries quickly +by creating small and efficient representations of the documents in a +collection. Strategies ---------- +The best indexes for your application are based on a number of important +factors, including the kinds of queries you expect, the ratio of reads +to writes, and the amount of free memory on your system. The best +strategy for designing indexes is always to profile a variety of index +configurations with data sets similar to the ones you'll be running in +production and to see which perform best. There's no substitute for good +empirical analyses. + +.. 
_indexes-create-to-match-queries: + +Create Indexes to Match Your Queries +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you query on a single key, a single-key index will do. For +example, assume you're searching for a blog post's slug: + +.. code-block:: javascript + + db.posts.find({ slug : 'state-of-mongodb-2010' }) + +In this case, a unique index on a single key is best: + +.. code-block:: javascript + + db.posts.ensureIndex({ slug: 1 }, {unique: true}) + +If you query on multiple keys, use a :ref:`compound index +`. With the exception of queries that use the +:operator:`$or` operator, a query cannot use multiple indexes. A query +must use only one index. Therefore, to query on multiple keys, use a +:ref:`compound index `. + +If you query on multiple keys and sort the results, see :ref:`index-sort`. + .. _covered-queries: .. _indexes-covered-queries: @@ -44,7 +76,9 @@ database. To use a covered index you must: Use the :method:`explain() ` to test the query. If MongoDB was able to use a covered index, then the value of the -``indexOnly`` field will be ``true``. +``indexOnly`` field will be ``true``. For more information on +:method:`explain() `, see +:ref:`indexes-measuring-use`. Covered queries are much faster than other queries, for two reasons: indexes are typically stored in RAM *or* located sequentially on @@ -63,29 +97,25 @@ sort operations without the use of an index, these operations are: #. Abort when the sort operation consume 32 megabytes of memory. -For the best result, index the field you want sorted query -results. For example: - -- if you have an ``{ username: 1 }`` index, you can use this index to - return documents sorted by the ``username`` field. +For the best result, index the field you want sorted in your query +results. For example, if you have a ``{ username: 1 }`` index, you can +use this index to return documents sorted by the ``username`` field. 
- *MongoDB can return sorted results in either ascending or descending - order using an index in ascending or descending order,* because - MongoDB can transverse items in the index in both - directions. For more information about index order see the section - on ":ref:`Ascending and Descending Index Order - `." +*MongoDB can return sorted results in either ascending or descending +order using an index in ascending or descending order,* because MongoDB +can transverse items in the index in both directions. For more +information about index order see :ref:`Ascending and Descending Index +Order `. -- In general, MongoDB can use a compound index to return sorted - results *if*: +In general, MongoDB can use a :ref:`compound index +` to return sorted results *if*: - - the first sorted field is first field in the index. +- The first sorted field is first field in the index. - - the last field in the index before the first sorted field is an - equality match in the query. +- The last field in the index before the first sorted field is an + equality match in the query. - Consider the example presented below for an illustration of this - concept. +Consider the example below for an illustration of this concept. .. example:: @@ -107,7 +137,7 @@ results. For example: db.collection.find( { b:5 } ).sort( { a:1, b:1 } ) db.collection.find( { a:{ $gt:4 } } ).sort( { a:1, b:1 } ) - + db.collection.find( { a:5 } ).sort( { a:1, b:1 } ) db.collection.find( { a:5 } ).sort( { b:1, c:1 } ) @@ -125,13 +155,15 @@ results. For example: db.collection.find( { b:5 } ).sort( { b:1 } ) db.collection.find( { b:{ $gt:5 } } ).sort( { a:1, b:1 } ) -Store Indexes in Memory -~~~~~~~~~~~~~~~~~~~~~~~ +.. _indexes-ensure-indexes-fit-ram: + +Ensure Indexes Fit RAM +~~~~~~~~~~~~~~~~~~~~~~ For best results, always ensure that your indexes fit entirely in RAM, -so the system doesn't need to read the index from disk to -fulfill a query. 
If your indexes approach or exceed the total size of -available RAM, they may not fit in memory. +so the system doesn't need to read the index from disk to fulfill a +query. If your indexes approach or exceed the total size of available +RAM, they may not fit in memory. You can check the size of your indexes in the :program:`mongo` shell, using the :method:`db.collection.totalIndexSize()` helper. You may also @@ -162,13 +194,13 @@ Considerations Above all, when developing your indexing strategy you should have a deep understanding of: -- the application's queries. +- The application's queries. -- the relative frequency of each query in the application. +- The relative frequency of each query in the application. -- the current indexes created for your collections. +- The current indexes created for your collections. -- which indexes the most common queries use. +- Which indexes the most common queries use. MongoDB can only use *one* index to support any given operation. However, each clause of an :operator:`$or` query can use @@ -237,6 +269,23 @@ with fulfilling the query. There are two aspects of selectivity: ``a`` are evenly distributed *and* the query can selects a specific document using the index. +.. example:: + + Avoid single-key indexes with low selectivity. Suppose you have a + field called ``status`` where the possible values are ``new`` and + ``processed``. If you add an index on ``status`` you've created a + low-selectivity index, meaning that the index will be of little help + in locating records and will be just taking up space. + + A better strategy, depending on your queries, would be to create a + :ref:`compound index ` that includes the + low-selectivity field. For instance, you could have a compound index + on ``status`` and ``created_at.`` + + Another option, again depending on your use case, might be to use + separate collections, one for each status. Experimentation and + benchmarks will help you choose the best approach. 
+ To ensure optimal performance, use indexes that are maximally selective relative to your queries. At the same time queries need to be appropriately selective relative to your indexed data. If overall @@ -245,6 +294,16 @@ to return results, then some queries may perform faster without indexes. See the :ref:`indexes-measuring-use` section for more information on testing information. +Write-heavy Applications +~~~~~~~~~~~~~~~~~~~~~~~~ + +If your application is write-heavy, then be careful when creating new +indexes, since each additional index with impose a small +write-performance penalty. In general, don't be cavalier about adding +indexes. Indexes should be added to complement your queries. Always have +a good reason for adding a new index, and make sure you've benchmarked +alternative strategies. + Insert Throughput ~~~~~~~~~~~~~~~~~ @@ -258,10 +317,10 @@ some amount of overhead to these operations. In almost every case, the performance gains that indexes realize for read operations are worth the insertion penalty; however: -- in some cases, an index to support an infrequent query may incur +- In some cases, an index to support an infrequent query may incur more insert-related costs than saved read-time. -- in some situations, if you have many indexes on a collection with a +- In some situations, if you have many indexes on a collection with a high insert throughput and a number of very similar indexes, you may find better overall results by using a slightly less effective index on some queries if it means consolidating the total number of @@ -274,7 +333,8 @@ the insertion penalty; however: - In some cases a single compound on two or more fields index may support all of the queries that index on a single field index, or a - smaller compound index. In general, MongoDB can use compound index + smaller :ref:`compound index `. In general, + MongoDB can use compound index to support the same queries as any of its prefixes. 
Consider the following example: diff --git a/source/core/indexes.txt b/source/core/indexes.txt index 54e16fec0fb..bc7ec6d6fef 100644 --- a/source/core/indexes.txt +++ b/source/core/indexes.txt @@ -4,15 +4,21 @@ Indexing Overview .. default-domain:: mongodb +This document provides an overview of indexes in MongoDB, including +index types and creation options. For operational guidelines and +procedures, see the :doc:`/administration/indexes` document. For +strategies and practical approaches, see the +:doc:`/applications/indexes` document. + Synopsis -------- An index is a data structure that allows you to quickly locate documents -based on the values stored in certain specified fields. Fundamentally, indexes -in MongoDB are similar to indexes in other database systems. MongoDB -supports indexes on any field or sub-field contained in documents -within a MongoDB collection. Consider the following core features of -indexes: +based on the values stored in certain specified fields. Fundamentally, +indexes in MongoDB are similar to indexes in other database systems. +MongoDB supports indexes on any field or sub-field contained in +documents within a MongoDB collection. MongoDB indexes have the +following core features: - MongoDB defines indexes on a per-:term:`collection` level. @@ -21,9 +27,9 @@ indexes: operation. - Every query, including update operations, use one and only one - index. The query optimizer selects the index empirically, by - occasionally running alternate query plans, and selecting the plan - with the best response time for each query type. You can override + index. The query optimizer selects the index empirically by + occasionally running alternate query plans and by selecting the plan + with the best response time for each query type. You can override the query optimizer using the :method:`cursor.hint()` method. 
- You can create indexes on a single field or on multiple fields using @@ -39,13 +45,6 @@ indexes: documents that MongoDB needs to store in memory, thus maximizing database performance and throughput. -Continue reading for a complete overview of indexes in MongoDB, -including the :ref:`types of indexes `, basic -:ref:`operations with indexes `, and other MongoDB -:ref:`features ` implemented using indexes. - -.. TODO links to other documents about indexing - .. index:: index types .. _index-types: diff --git a/source/faq.txt b/source/faq.txt index 1c837a8db08..03d709f5a8c 100644 --- a/source/faq.txt +++ b/source/faq.txt @@ -10,5 +10,4 @@ Frequently Asked Questions /faq/sharding /faq/replica-sets /faq/storage - -.. seealso:: The :wiki:`Indexing FAQ ` wiki page. + /faq/indexes diff --git a/source/faq/indexes.txt b/source/faq/indexes.txt new file mode 100644 index 00000000000..f1391e1452e --- /dev/null +++ b/source/faq/indexes.txt @@ -0,0 +1,51 @@ +============ +FAQ: Indexes +============ + +.. default-domain:: mongodb + +This document addresses common questions regarding MongoDB indexes. + +If you don't find the answer you're looking for, you can: + +- Check the :doc:`/indexes` documentation. + +- Check the :doc:`complete list of FAQs `. + +- Post your question to the `MongoDB User Mailing List + `_. + +.. contents:: Frequently Asked Questions: + :backlinks: none + :local: + +Should you run :dbcommand:`ensureIndex` after every insert? +----------------------------------------------------------- + +No. An index needs to be created only once for a collection. After +initial creation, MongoDB automatically updates the index as data +changes. + +Will building a large index affect database performance? +-------------------------------------------------------- + +Building an index can be an IO-intensive operation, especially if you +have a large collection. This is true on any database system that +supports secondary indexes, including MySQL. 
If you need to build an +index on a large collection, consider building the index in the +background. See :ref:`index-creation-operations`. + +If you build a large index without the background option, and if doing +so causes the database to stop responding, you have two options: + +- Wait for the index to finish building + +- Kill the current operation (see :method:`db.killOp()`). The partial + index will be deleted. + +Using :operator:`$ne` and :operator:`$nin` in a query is slow. Why? +------------------------------------------------------------------- + +The :operator:`$ne` and :operator:`$nin` operators can match much of an +index. If you need to use these, it is often best to make sure that an +additional, more selective criterion is part of the query. diff --git a/source/reference/method/cursor.explain.txt b/source/reference/method/cursor.explain.txt index 159ce666ea5..f306394c975 100644 --- a/source/reference/method/cursor.explain.txt +++ b/source/reference/method/cursor.explain.txt @@ -15,6 +15,23 @@ cursor.explain() these operations only provide a realistic account of *how* MongoDB would perform the query, and *not* how long the query would take. + The method's output includes these fields: + + - ``cursor``: The value for cursor can be either ``BasicCursor`` or + ``BtreeCursor``. The second of these indicates that the given query + is using an index. + + - ``nscanned``: The number of index entries scanned. + + - ``n``: the number of documents returned by the query. You want the + value of ``n`` to be close to the value of ``nscanned``. You want to + avoid a "collection scan," which is a scan where every document in + the collection is accessed. This is the case when ``nscanned`` is + equal to the number of documents in the collection. + + - ``millis``: the number of milliseconds require to complete the + query. This value is useful for comparing indexing strategies. + .. 
seealso:: :operator:`$explain` for related functionality and the ":wiki:`Optimization`" wiki page for information regarding optimization strategies. From bd746e1b3bb7d8c18abd4506437156ed140a8b4d Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 12 Sep 2012 17:33:12 -0400 Subject: [PATCH 02/10] DOCS-219 minor edits to index merge --- source/applications/indexes.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index 2e01fd3374b..e1077259467 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -24,7 +24,7 @@ factors, including the kinds of queries you expect, the ratio of reads to writes, and the amount of free memory on your system. The best strategy for designing indexes is always to profile a variety of index configurations with data sets similar to the ones you'll be running in -production and to see which perform best. There's no substitute for good +production and to see which configurations perform best. There's no substitute for good empirical analyses. .. _indexes-create-to-match-queries: @@ -110,7 +110,7 @@ Order `. In general, MongoDB can use a :ref:`compound index ` to return sorted results *if*: -- The first sorted field is first field in the index. +- The first sorted field is the first field in the index. - The last field in the index before the first sorted field is an equality match in the query. @@ -299,7 +299,7 @@ Write-heavy Applications If your application is write-heavy, then be careful when creating new indexes, since each additional index with impose a small -write-performance penalty. In general, don't be cavalier about adding +write-performance penalty. In general, don't be careless about adding indexes. Indexes should be added to complement your queries. Always have a good reason for adding a new index, and make sure you've benchmarked alternative strategies. 
From 80acbd60d675b442f9af248e686c0826d62607b6 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 18 Sep 2012 17:26:00 -0400 Subject: [PATCH 03/10] DOCS-206 index strategies and faq ongoing edits --- source/applications/indexes.txt | 57 +++++++++++++++++---------------- source/faq/indexes.txt | 36 ++++++++++++--------- source/faq/replica-sets.txt | 2 +- source/faq/sharding.txt | 2 +- 4 files changed, 51 insertions(+), 46 deletions(-) diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index e1077259467..4fc6b32798a 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -8,50 +8,51 @@ Synopsis -------- This document provides practical approaches and strategies for indexing -in MongoDB. For the fundamentals of MongoDB indexing, see the -:doc:`/core/indexes` document. For operational guidelines and -procedures, see the :doc:`/administration/indexes` document. - -Indexes allow MongoDB to process and fulfill queries quickly -by creating small and efficient representations of the documents in a -collection. +in MongoDB. For fundamentals of MongoDB indexing, see +:doc:`/core/indexes`. For operational guidelines and procedures, see +:doc:`/administration/indexes`. Strategies ---------- -The best indexes for your application are based on a number of important -factors, including the kinds of queries you expect, the ratio of reads -to writes, and the amount of free memory on your system. The best -strategy for designing indexes is always to profile a variety of index -configurations with data sets similar to the ones you'll be running in -production and to see which configurations perform best. There's no substitute for good -empirical analyses. +The best indexes for your application are based on a number of factors, +including the kinds of queries you expect, the ratio of reads to writes, +and the amount of free memory on your system. 
The best overall strategy +for designing indexes is to profile a variety of index configurations +with data sets similar to the ones you'll be running in production and +to see which configurations perform best. .. _indexes-create-to-match-queries: -Create Indexes to Match Your Queries -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Create Indexes to Support Your Queries +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -If you query on a single key, a single-key index will do. For -example, assume you're searching for a blog post's slug: +If you only ever query on a single key in a given collection, then you need +create just one, single-key index for that collection. For example, you +might create an index on ``category`` in the``product`` collection: .. code-block:: javascript - db.posts.find({ slug : 'state-of-mongodb-2010' }) + db.products.ensureIndex( { "category": 1 } ) -In this case, a unique index on a single key is best: +However, if you sometimes query on a key alone but and at other times +query on that key combined with a second key, then creating a +ref:`compound index ` is more efficient. MongoDB +will use the compound index for both queries. For example, you might +create an index on both ``category`` and ``item``, allowing you to query +only on ``category`` but also to query on ``category`` combined with +``item``: .. code-block:: javascript - db.posts.ensureIndex({ slug: 1 }, {unique: true}) + db.products.ensureIndex( { "category": 1, "item": 1 } ) -If you query on multiple keys, use a :ref:`compound index -`. With the exception of queries that use the -:operator:`$or` operator, a query cannot use multiple indexes. A query -must use only one index. Therefore, to query on multiple keys, use a -:ref:`compound index `. +.. note:: With the exception of queries that use the :operator:`$or` operator, a + query cannot use multiple indexes. A query must use only one index. + Therefore, to query on multiple keys, use a :ref:`compound index + `. 
-If you query on multiple keys and sort the results, see :ref:`index-sort`. +To query on multiple keys and sort the results, see :ref:`index-sort`. .. _covered-queries: .. _indexes-covered-queries: @@ -59,7 +60,7 @@ If you query on multiple keys and sort the results, see :ref:`index-sort`. Use Covered Queries ~~~~~~~~~~~~~~~~~~~ -In some cases, MongoDB will be able to fulfill a query using *only* +In some cases, MongoDB can fulfill a query using *only* the index, without needing to scan actual documents from the database. To use a covered index you must: diff --git a/source/faq/indexes.txt b/source/faq/indexes.txt index f1391e1452e..51b763d9de1 100644 --- a/source/faq/indexes.txt +++ b/source/faq/indexes.txt @@ -6,26 +6,29 @@ FAQ: Indexes This document addresses common questions regarding MongoDB indexes. -If you don't find the answer you're looking for, you can: - -- Check the :doc:`/indexes` documentation. - -- Check the :doc:`complete list of FAQs `. - -- Post your question to the `MongoDB User Mailing List - `_. +If you don't find the answer you're looking for, check the +:doc:`complete list of FAQs ` or post your question to the +`MongoDB User Mailing List `_. +See also :doc:`/applications/indexes`. .. contents:: Frequently Asked Questions: :backlinks: none :local: -Should you run :dbcommand:`ensureIndex` after every insert? ------------------------------------------------------------ +Should you run ``ensureIndex()`` after every insert? +---------------------------------------------------- No. An index needs to be created only once for a collection. After initial creation, MongoDB automatically updates the index as data changes. +While running :method:`ensureIndex() ` is +usually ok, if an index doesn't exist because of ongoing administrative +work, a call to :method:`ensureIndex() ` +may disrupt database avalability. Runnning :method:`ensureIndex() ` +can render a replica set inaccessible as the index +creation is happening. 
See :ref:`index-building-replica-sets`. + Will building a large index affect database performance? -------------------------------------------------------- @@ -40,12 +43,13 @@ so causes the database to stop responding, you have two options: - Wait for the index to finish building -- Kill the current operation (see :method:`db.killOp()`). The partial +- Kill the current operation (see :method:`db.killOP()`). The partial index will be deleted. -Using :operator:`$ne` and :operator:`$nin` in a query is slow. Why? -------------------------------------------------------------------- +Using ``$ne`` and ``$nin`` in a query is slow. Why? +--------------------------------------------------- -The :operator:`$ne` and :operator:`$nin` operators can match much of an -index. If you need to use these, it is often best to make sure that an -additional, more selective criterion is part of the query. +The :operator:`$ne` and :operator:`$nin` operators are not selective. +See :ref:`index-selectivity`. If you need to use these, +it is often best to make sure that an additional, more selective +criterion is part of the query. diff --git a/source/faq/replica-sets.txt b/source/faq/replica-sets.txt index f9c17adf348..424d346264f 100644 --- a/source/faq/replica-sets.txt +++ b/source/faq/replica-sets.txt @@ -8,7 +8,7 @@ This document answers common questions about database replication in MongoDB. If you don't find the answer you're looking for, check -the :doc:`replication index ` or post your question to the +the :doc:`complete list of FAQs ` or post your question to the `MongoDB User Mailing List `_. .. contents:: Frequently Asked Questions: diff --git a/source/faq/sharding.txt b/source/faq/sharding.txt index 495e92662ca..36b6dc1bee9 100644 --- a/source/faq/sharding.txt +++ b/source/faq/sharding.txt @@ -8,7 +8,7 @@ This document answers common questions about horizontal scaling using MongoDB's :term:`sharding`. 
If you don't find the answer you're looking for, check -the :wiki:`sharding docs ` or post your question to the +the :doc:`complete list of FAQs ` or post your question to the `MongoDB User Mailing List `_. .. contents:: Frequently Asked Questions: From af8e90a0d6feb928cfae50b33367a9fbb458132e Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 19 Sep 2012 12:04:52 -0400 Subject: [PATCH 04/10] minor: changes --- source/applications/indexes.txt | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index 4fc6b32798a..c356560cd93 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -35,7 +35,7 @@ might create an index on ``category`` in the``product`` collection: db.products.ensureIndex( { "category": 1 } ) -However, if you sometimes query on a key alone but and at other times +However, if you sometimes query on only one key but and at other times query on that key combined with a second key, then creating a ref:`compound index ` is more efficient. MongoDB will use the compound index for both queries. For example, you might @@ -60,31 +60,33 @@ To query on multiple keys and sort the results, see :ref:`index-sort`. Use Covered Queries ~~~~~~~~~~~~~~~~~~~ -In some cases, MongoDB can fulfill a query using *only* -the index, without needing to scan actual documents from the -database. To use a covered index you must: +In a covered query, all the search keys are found in a given index. +MongoDB can fulfill the query using *only* the index, without needing to +scan documents from the database. Covered queries are much faster than +other queries, for two reasons: indexes are typically stored in RAM *or* +located sequentially on disk, and indexes are smaller than the documents +they catalog. -- ensure that the index includes all of the fields in the result. +Mongod automatically uses a covered query when it can. 
But to ensure use +of a covered query you must: + +- Ensure that the index includes all of the fields in the result. This means that the :term:`projection`, must explicitly exclude the ``_id`` field from the result set, unless the index includes ``_id``. -- if any of the indexed fields in any of the documents in the +- If any of the indexed fields in any of the documents in the collection includes an array, then the index becomes a :ref:`multi-key index ` index, and cannot support a covered query. Use the :method:`explain() ` to test the query. If -MongoDB was able to use a covered index, then the value of the +MongoDB was able to use a covered query, then the value of the ``indexOnly`` field will be ``true``. For more information on :method:`explain() `, see :ref:`indexes-measuring-use`. -Covered queries are much faster than other queries, for two reasons: -indexes are typically stored in RAM *or* located sequentially on -disk, and indexes are smaller than the documents they catalog. - .. _index-sort: .. _sorting-with-indexes: From 5b0076c098eff16938b665a3e4f2a70e2bf7d559 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 19 Sep 2012 12:30:01 -0400 Subject: [PATCH 05/10] minor: changes --- source/applications/indexes.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index c356560cd93..750f8840131 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -28,7 +28,7 @@ Create Indexes to Support Your Queries ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If you only ever query on a single key in a given collection, then you need -create just one, single-key index for that collection. For example, you +create just one single-key index for that collection. For example, you might create an index on ``category`` in the``product`` collection: .. 
code-block:: javascript From 336cbbff1e799d7a96843d1548d9659cd2f083e8 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 19 Sep 2012 19:13:19 -0400 Subject: [PATCH 06/10] DOCS-206 minor: edits --- source/applications/indexes.txt | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index 750f8840131..2156b021008 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -60,15 +60,17 @@ To query on multiple keys and sort the results, see :ref:`index-sort`. Use Covered Queries ~~~~~~~~~~~~~~~~~~~ -In a covered query, all the search keys are found in a given index. -MongoDB can fulfill the query using *only* the index, without needing to -scan documents from the database. Covered queries are much faster than -other queries, for two reasons: indexes are typically stored in RAM *or* -located sequentially on disk, and indexes are smaller than the documents -they catalog. - -Mongod automatically uses a covered query when it can. But to ensure use -of a covered query you must: +A covered query is a query in which all the search keys are found in a +given index. MongoDB can fulfill the query using *only* the index, +without needing to scan documents from the database. The query is +considered to be "covered" by the index. + +Querying *only* the index is much faster than querying documents. +Indexes are smaller than the documents they catalog. And indexes are +typically stored in RAM or located sequentially on disk. + +Mongod automatically uses a covered query when it can, but to ensure use +of a covered query do the following: - Ensure that the index includes all of the fields in the result. 
From 7d3801be5cfe1d4e5fb9bd569360b4f04db15e9d Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 20 Sep 2012 16:19:54 -0400 Subject: [PATCH 07/10] DOCS-206 major revisions to indexing strategies --- source/administration/indexes.txt | 33 +-- source/applications/indexes.txt | 358 ++++++++++++++---------------- 2 files changed, 189 insertions(+), 202 deletions(-) diff --git a/source/administration/indexes.txt b/source/administration/indexes.txt index 01f78c07bfb..1c688c2fcae 100644 --- a/source/administration/indexes.txt +++ b/source/administration/indexes.txt @@ -18,18 +18,12 @@ small and efficient representations of the documents in a collection. Operations ---------- -Creation -~~~~~~~~ +Create an Index +~~~~~~~~~~~~~~~ -Use the :method:`db.collection.ensureIndex()`, or similar :api:`method -for your driver <>`, to create an index. Consider the following -prototype operation: - -.. code-block:: javascript - - db.collection.ensureIndex( { a: 1 } ) - -The following example creates [#ensure]_ an index on the ``phone-number`` field +To create an index, use :method:`db.collection.ensureIndex()` or a similar +:api:`method for your driver <>`. For example, +the following creates [#ensure]_ an index on the ``phone-number`` field of the ``people`` collection: .. code-block:: javascript @@ -131,8 +125,21 @@ You can also enforce a unique constraint on :ref:`compound indexes These indexes enforce uniqueness for the *combination* of index keys and *not* for either key individually. -Removal -~~~~~~~ +List a Collection's Indexes +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To list a collection's indexes, use the +:method:`db.collection.getIndexes()` method or a similar +:api:`method for your driver <>`. + +For example, to view all indexes on the ``people`` collection: + +..
code-block:: javascript + + db.people.getIndexes() + +Remove an Index +~~~~~~~~~~~~~~~ To remove an index, use the :method:`db.collection.dropIndex()` method, as in the following example: diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index 2156b021008..7764abcc37a 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -4,12 +4,9 @@ Indexing Strategies .. default-domain:: mongodb -Synopsis --------- - -This document provides practical approaches and strategies for indexing -in MongoDB. For fundamentals of MongoDB indexing, see -:doc:`/core/indexes`. For operational guidelines and procedures, see +This document provides strategies for indexing in MongoDB. For +fundamentals of MongoDB indexing, see :doc:`/core/indexes`. For +operational guidelines and procedures, see :doc:`/administration/indexes`. Strategies @@ -17,19 +14,35 @@ Strategies The best indexes for your application are based on a number of factors, including the kinds of queries you expect, the ratio of reads to writes, -and the amount of free memory on your system. The best overall strategy -for designing indexes is to profile a variety of index configurations -with data sets similar to the ones you'll be running in production and -to see which configurations perform best. +and the amount of free memory on your system. + +When developing your indexing strategy you should have a deep +understanding of: + +- The application's queries. + +- The relative frequency of each query in the application. + +- The current indexes created for your collections. + +- Which indexes the most common queries use. + +The best overall strategy for designing indexes is to profile a variety +of index configurations with data sets similar to the ones you'll be +running in production and to see which configurations perform best. + +MongoDB can only use *one* index to support any given +operation. 
However, each clause of an :operator:`$or` query can use +its own index. .. _indexes-create-to-match-queries: Create Indexes to Support Your Queries -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-------------------------------------- If you only ever query on a single key in a given collection, then you need create just one single-key index for that collection. For example, you -might create an index on ``category`` in the``product`` collection: +might create an index on ``category`` in the ``product`` collection: .. code-block:: javascript @@ -37,101 +50,80 @@ might create an index on ``category`` in the``product`` collection: However, if you sometimes query on only one key but and at other times query on that key combined with a second key, then creating a -ref:`compound index ` is more efficient. MongoDB +:ref:`compound index ` is more efficient. MongoDB will use the compound index for both queries. For example, you might -create an index on both ``category`` and ``item``, allowing you to query -only on ``category`` but also to query on ``category`` combined with -``item``: +create an index on both ``category`` and ``item``, allowing you both +options: to query only on ``category`` and also to query on ``category`` +combined with ``item``: .. code-block:: javascript db.products.ensureIndex( { "category": 1, "item": 1 } ) -.. note:: With the exception of queries that use the :operator:`$or` operator, a - query cannot use multiple indexes. A query must use only one index. - Therefore, to query on multiple keys, use a :ref:`compound index - `. - To query on multiple keys and sort the results, see :ref:`index-sort`. +With the exception of queries that use the :operator:`$or` operator, a +query cannot use multiple indexes. A query must use only one index. + .. _covered-queries: .. 
_indexes-covered-queries: -Use Covered Queries -~~~~~~~~~~~~~~~~~~~ +Create Indexes that Support Covered Queries +------------------------------------------- A covered query is a query in which all the search keys are found in a -given index. MongoDB can fulfill the query using *only* the index, -without needing to scan documents from the database. The query is -considered to be "covered" by the index. +given index. A covered query is considered to be "covered" by the index. +MongoDB can fulfill the query by using *only* the index. MongoDB need +not scan documents from the database. Querying *only* the index is much faster than querying documents. -Indexes are smaller than the documents they catalog. And indexes are +Indexes are smaller than the documents they catalog, and indexes are typically stored in RAM or located sequentially on disk. -Mongod automatically uses a covered query when it can, but to ensure use -of a covered query do the following: - -- Ensure that the index includes all of the fields in the result. +Mongod automatically uses a covered query when possible. To ensure use +of a covered query, create an index that includes all the fields listed +in the query result. This means that the :term:`projection` must +explicitly exclude the ``_id`` field from the result set, unless the +index includes ``_id``. - This means that the :term:`projection`, must explicitly exclude the - ``_id`` field from the result set, unless the index includes - ``_id``. +MongoDB cannot use a covered query if any of the indexed fields in any +of the documents in the collection include an array. If an indexed field +is an array, the index becomes a :ref:`multi-key index +` index and cannot support a covered query. -- If any of the indexed fields in any of the documents in the - collection includes an array, then the index becomes a - :ref:`multi-key index ` index, and cannot - support a covered query. - -Use the :method:`explain() ` to test the query. 
If -MongoDB was able to use a covered query, then the value of the -``indexOnly`` field will be ``true``. For more information on -:method:`explain() `, see -:ref:`indexes-measuring-use`. +To test whether MongoDB used a covered query, use +:method:`explain() `. If the output displays ``true`` +for the ``indexOnly`` field, MongoDB used a covered query. For +more information see :ref:`indexes-measuring-use`. .. _index-sort: .. _sorting-with-indexes: -Sort Using Indexes -~~~~~~~~~~~~~~~~~~ - -While the :method:`sort() ` method supports in-memory -sort operations without the use of an index, these operations are: - -#. Significantly slower than sort operations that use an index. - -#. Abort when the sort operation consume 32 megabytes of memory. +Use Indexes to Sort Query Results +--------------------------------- -For the best result, index the field you want sorted in your query -results. For example, if you have a ``{ username: 1 }`` index, you can -use this index to return documents sorted by the ``username`` field. +For the fastest performance when sorting query results by a given field, +create a sorted index on that field. To sort query results on multiple +fields, create a :ref:`compound index `. For +details on creating compound indexes with sort in mind, see +:ref:`index-ascending-and-descending`. -*MongoDB can return sorted results in either ascending or descending -order using an index in ascending or descending order,* because MongoDB -can transverse items in the index in both directions. For more -information about index order see :ref:`Ascending and Descending Index -Order `. +MongoDB uses a compound index to return sorted results *if*: -In general, MongoDB can use a :ref:`compound index -` to return sorted results *if*: +- The first field in the index is the first sorted field. -- The first sorted field is the first field in the index. 
- - The last field in the index before the first sorted field is an +- The last field in the index *before the first sorted field* is an equality match in the query. -Consider the example below for an illustration of this concept. - .. example:: - Given the following index: + If you create the following index: .. code-block:: javascript { a: 1, b: 1, c: 1, d: 1 } - The following query and sort operations will be able to use the - index: + The following query and sort operations can use the index: .. code-block:: javascript @@ -141,18 +133,17 @@ Consider the example below for an illustration of this concept. db.collection.find( { a:4 } ).sort( { a:1, b:1 } ) db.collection.find( { b:5 } ).sort( { a:1, b:1 } ) - db.collection.find( { a:{ $gt:4 } } ).sort( { a:1, b:1 } ) - - db.collection.find( { a:5 } ).sort( { a:1, b:1 } ) db.collection.find( { a:5 } ).sort( { b:1, c:1 } ) db.collection.find( { a:5, c:4, b:3 } ).sort( { d:1 } ) + db.collection.find( { a:{ $gt:4 } } ).sort( { a:1, b:1 } ) + db.collection.find( { a:5, b:3, d:{ $gt:4 } } ).sort( { c:1 } ) db.collection.find( { a:5, b:3, c:{ $lt:2 }, d:{ $gt:4 } } ).sort( { c:1 } ) - However, the following query operations would not be able to sort - the results using the index: + However, the following queries cannot sort the results using the + index: .. code-block:: javascript @@ -160,77 +151,114 @@ Consider the example below for an illustration of this concept. db.collection.find( { b:5 } ).sort( { b:1 } ) db.collection.find( { b:{ $gt:5 } } ).sort( { a:1, b:1 } ) +.. note:: + + Sorting query results by an index is faster than sorting results by + the :method:`sort() ` method. The method supports + in-memory sort operations without the use of an index, but these + operations are significantly slower than sort operations that use an + index, and they abort when the sort operation consumes 32 megabytes of + memory. + ..
_indexes-ensure-indexes-fit-ram: Ensure Indexes Fit RAM -~~~~~~~~~~~~~~~~~~~~~~ - -For best results, always ensure that your indexes fit entirely in RAM, -so the system doesn't need to read the index from disk to fulfill a -query. If your indexes approach or exceed the total size of available -RAM, they may not fit in memory. +---------------------- -You can check the size of your indexes in the :program:`mongo` shell, -using the :method:`db.collection.totalIndexSize()` helper. You may also -use :dbcommand:`collStats` or :method:`db.collection.stats()` to return -this and :doc:`related information `. +For fastest processing, ensure that your indexes fit entirely in RAM so +that the system can avoid reading the index from disk. -:method:`db.collection.totalIndexSize()` returns data in bytes. Consider -the following invocation: +To check the size of your indexes, use the +:method:`db.collection.totalIndexSize()` helper, which returns data in +bytes: .. code-block:: javascript > db.collection.totalIndexSize() 4294976499 -This reports a total index size of roughly 4 gigabytes. Consider this -value in contrast to the total amount of available system RAM and the -rest of the :term:`working set`. Also remember: +The above example shows an index size of almost 4.3 gigabytes. To ensure +this index fits in RAM, you must not only have more than that much RAM +available but also must have RAM available for the rest of the +:term:`working set`. Also remember: -- if you have and use multiple collections to consider the size of - all indexes on all collections. +- If you have and use multiple collections, you must consider the size + of all indexes on all collections. -- there are some :ref:`limited cases where indexes do not need to fit - in RAM `. +- There are some limited cases where indexes do not need to fit in RAM. + See :ref:`indexing-right-handed`. -Considerations --------------- +.. 
seealso:: For additional :doc:`collection statistics + `, use :dbcommand:`collStats` or + :method:`db.collection.stats()`. -Above all, when developing your indexing strategy you should have a -deep understanding of: +Indexes require space, both on disk and in RAM. Indexes require less +space in RAM than the full documents in the collection. In theory, if +your queries only match a subset of the documents and can use the index +to locate those documents, MongoDB can maintain a much smaller +:term:`working set`. Ensure that: -- The application's queries. +- The indexes and the working set can fit RAM at the same time. -- The relative frequency of each query in the application. +- All of your indexes use less space than all of the documents in the + collection. This may not be an issue all of your queries use + :ref:`covered queries ` or indexes do not need to fit + into ram, as in the following situation: -- The current indexes created for your collections. +.. _indexing-right-handed: -- Which indexes the most common queries use. +Indexes that Hold Only Recent Values in RAM +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -MongoDB can only use *one* index to support any given -operation. However, each clause of an :operator:`$or` query can use -its own index. +Indexes do not have to fit *entirely* into RAM in all cases. If the +value of the indexed field grows with every insert, and most queries +select recently added documents; then MongoDB only needs to keep the +parts of the index that hold the most recent or "right-most" values in +RAM. This allows for efficient index use for read and write +operations and minimize the amount of RAM required to support the +index. + +.. To determine the size of the index, see DOCS-224 .. 
_index-selectivity: -Selectivity -~~~~~~~~~~~ +Create Queries that Ensure Selectivity +-------------------------------------- -Selectivity describes the ability of a query to narrow the result set +Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated -with fulfilling the query. There are two aspects of selectivity: +with fulfilling the query. -#. Data need to have a high distribution of the values for the indexed - key. +To ensure selectivity: -#. Queries need to limit the number of possible documents using the - indexed field. +- Only index keys that have a high distribution of the values + within the collection. + +- Write queries that limit the number of possible documents with the + indexed field. Write queries that are appropriately selective relative + to your indexed data. .. example:: - First, consider an index, ``{ a : 1 }``, on a collection where - ``a`` has three values evenly distributed across the collection: + Suppose you have a field called ``status`` where the possible values + are ``new`` and ``processed``. If you add an index on ``status`` + you've created a low-selectivity index. The index will + be of little help in locating records. + + A better strategy, depending on your queries, would be to create a + :ref:`compound index ` that includes the + low-selectivity field and another field. For example, you could + create a compound index on ``status`` and ``created_at``. + + Another option, again depending on your use case, might be to use + separate collections, one for each status. + +.. example:: + + Consider an index ``{ a : 1 }`` (i.e. an index on the key ``a`` + sorted in ascending order) on a collection where ``a`` has three + values evenly distributed across the collection: .. code-block:: javascript @@ -244,14 +272,13 @@ with fulfilling the query.
There are two aspects of selectivity: { _id: ObjectId(), a: 3, b: "rs" } { _id: ObjectId(), a: 3, b: "tv" } - If you do a query for ``{ a: 2, b: "no" }`` MongoDB will still need - to scan 3 documents of the :term:`documents ` in the - collection to fulfill the query. Similarly, a query for ``{ a: { - $gt: 1}, b: "tv" }``, would need to scan through 6 documents, - although both queries would return the same result. + If you query for ``{ a: 2, b: "no" }`` MongoDB must scan 3 + :term:`documents ` in the collection to return the one + matching result. Similarly, a query for ``{ a: { $gt: 1}, b: "tv" }`` + must scan 6 documents, also to return one result. - Then, consider an index on a field that has many values evenly - distributed across the collection: + Consider the same index on a collection where ``a`` has *nine* values + evenly distributed across the collection: .. code-block:: javascript @@ -265,42 +292,24 @@ with fulfilling the query. There are two aspects of selectivity: { _id: ObjectId(), a: 8, b: "rs" } { _id: ObjectId(), a: 9, b: "tv" } - Although the index on ``a`` is more selective, in the sense that - queries can use the index more effectively, a query such as ``{ a: - { $gt: 5 }, b: "tv" }`` would still need to scan 4 documents. By - contrast, given a query like ``{ a: 2, b: "cd" }``, MongoDB would - only need to scan one document to fulfill the rest of the - query. The index and query are more selective because the values of - ``a`` are evenly distributed *and* the query can selects a specific - document using the index. + If you query for ``{ a: 2, b: "cd" }``, MongoDB must scan only one + document to fulfill the query. The index and query are more selective + because the values of ``a`` are evenly distributed *and* the query + can select a specific document using the index. -.. example:: + However, although the index on ``a`` is more selective, a query such + as ``{ a: { $gt: 5 }, b: "tv" }`` would still need to scan 4 + documents. 
- Avoid single-key indexes with low selectivity. Suppose you have a - field called ``status`` where the possible values are ``new`` and - ``processed``. If you add an index on ``status`` you've created a - low-selectivity index, meaning that the index will be of little help - in locating records and will be just taking up space. + .. TODO is there an answer to that last "However" paragraph? - A better strategy, depending on your queries, would be to create a - :ref:`compound index ` that includes the - low-selectivity field. For instance, you could have a compound index - on ``status`` and ``created_at.`` - - Another option, again depending on your use case, might be to use - separate collections, one for each status. Experimentation and - benchmarks will help you choose the best approach. - -To ensure optimal performance, use indexes that are maximally -selective relative to your queries. At the same time queries need to -be appropriately selective relative to your indexed data. If overall -selectivity is low enough, and MongoDB must read a number of documents -to return results, then some queries may perform faster without -indexes. See the :ref:`indexes-measuring-use` section for more -information on testing information. +If overall selectivity is low, and if MongoDB must read a number of +documents to return results, then some queries may perform faster +without indexes. To determine performance, see +:ref:`indexes-measuring-use`. -Write-heavy Applications -~~~~~~~~~~~~~~~~~~~~~~~~ +Consider Performance when Creating Indexes for Write-heavy Applications +----------------------------------------------------------------------- If your application is write-heavy, then be careful when creating new indexes, since each additional index with impose a small @@ -309,18 +318,17 @@ indexes. Indexes should be added to complement your queries. Always have a good reason for adding a new index, and make sure you've benchmarked alternative strategies. 
-Insert Throughput -~~~~~~~~~~~~~~~~~ +Consider Insert Throughput +~~~~~~~~~~~~~~~~~~~~~~~~~~ .. TODO insert link to /source/core/write-operations when that page is complete. MongoDB must update all indexes associated with a collection after every insert, update, or delete operation. -Every index on a collection adds -some amount of overhead to these operations. In almost every case, the -performance gains that indexes realize for read operations are worth -the insertion penalty; however: +Every index on a collection adds some amount of overhead to these +operations. In almost every case, the performance gains that indexes +realize for read operations are worth the insertion penalty; however: - In some cases, an index to support an infrequent query may incur more insert-related costs than saved read-time. @@ -381,31 +389,3 @@ the insertion penalty; however: See the :ref:`sorting-with-indexes` section for more information. - -Index Size -~~~~~~~~~~ - -Indexes require space, both on disk and in RAM. Indexes require less -space in RAM than the full documents in the collection. In theory, if -your queries only match a subset of the documents and can use the -index to locate those documents, MongoDB can maintain a much smaller -:term:`working set`. Ensure that: - -- the indexes and the working set can fit RAM at the same time. - -- all of your indexes use less space than all of the documents in the - collection. This may not be an issue all of your queries use - :ref:`covered queries ` or indexes do not need to - fit into ram, as in the following situation: - -.. _indexing-right-handed: - -Indexes do not have to fit *entirely* into RAM in all cases. If the -value of the indexed field grows with every insert, and most queries -select recently added documents; then MongoDB only needs to keep the -parts of the index that hold the most recent or "right-most" values in -RAM. 
This allows for efficient index use for read and write -operations and minimize the amount of RAM required to support the -index. - -.. To determine the size of the index, see DOCS-224 From e31ea63c11ed662391e89ca867b6958c659afcb1 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 20 Sep 2012 17:52:03 -0400 Subject: [PATCH 08/10] DOCS-206 new index faq page --- source/applications/indexes.txt | 162 +++++++++++++++----------------- source/faq/indexes.txt | 65 ++++++++++++- 2 files changed, 138 insertions(+), 89 deletions(-) diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index 7764abcc37a..48af22f8f8d 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -52,14 +52,15 @@ However, if you sometimes query on only one key but and at other times query on that key combined with a second key, then creating a :ref:`compound index ` is more efficient. MongoDB will use the compound index for both queries. For example, you might -create an index on both ``category`` and ``item``, allowing you both -options: to query only on ``category`` and also to query on ``category`` -combined with ``item``: +create an index on both ``category`` and ``item``. .. code-block:: javascript db.products.ensureIndex( { "category": 1, "item": 1 } ) +This allows you both options. You can query on just ``category``, and +you also can query on ``category`` combined with ``item``. + To query on multiple keys and sort the results, see :ref:`index-sort`. With the exception of queries that use the :operator:`$or` operator, a @@ -68,6 +69,50 @@ query cannot use multiple indexes. A query must use only one index. .. _covered-queries: .. _indexes-covered-queries: + +Use Compound Indexes to Support Several Different Queries +--------------------------------------------------------- + +A single :ref:`compound index ` on multiple fields +can support all the queries that search a "prefix" subset of those fields. + +.. 
example:: + + The following index on a collection: + + .. code-block:: javascript + + { x: 1, y: 1, z: 1 } + + Can support queries that the following indexes support: + + .. code-block:: javascript + + { x: 1 } + { x: 1, y: 1 } + + There are some situations where the prefix indexes may offer better + query performance: for example if ``z`` is a large array. + + The ``{ x: 1, y: 1, z: 1 }`` index can also support many of the same + queries as the following index: + + .. code-block:: javascript + + { x: 1, z: 1 } + + Also, ``{ x: 1, z: 1 }`` has an additional use. Given the following + query: + + .. code-block:: javascript + + db.collection.find( { x: 5 } ).sort( { z: 1} ) + + The ``{ x: 1, z: 1 }`` index supports both the query and the sort + operation, while the ``{ x: 1, y: 1, z: 1 }`` index only supports + the query. For more information on sorting, see + :ref:`sorting-with-indexes`. + Create Indexes that Support Covered Queries ------------------------------------------- @@ -183,28 +228,19 @@ available but also must have RAM available for the rest of the :term:`working set`. Also remember: - If you have and use multiple collections, you must consider the size - of all indexes on all collections. + of all indexes on all collections. The indexes and the working set must be able to + fit RAM at the same time. -- There are some limited cases where indexes do not need to fit in RAM. - See :ref:`indexing-right-handed`. +- All of your indexes use less space than all of the documents in the + collection. This may not be an issue if all your queries use + :ref:`covered queries ` or if indexes do not need to + fit into RAM. There are some limited cases where indexes do not need + to fit in RAM. See :ref:`indexing-right-handed`. .. seealso:: For additional :doc:`collection statistics `, use :dbcommand:`collStats` or :method:`db.collection.stats()`. -Indexes require space, both on disk and in RAM. Indexes require less -space in RAM than the full documents in the collection. 
In theory, if -your queries only match a subset of the documents and can use the index -to locate those documents, MongoDB can maintain a much smaller -:term:`working set`. Ensure that: - -- The indexes and the working set can fit RAM at the same time. - -- All of your indexes use less space than all of the documents in the - collection. This may not be an issue all of your queries use - :ref:`covered queries ` or indexes do not need to fit - into ram, as in the following situation: - .. _indexing-right-handed: Indexes that Hold Only Recent Values in RAM @@ -218,17 +254,14 @@ RAM. This allows for efficient index use for read and write operations and minimize the amount of RAM required to support the index. -.. To determine the size of the index, see DOCS-224 - .. _index-selectivity: Create Queries that Ensure Selectivity -------------------------------------- -Selectivity is the ability of a query to narrow results -using the index. Effective indexes are more selective and allow -MongoDB to use the index for a larger portion of the work associated -with fulfilling the query. +Selectivity is the ability of a query to narrow results using the index. +Effective indexes are more selective and allow MongoDB to use the index +for a larger portion of the work associated with fulfilling the query. To ensure selectivity: @@ -322,70 +355,27 @@ Consider Insert Throughput ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. TODO insert link to /source/core/write-operations when that page is complete. + Do we want to link to write concern? -bg -MongoDB must update all indexes associated with a collection after -every insert, update, or delete operation. - -Every index on a collection adds some amount of overhead to these -operations. In almost every case, the performance gains that indexes -realize for read operations are worth the insertion penalty; however: - -- In some cases, an index to support an infrequent query may incur - more insert-related costs than saved read-time. 
- -- In some situations, if you have many indexes on a collection with a - high insert throughput and a number of very similar indexes, you may - find better overall results by using a slightly less effective index - on some queries if it means consolidating the total number of - indexes. - -- If your indexes and queries are not very selective, the speed - improvements for query operations may not offset the costs of - maintaining an index. See the section on :ref:`index selectivity - ` for more information. - -- In some cases a single compound on two or more fields index may - support all of the queries that index on a single field index, or a - smaller :ref:`compound index `. In general, - MongoDB can use compound index - to support the same queries as any of its prefixes. Consider the - following example: - - .. example:: - - Given the following index on a collection: - - .. code-block:: javascript - - { x: 1, y: 1, z: 1 } - - Can support a number of queries as well as most of the queries - that the following indexes support: - - .. code-block:: javascript - - { x: 1 } - { x: 1, y: 1 } - - There are some situations where the prefix indexes may offer - better query performance as is the case if ``z`` is a large - array. Also, consider the following index on the same collection: - - .. code-block:: javascript - - { x: 1, z: 1 } +MongoDB must update all indexes associated with a collection after every +insert, update, or delete operation. Therefore, every index on a +collection adds some amount of overhead to these operations. In almost +every case, the performance gains that indexes realize for read +operations are worth the insertion penalty. However, in some cases: - The ``{ x: 1, y: 1, z: 1 }`` index can support many of the same - queries as the above index; however, ``{ x: 1, z: 1 }`` has - additional use: Given the following query: +- An index to support an infrequent query might incur more + insert-related costs than saved read-time. - .. 
code-block:: javascript + .. TODO How do you determine if the above is the case? - db.collection.find( { x: 5 } ).sort( { z: 1} ) +- If you have many indexes on a collection with a high insert throughput + and a number of very similar indexes, you may find better overall + results by using a slightly less effective index on some queries if it + means consolidating the total number of indexes. - The ``{ x: 1, z: 1 }`` will support both the query and the sort - operation, while the ``{ x: 1, y: 1, z: 1 }`` index can only - support the query. + .. TODO The above is unclear. -bg - See the :ref:`sorting-with-indexes` section for more - information. +- If your indexes and queries are not very :ref:`selective + `, the speed improvements for query operations + might not offset the costs of maintaining an index. For more + information see :ref:`index-selectivity`. diff --git a/source/faq/indexes.txt b/source/faq/indexes.txt index 51b763d9de1..bcce09d8cb5 100644 --- a/source/faq/indexes.txt +++ b/source/faq/indexes.txt @@ -25,9 +25,68 @@ changes. While running :method:`ensureIndex() ` is usually ok, if an index doesn't exist because of ongoing administrative work, a call to :method:`ensureIndex() ` -may disrupt database avalability. Runnning :method:`ensureIndex() ` -can render a replica set inaccessible as the index -creation is happening. See :ref:`index-building-replica-sets`. +may disrupt database availability. Running :method:`ensureIndex() +` can render a replica set inaccessible as +the index creation is happening. See :ref:`index-building-replica-sets`. + +How do you know what indexes exist in a collection? +--------------------------------------------------- + +To list a collection's indexes, use the +:method:`db.collection.getIndexes()` method or a similar +:api:`method for your driver <>`. + +How do you determine the size of an index? +------------------------------------------ + +To check index size, use :method:`db.collection.totalIndexSize()`. + +..
TODO FAQ How do I determine if an index fits into RAM? + +What happens if an index does not fit into RAM? +----------------------------------------------- + +When an index is too large to fit into RAM, MongoDB must read the index +from disk, which is a much slower operation than reading from RAM. Keep +in mind an index fits into RAM when your server has RAM available for +the index combined with the rest of the :term:`working set`. + +In certain cases, an index does not need to fit *entirely* into RAM. For +details, see :ref:`indexing-right-handed`. + +.. TODO FAQ How does MongoDB determine what index to use? + +How do you know what index a query used? +---------------------------------------- + +To determine how a query is processed, use the :method:`explain() +` method. + +How do you determine what fields to index? +------------------------------------------ + +A number of factors determine what fields to index, including +:ref:`selectivity `, fitting indexes into RAM, +reusing indexes in multiple queries when possible, and creating indexes +that can support all the fields in a given query. For detailed +documentation on choosing which fields to index, see +:doc:`/applications/indexes`. + +.. TODO FAQ How do I guarantee a query uses an index? + MongoDB's query optimizer always looks for the most advantageous + index to use. You cannot guarantee use of a particular index, but you + can write indexes with your queries in mind. For detailed + documentation on creating optimal indexes, see + :doc:`/applications/indexes`. + +How do write operations affect indexes? +--------------------------------------- + +While a read operation does not affect an index, every write operation +does involve a write to the index. If your application is write-heavy, +creating too many indexes might affect performance. + +.. TODO More is needed on that last FAQ. Will building a large index affect database performance? 
-------------------------------------------------------- From 38bae5a7b623a6e6830e0f5898977f30cd55de02 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 25 Sep 2012 11:11:00 -0400 Subject: [PATCH 09/10] DOCS-206 updated faq page per review edits --- source/faq/indexes.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/source/faq/indexes.txt b/source/faq/indexes.txt index bcce09d8cb5..cc0ebce2064 100644 --- a/source/faq/indexes.txt +++ b/source/faq/indexes.txt @@ -98,12 +98,12 @@ index on a large collection, consider building the index in the background. See :ref:`index-creation-operations`. If you build a large index without the background option, and if doing -so causes the database to stop responding, you have two options: +so causes the database to stop responding, +wait for the index to finish building. -- Wait for the index to finish building - -- Kill the current operation (see :method:`db.killOP()`). The partial - index will be deleted. +.. FUTURE When SERVER-3067 is fixed, this option also will be available: + Kill the current operation (see :method:`db.killOp()`). The partial + index will be deleted. Using ``$ne`` and ``$nin`` in a query is slow. Why? --------------------------------------------------- From 7be9c4d5b8ae89b8feb635ccdae5a49653cffb57 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 25 Sep 2012 16:08:42 -0400 Subject: [PATCH 10/10] DOCS-206 updated indexing strategies per review edits --- source/applications/indexes.txt | 71 ++++++++++++++------------------- 1 file changed, 29 insertions(+), 42 deletions(-) diff --git a/source/applications/indexes.txt b/source/applications/indexes.txt index 48af22f8f8d..4d82137c765 100644 --- a/source/applications/indexes.txt +++ b/source/applications/indexes.txt @@ -60,8 +60,7 @@ create an index on both ``category`` and ``item``. This allows you both options. You can query on just ``category``, and you also can query on ``category`` combined with ``item``. 
- -To query on multiple keys and sort the results, see :ref:`index-sort`. +(To query on multiple keys and sort the results, see :ref:`index-sort`.) With the exception of queries that use the :operator:`$or` operator, a query cannot use multiple indexes. A query must use only one index. @@ -69,7 +68,6 @@ query cannot use multiple indexes. A query must use only one index. .. _covered-queries: .. _indexes-covered-queries: - Use Compound Indexes to Support Several Different Queries --------------------------------------------------------- @@ -116,18 +114,20 @@ can support all the queries that search a "prefix" subset of those fields. Create Indexes that Support Covered Queries ------------------------------------------- -A covered query is a query in which all the search keys are found in a +A covered index query is a query in which all the search keys are found in a given index. A covered query is considered to be "covered" by the index. MongoDB can fulfill the query by using *only* the index. MongoDB need not scan documents from the database. Querying *only* the index is much faster than querying documents. -Indexes are smaller than the documents they catalog, and indexes are +Index keys are typically smaller than the documents they catalog, and indexes are typically stored in RAM or located sequentially on disk. Mongod automatically uses a covered query when possible. To ensure use of a covered query, create an index that includes all the fields listed -in the query result. This means that the :term:`projection` must +in the query result. This means that the :term:`projection` document +given to a query (to specify which fields MongoDB returns from +the result set) must explicitly exclude the ``_id`` field from the result set, unless the index includes ``_id``. @@ -148,17 +148,12 @@ Use Indexes to Sort Query Results --------------------------------- For the fastest performance when sorting query results by a given field, -create a sorted index on that field. 
To sort query results on multiple -fields, create a :ref:`compound index `. For -details on creating compound indexes with sort in mind, see -:ref:`index-ascending-and-descending`. - -MongoDB uses a compound index to return sorted results *if*: +create a sorted index on that field. -- The first field in the index is the first sorted field. - -- The last field in the index *before the first sorted field* is an - equality match in the query. +To sort query results on multiple fields, create a :ref:`compound index +`. MongoDB sorts results based on the field order +in the index. Ensure that any index constraints on index fields that are +before the first sort field are equalities. .. example:: @@ -194,16 +189,13 @@ MongoDB uses a compound index to return sorted results *if*: db.collection.find().sort( { b:1 } ) db.collection.find( { b:5 } ).sort( { b:1 } ) - db.collection.find( { b:{ $gt:5 } } ).sort( { a:1, b:1 } ) .. note:: - Sorting query results by an index is faster than sorting results by - the :method:`sort() ` method. The method supports - in-memory sort operations without the use of an index, but these - operations are significantly slower than sort operations that use an - index, and they abort when the sort operation consume 32 megabytes of - memory. + When the :method:`sort() ` method performs an + in-memory sort operation without the use of an index, the operation + is significantly slower than for operations that use an index, and + the operation aborts when it consumes 32 megabytes of memory. .. _indexes-ensure-indexes-fit-ram: @@ -227,15 +219,12 @@ this index fits in RAM, you must not only have more than that much RAM available but also must have RAM available for the rest of the :term:`working set`. Also remember: -- If you have and use multiple collections, you must consider the size - of all indexes on all collections. The indexes and the working set must be able to - fit RAM at the same time. 
+If you have and use multiple collections, you must consider the size +of all indexes on all collections. The indexes and the working set must be able to +fit in RAM at the same time. -- All of your indexes use less space than all of the documents in the - collection. This may not be an issue if all your queries use - :ref:`covered queries ` or if indexes do not need to - fit into RAM. There are some limited cases where indexes do not need - to fit in RAM. See :ref:`indexing-right-handed`. +There are some limited cases where indexes do not need +to fit in RAM. See :ref:`indexing-right-handed`. .. seealso:: For additional :doc:`collection statistics `, use :dbcommand:`collStats` or @@ -263,14 +252,10 @@ Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query. -To ensure selectivity: - -- Only index keys that have a high high distribution of the values - within the collection. - -- Write queries that limit the number of possible documents with the - indexed field. Write queries that are appropriately selective relative - to your indexed data. +To ensure selectivity, +write queries that limit the number of possible documents with the +indexed field. Write queries that are appropriately selective relative +to your indexed data. .. example:: @@ -345,7 +330,7 @@ Consider Performance when Creating Indexes for Write-heavy Applications ----------------------------------------------------------------------- If your application is write-heavy, then be careful when creating new -indexes, since each additional index with impose a small +indexes, since each additional index will impose a write-performance penalty. In general, don't be careless about adding indexes. Indexes should be added to complement your queries. 
Always have a good reason for adding a new index, and make sure you've benchmarked @@ -357,8 +342,10 @@ Consider Insert Throughput .. TODO insert link to /source/core/write-operations when that page is complete. Do we want to link to write concern? -bg -MongoDB must update all indexes associated with a collection after every -insert, update, or delete operation. Therefore, every index on a +MongoDB must update *all* indexes associated with a collection after every +insert, update, or delete operation. (With updates, if the updated document +does not move to a new location, then only the modified, indexed fields +are updated in the index.) Therefore, every index on a collection adds some amount of overhead to these operations. In almost every case, the performance gains that indexes realize for read operations are worth the insertion penalty. However, in some cases: