diff --git a/source/administration/indexes.txt b/source/administration/indexes.txt index f46a9718aef..2228f097721 100644 --- a/source/administration/indexes.txt +++ b/source/administration/indexes.txt @@ -132,6 +132,30 @@ without the ``twitter_name`` field. index. See the :ref:`sparse index ` section for more information. + +.. index:: index; hashed +.. _index-hashed-index: + +Hashed Indexes +~~~~~~~~~~~~~~ + +.. versionadded:: 2.4 + +To create a :ref:`index-type-hashed`, specify ``"hashed"`` as +the value of the index key: + +.. example:: + + .. code-block:: javascript + + db.collection.ensureIndex({a:"hashed"}) + +A hashed index can be created on any single field. +The hashing function collapses compound documents together +and hashes the resulting data. + +A hashed index cannot be combined with other index specifications. + .. index:: index; unique .. _index-unique-index: diff --git a/source/administration/tag-aware-sharding.txt b/source/administration/tag-aware-sharding.txt index 1fce6d7dd14..80f4328d15f 100644 --- a/source/administration/tag-aware-sharding.txt +++ b/source/administration/tag-aware-sharding.txt @@ -27,6 +27,8 @@ sharding in MongoDB deployments. Shard key range tags are entirely distinct from :ref:`replica set member tags `. +Tag-aware sharding cannot be used with :term:`hashed shard keys `. + Behavior and Operations ----------------------- diff --git a/source/core/indexes.txt b/source/core/indexes.txt index a4b03b7583c..7ab31d07f36 100644 --- a/source/core/indexes.txt +++ b/source/core/indexes.txt @@ -37,9 +37,9 @@ MongoDB indexes have the following core features: requirements as you create indexes in your MongoDB environment. - All MongoDB indexes use a B-tree data structure. MongoDB can use - these representation of the data to optimize query responses. + this representation of the data to optimize query responses.
-- Every query, including update operations, use one and only one +- Every query, including update operations, uses one and only one index. The :ref:`query optimizer ` selects the index empirically by occasionally running alternate query plans and by selecting the plan @@ -82,7 +82,7 @@ Index Types This section enumerates the types of indexes available in MongoDB. For all collections, MongoDB creates the default :ref:`_id index `. You can create additional indexes with the -:method:`ensureIndex() ` method on any +:method:`~db.collection.ensureIndex()` method on any single field or :ref:`sequence of fields ` within any document or :ref:`sub-document `. MongoDB also supports indexes of arrays, called :ref:`multi-key indexes @@ -274,6 +274,13 @@ index, however, would not support queries that select the following: - only the ``location`` and ``stock`` fields - only the ``item`` and ``stock`` fields + +.. note:: + + :ref:`Hashed indexes ` are incompatible with compound indexes. You will receive + an error if you attempt to create a compound index including a hashed + index. + When creating an index, the number associated with a key specifies the direction of the index. The options are ``1`` (ascending) and ``-1`` (descending). Direction doesn't matter for single key indexes or for @@ -360,6 +367,14 @@ value in the array separately, in a "multikey index." Queries could use the multikey index to return queries for any of the above values. +.. note:: + + MongoDB computes values for a hashed index on the entire content of a field, + including fields that hold arrays or sub-documents. + + For fields that hold arrays and sub-documents, you cannot use the index to + support any query that introspects the value of an array or sub-document. + You can use multikey indexes to index fields within objects embedded in arrays, as in the following example: @@ -509,6 +524,42 @@ By default, ``sparse`` is ``false`` on MongoDB indexes. 
have the indexed field *are* indexed in a sparse index, even if that field stores a null value in some documents. +.. index:: index; hashed +.. _index-type-hashed: + +Hashed Index +~~~~~~~~~~~~ + +.. versionadded:: 2.4 + +Hashed indexes contain entries consisting of a hash of the indexed field. +Hashed indexes cannot be compound indexes. +A hashed index can cover only one field, and that field may not hold +an array as a value. +Hashed indexes cannot have a ``unique`` constraint. + +MongoDB can use the hashed index to support equality queries, but cannot +use these indexes for range queries. + +It is possible to create both a hashed and a non-hashed index on the same field: +MongoDB will use the non-hashed index for range queries. + +.. _hashed-index-warning: + +.. include:: /includes/warning-hashed-index-floating-point.rst + +Create a hashed index using an operation that resembles the +following: + +.. code-block:: javascript + + db.active.ensureIndex( { a: "hashed" } ) + +This operation creates a hashed index for the ``active`` collection on +the ``a`` field. + +.. [#hash-size] The hash stored in the hashed index is 64 bits. + .. index:: index; options .. _index-creation-operations: .. _index-operations: @@ -563,7 +614,7 @@ construction: .. versionchanged:: 2.4 Before 2.4, a :program:`mongod` instance could only build one background index per database at a time. - + .. versionchanged:: 2.2 Before 2.2, a single :program:`mongod` instance could only build one index at a time. diff --git a/source/core/sharded-cluster-internals.txt b/source/core/sharded-cluster-internals.txt index d7e11e3ba86..169ce39ee71 100644 --- a/source/core/sharded-cluster-internals.txt +++ b/source/core/sharded-cluster-internals.txt @@ -232,7 +232,7 @@ Choosing a Shard Key It is unlikely that any single, naturally occurring key in your collection will satisfy all requirements of a good shard key. There -are three options: +are four options: #.
Compute a more ideal shard key in your application layer, and store this in all of your documents, potentially in the @@ -249,6 +249,13 @@ are three options: - expected data size, or - query patterns and demands. +#. .. versionadded:: 2.4 + Use a :term:`hashed shard key`. + With a hashed shard key you choose a field that has high variability + and create a hashed index on that field. + MongoDB then uses the values of this hashed index as the shard key + values, thus ensuring an even distribution across the shards. + From a decision making stand point, begin by finding the field that will provide the required :ref:`query isolation `, ensure that :ref:`writes will @@ -308,6 +315,36 @@ and you want to replace this with an index on the field ``{ zipcode: If you drop the last appropriate index for the shard key, recover by recreating a index on just the shard key. +.. _sharding-hashed-shard-key-internals: + +Hashed Shard Keys +~~~~~~~~~~~~~~~~~ + +.. versionadded:: 2.4 + + Hashed shard keys + use a special :ref:`hashed index type ` that stores + hashes of the shard key field to partition data in a cluster. + + Use hashed shard keys when the desired shard key has high + cardinality but uneven distribution, or increases monotonically. + +.. example:: + + The :term:`ObjectId` value of the default ``_id`` field in MongoDB + documents has good cardinality but can lead to a hot shard as new + documents are always inserted on the last shard. + + A hashed index on an :term:`ObjectId` will lead to an even distribution + of documents across all shards because the hashes of two sequential + ``_id`` values will be consistently different. + +Do not use tag-aware sharding with hashed shard keys. +MongoDB applies tags to the hashed field value in the index, not to +the underlying field value used to compute the hash. + +.. include:: /includes/warning-hashed-index-floating-point.rst + .. index:: balancing; internals ..
_sharding-balancing-internals: diff --git a/source/core/sharded-clusters.txt b/source/core/sharded-clusters.txt index b96c57e905e..ff87d6511c8 100644 --- a/source/core/sharded-clusters.txt +++ b/source/core/sharded-clusters.txt @@ -85,6 +85,52 @@ the optimal key. In those situations, computing a special purpose shard key into an additional field or using a compound shard key may help produce one that is more ideal. +.. _sharding-hashed-sharding: + +Hashed Sharding +--------------- + +.. versionadded:: 2.4 + +:ref:`Hashed shard keys ` use a :ref:`hashed index ` of +the chosen field to partition data +across your sharded cluster. + +.. example:: + + To shard a collection using a hashed shard key, issue an operation in + the :program:`mongo` shell that resembles the following: + + .. code-block:: javascript + + sh.shardCollection( "records.active", { a: "hashed" } ) + + This operation shards the ``active`` collection in the ``records`` + database, using a hash of the ``a`` field as the shard key. + +The field you choose as your hashed shard key should have a large number +of distinct values. + +A field with a high degree of cardinality, or with ever-increasing values, +would be an ideal choice as a hashed shard key. + +If you shard an empty collection using a hashed shard key, +MongoDB will automatically create and migrate chunks so that +each shard has two chunks. +You can control how many chunks MongoDB will create with the +``numInitialChunks`` parameter to :dbcommand:`shardCollection`. + +See :ref:`index-hashed-index` for limitations on hashed indexes. + +.. include:: /includes/warning-hashed-index-floating-point.rst + +.. warning:: + + Only version 2.4 and later of the :program:`mongos` program support + hashed shard keys. After sharding a + collection with a hashed shard key, you must use :program:`mongos` + instances of version 2.4 or later in your sharded cluster. + .. index:: balancing ..
_sharding-balancing: diff --git a/source/faq/sharding.txt b/source/faq/sharding.txt index 25d65c6d3a6..aebe143ecf7 100644 --- a/source/faq/sharding.txt +++ b/source/faq/sharding.txt @@ -369,10 +369,13 @@ performance. However, if you have a high insert volume, a monotonically increasing shard key may be a limitation. To address this issue, you can use a field with a value that stores -the hash of a key with an ascending value. While you can compute a -hashed value in your application and include this value in your -documents for use as a shard key, the :issue:`SERVER-2001` issue will -implement this capability within MongoDB. +the hash of a key with an ascending value. + +.. versionchanged:: 2.4 + You can use a :ref:`hashed index ` and + :term:`hashed shard key` + or you can compute and maintain this hashed value in your + application. What do ``moveChunk commit failed`` errors mean? ------------------------------------------------ diff --git a/source/includes/warning-cannot-unshard-collection.rst b/source/includes/warning-cannot-unshard-collection.rst new file mode 100644 index 00000000000..59a218ec0dd --- /dev/null +++ b/source/includes/warning-cannot-unshard-collection.rst @@ -0,0 +1,8 @@ +.. warning:: + + There is no supported means to un-shard a collection after running + :dbcommand:`shardCollection`. + Additionally, once you have sharded a collection you cannot + change shard keys, or update the value of any field used in + your shard key index. + diff --git a/source/includes/warning-hashed-index-floating-point.rst b/source/includes/warning-hashed-index-floating-point.rst new file mode 100644 index 00000000000..1ee6d256294 --- /dev/null +++ b/source/includes/warning-hashed-index-floating-point.rst @@ -0,0 +1,9 @@ +.. warning:: + + Hashed indexes truncate floating point numbers to 64-bit integers + before hashing. For example, a hashed index would store the same + value for a field that held a value of ``2.3``, ``2.2`` and ``2.9``. 
+ To prevent collisions do not use a hashed index for floating point + numbers that cannot be consistently converted to 64-bit integers (and + then back to floating point.) Hashed indexes do not support floating + point values larger than 2\ :sup:`53`. diff --git a/source/reference/command/shardCollection.txt b/source/reference/command/shardCollection.txt index 5f63acca806..ec04918d219 100644 --- a/source/reference/command/shardCollection.txt +++ b/source/reference/command/shardCollection.txt @@ -21,17 +21,38 @@ shardCollection shard. ```` is a document, and takes the same form as an :ref:`index specification document `. + :param string shardCollection: + + Specify the namespace of a collection to be sharded in the form + ``.``. + + :param document key: + + Specify the index specification to use as the shard key. The + index must exist prior to the :dbcommand:`shardCollection` command + unless the collection is empty. If the collection is empty, then + MongoDB will create the index prior to sharding the collection. + + .. versionadded:: 2.4 + The key may be in the form ``{ field : "hashed" }`` which will + use the specified field as a hashed shard key. + + :param integer numInitialChunks: + + .. versionadded:: 2.4 + Specify the number of chunks to create upon sharding the + collection. The collection will then be pre-split and balanced + across the specified number of chunks. + + You can create at most ``8192`` chunks using ``numInitialChunks``. + Choosing the right shard key to effectively distribute load among - your shards requires some planning. + your shards requires some planning. Also review + :ref:`sharding-shard-key` regarding choosing a shard key. - .. seealso:: :doc:`/sharding` for more information related to - sharding. Also consider the section on :ref:`sharding-shard-key` - for documentation regarding shard keys. + .. include:: /includes/warning-cannot-unshard-collection.rst - .. warning:: + .. 
seealso:: - There's no easy way to disable sharding after running :dbcommand:`shardCollection`. In addition, - you cannot change shard keys once set. If you must convert a sharded cluster to a :term:`standalone` - node or :term:`replica set`, you must make a single backup of the entire cluster - and then restore the backup to the standalone :program:`mongod` - or the replica set.. + :doc:`/sharding`, :doc:`/core/sharded-clusters`, and + :doc:`/tutorial/deploy-shard-cluster`. diff --git a/source/reference/glossary.txt b/source/reference/glossary.txt index 309e4ee9d39..a6bc762fe4f 100644 --- a/source/reference/glossary.txt +++ b/source/reference/glossary.txt @@ -512,6 +512,12 @@ Glossary uses to distribute documents among members of the :term:`sharded cluster`. + hashed shard key + A :ref:`hashed shard key ` + is a special type of :term:`shard key` where + a hash of the shard key field is used to distribute + documents among members of the :term:`sharded cluster`. + query A read request. MongoDB queries use a :term:`JSON`-like query language that includes a variety of :term:`query operators ` diff --git a/source/reference/method/db.collection.ensureIndex.txt b/source/reference/method/db.collection.ensureIndex.txt index 68355e93549..3975e889e49 100644 --- a/source/reference/method/db.collection.ensureIndex.txt +++ b/source/reference/method/db.collection.ensureIndex.txt @@ -14,12 +14,16 @@ db.collection.ensureIndex() fields to index and order of the index. A ``1`` specifies ascending and a ``-1`` specifies descending. + A value of ``"hashed"`` can be used to create an + index on hashed values of the specified field. + Hashed indexes are primarily used to support + :term:`hashed shard keys`. :param document options: A :term:`document` that controls the creation of the index. This argument is optional. .. warning:: Index names, including their full namespace - (i.e. ``database.collection``) can be no longer than 128 + (i.e.
``database.collection``) cannot be longer than 128 characters. See the :method:`db.collection.getIndexes()` field ":data:`~system.indexes.name`" for the names of existing indexes. @@ -33,7 +37,7 @@ db.collection.ensureIndex() ``[key]``. If the ``keys`` document specifies more than one field, than - :method:`db.collection.ensureIndex()` creates a :term:`compound + :method:`~db.collection.ensureIndex()` creates a :term:`compound index`. To specify a compound index use the following form: .. code-block:: javascript @@ -43,6 +47,8 @@ db.collection.ensureIndex() This command creates a compound index on the ``key`` field (in ascending order) and ``key1`` field (in descending order.) + A compound index cannot include a :ref:`hashed index `. + .. note:: The order of an index is important for supporting @@ -125,7 +131,7 @@ db.collection.ensureIndex() and faster index format. Please be aware of the following behaviors of - :method:`ensureIndex() `: + :method:`~db.collection.ensureIndex()`: - To add or change index options you must drop the index using the :method:`db.collection.dropIndex()` and issue another diff --git a/source/reference/method/sh.shardCollection.txt b/source/reference/method/sh.shardCollection.txt index 29ebef38c17..0fff98dcdfe 100644 --- a/source/reference/method/sh.shardCollection.txt +++ b/source/reference/method/sh.shardCollection.txt @@ -18,9 +18,27 @@ sh.shardCollection() uniqueness so long as the unique index is a prefix of the shard key. + You cannot create a unique constraint when + using a :term:`hashed shard key`. + Shards the named collection, according to the specified :term:`shard key`. Specify shard keys in the form of a :term:`document`. Shard keys may refer to a single document field, or more typically several document fields to form a "compound shard key." - .. see:: :limit:`Size of Sharded Collection` + .. versionadded:: 2.4 + Use the form ``{field: "hashed"}`` to create a + :term:`hashed shard key `. 
+ Only one field may be used with a hashed shard key. + + .. include:: /includes/warning-cannot-unshard-collection.rst + + .. seealso:: + + :dbcommand:`shardCollection` for additional options, + :doc:`/sharding`, :doc:`/core/sharded-clusters` for an overview of + sharding with MongoDB and + :doc:`/tutorial/deploy-shard-cluster` for a tutorial. + Also review :ref:`sharding-shard-key` regarding choosing a shard + key. + diff --git a/source/release-notes/2.4.txt b/source/release-notes/2.4.txt index f62fc606a80..bb239933f6f 100644 --- a/source/release-notes/2.4.txt +++ b/source/release-notes/2.4.txt @@ -6,7 +6,7 @@ Release Notes for MongoDB 2.4 (2.3 Development Series) MongoDB 2.4 is currently in development, as part of the 2.3 development release series. While 2.3-series releases are currently -available, these versions of MongoDB, including the 2.4 release +available, these versions of MongoDB, including the 2.4 release candidate builds, are for *testing only and not for production use*. @@ -658,111 +658,14 @@ New Hashed Index and Sharding with a Hashed Shard Key ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To support an easy to configure and evenly distributed shard key, version 2.3 adds a -new "``hashed``" index type that indexes based on hashed values. This -section introduces and documents both the new index type and its use -in sharding: - -Hashed Index -```````````` - -The new ``hashed`` index exists primarily to support automatically -hashed shard keys. Consider the following properties of hashed -indexes: - -- Hashed indexes must only have a single field, and cannot be compound - indexes. - -- Fields indexed with hashed indexes must *not* hold arrays. Hashed - indexes cannot be multikey indexes. - -- Hashed indexes cannot have a ``unique`` constraint. - - You *may* create hashed indexes with the ``sparse`` property. - -- MongoDB can use the hashed index to support equality queries, but - cannot use these indexes for range queries. 
- -- Hashed indexes offer no performance advantage over normal indexes. - *However*, hashed indexes may be smaller than a normal index when - the values of the indexed field are larger than 64 bits. [#hash-size]_ - -- it's possible to have a hashed and non-hashed index on the same - field: MongoDB will use the non-hashed for range queries. - -.. _hashed-index-warning: - -.. warning:: - - Hashed indexes round floating point numbers to 64-bit integers - before hashing. For example, a hashed index would store the same - value for a field that held a value of ``2.3`` and ``2.2``. To - prevent collisions do not use a hashed index for floating point - numbers that cannot be consistently converted to 64-bit integers (and - then back to floating point.) Hashed indexes do not support - floating point values larger than 2\ :sup:`53`. - -Create a hashed index using an operation that resembles the -following: - -.. code-block:: javascript - - db.active.ensureIndex( { a: "hashed" } ) - -This operation creates a hashed index for the ``active`` collection on -the ``a`` field. - -.. [#hash-size] The hash stored in the hashed index is 64 bits long. - -Hashed Sharding -``````````````` - -To shard a collection using a hashed shard key, issue an operation in -the :program:`mongo` shell that resembles the following: - -.. code-block:: javascript - - sh.shardCollection( "records.active", { a: "hashed" } ) - -This operation shards the ``active`` collection in the ``records`` -database, using a hash of the ``a`` field as the shard -key. Consider the following properties when using a hashed shard key: - -- As with other kinds of shard key indexes, if your collection has - data, you must create the hashed index before sharding. If your - collection does not have data, sharding the collection will create - the appropriate index. 
- -- The :program:`mongos` will route all equality queries to a specific - shard or set of shards; however, the :program:`mongos` must route - range queries to all shards. - -- When using a hashed shard key on a new collection, MongoDB - automatically pre-splits the range of 64-bit hash values into - chunks. By default, the initial number of chunks is equal to twice - the number of shards at creation time. You can change the number of - chunks created, using the ``numInitialChunks`` option, as in the - following invocation of :dbcommand:`shardCollection`: - - .. code-block:: javascript - - db.adminCommand( { shardCollection: "test.collection", - key: { a: "hashed"}, - numInitialChunks: 2001 } ) - - MongoDB will only pre-split chunks in a collection when sharding - empty collections. MongoDB will not create chunk splits in a - collection sharding collections that have data. - - .. note:: - - ``numInititalChanks`` allows you to create, at most ``8192`` - chunks when sharding a collection. - -.. warning:: - - Avoid using hashed shard keys when the hashed field has non-integral floating - point values, see :ref:`hashed indexes ` for - more information. +new "``hashed``" index type +that uses the hash of the specified field or shard key as the index entry. +Hashed indexes are detailed in :ref:`index-type-hashed`. + +Hashed shard keys use a hashed index on the shard key field +to partition data and balance a collection's data across a +sharded cluster. +Hashed sharding is detailed in :ref:`sharding-hashed-sharding`. Security Improvements +++++++++++++++++++++ diff --git a/source/tutorial/deploy-shard-cluster.txt b/source/tutorial/deploy-shard-cluster.txt index 687c5458831..2303a91afe9 100644 --- a/source/tutorial/deploy-shard-cluster.txt +++ b/source/tutorial/deploy-shard-cluster.txt @@ -13,7 +13,7 @@ sharded cluster for the first time, consider the :doc:`/administration/sharded-cluster-architectures` documents. 
To set up a sharded cluster, complete the following sequence of tasks -in the order defined below: +in the order defined below: #. :ref:`sharding-setup-start-cfgsrvr` @@ -225,6 +225,11 @@ You enable sharding on a per-collection basis. of the shard key affects the efficiency of sharding. See the selection considerations listed in the :ref:`sharding-shard-key-selection`. +#. If the collection already contains data, you must create an index on + the :term:`shard key` using :method:`~db.collection.ensureIndex()`. + If the collection is empty, then MongoDB will create the index as part + of the :method:`sh.shardCollection()` step. + #. Enable sharding for a collection by issuing the :method:`sh.shardCollection()` method in the :program:`mongo` shell. The method uses the following syntax: @@ -240,14 +245,15 @@ You enable sharding on a per-collection basis. specify in the same form as you would an :method:`index ` key pattern. -.. example:: The following sequence of commands shards four collections: + .. example:: The following sequence of commands shards four collections: - .. code-block:: javascript + .. code-block:: javascript - sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } ) - sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } ) - sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } ) - sh.shardCollection("events.alerts", { "hashed_id": 1 } ) + sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } ) + sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } ) + sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } ) + db.getSiblingDB("events").alerts.ensureIndex( { _id : "hashed" } ) + sh.shardCollection("events.alerts", { "_id": "hashed" } ) In order, these operations shard: @@ -279,9 +285,13 @@ You enable sharding on a per-collection basis. field. #. The ``alerts`` collection in the ``events`` database using the shard key - ``{ "hashed_id": 1 }``. + ``{ "_id": "hashed" }``.
- This shard key distributes documents by the value of the - ``hashed_id`` field. Presumably this is a computed value that - holds the hash of some value in your documents and is able to - evenly distribute documents throughout your cluster. + .. versionadded:: 2.4 + + This shard key distributes documents by a hash of the value + of the ``_id`` field. + MongoDB computes the hash of the ``_id`` field through the + use of a :ref:`hashed index `. + The hashed index should provide an even distribution of documents + throughout your cluster. diff --git a/source/tutorial/enforce-unique-keys-for-sharded-collections.txt b/source/tutorial/enforce-unique-keys-for-sharded-collections.txt index f918a9ca317..7cb26d036ca 100644 --- a/source/tutorial/enforce-unique-keys-for-sharded-collections.txt +++ b/source/tutorial/enforce-unique-keys-for-sharded-collections.txt @@ -23,6 +23,9 @@ collections in a sharded environment, there are two options: *entire* key combination, and not for a specific component of the shard key. + You cannot specify a unique constraint on a + :ref:`hashed index `. + #. Use a secondary collection to enforce uniqueness. Create a minimal collection that only contains the unique field and @@ -37,7 +40,7 @@ collections in a sharded environment, there are two options: collection and you can create multiple unique indexes. Otherwise you can shard on a single unique key. -Always use the default :ref:`acknowledged ` +Always use the default :ref:`acknowledged ` :ref:`write concern ` in conjunction with a :doc:`recent MongoDB driver `.