From 6dc5e7847abd5e8bb80c68ac06c1136d08f35047 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Mon, 15 Oct 2012 17:18:34 -0400 Subject: [PATCH 1/3] early draft --- source/administration/sharding.txt | 65 ++++++++++++++++++++++++++++-- source/reference/glossary.txt | 2 + 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index d59bd083e2b..caae79f9b56 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -200,11 +200,44 @@ Cluster Management Once you have a running sharded cluster, you will need to maintain it. This section describes common maintenance procedure, including: how to -add and remove nodes, how to manually split chunks, and how to disable +add and remove shards, how to manually split chunks, and how to disable the balancer for backups. -List all Shards -~~~~~~~~~~~~~~~ +List Databases with Sharding Enabled +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To list the databases for which sharding is enabled, query the +``databases`` collection in the :ref:`config database +`. The databases with the ``partitioned`` field set to +``true`` have sharding enabled. + +.. example:: Run this sequence of commands to open the :ref:`config + database ` and query the ``databases`` collection. + + .. code-block:: javascript + + use config + + db.databases.find() + + Given the following example output: + + .. code-block:: javascript + + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + + { "_id" : "foo", "partitioned" : true, "primary" : "example1.com:30001" } + + { "_id" : "baz", "partitioned" : false, "primary" : "example2.com:27017" } + + Only the database `foo` has sharding enabled because only `foo` has + the ``partitioned`` field set to ``true``. + +.. note:: If a database in a sharded environment does not have sharding + enabled, all its data is on the first shard of the cluster. + +List Shards +~~~~~~~~~~~ To list the current set of configured shards and verify that all shards have been committed to the system, run the :dbcommand:`listShards` @@ -216,6 +249,32 @@ command: .. _sharding-procedure-add-shard: + + + + + + + + + + + + + + + + + + + + + + + + + + Add a Shard to a Cluster ~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/source/reference/glossary.txt b/source/reference/glossary.txt index 973d6f7c41b..423a6d6aac3 100644 --- a/source/reference/glossary.txt +++ b/source/reference/glossary.txt @@ -881,3 +881,5 @@ Glossary primary shard For a database in which :term:`sharding` is enabled, the primary shard holds all un-sharded collections. + + .. seealso:: :ref:`sharding-sharded-and-unsharded-data` From 2a9e0163446cffb451701bedd494ca03dabc09fd Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Mon, 15 Oct 2012 18:22:57 -0400 Subject: [PATCH 2/3] 302 view shard info procedures --- source/administration/sharding.txt | 69 +++++++++++++++++++----------- 1 file changed, 44 insertions(+), 25 deletions(-) diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index caae79f9b56..a418f813ac5 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -77,7 +77,7 @@ use the following procedure as a quick starting point: mongo mongos0.example.net -#. Add shards to the cluster. +#. Add shards to the cluster. .. note:: In production deployments, all shards should be replica sets. @@ -217,7 +217,6 @@ To list the databases for which sharding is enabled, query the .. code-block:: javascript use config - db.databases.find() Given the following example output: @@ -225,17 +224,12 @@ To list the databases for which sharding is enabled, query the .. code-block:: javascript { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "foo", "partitioned" : true, "primary" : "example1.com:30001" } - { "_id" : "baz", "partitioned" : false, "primary" : "example2.com:27017" } Only the database `foo` has sharding enabled because only `foo` has the ``partitioned`` field set to ``true``. -.. note:: If a database in a sharded environment does not have sharding - enabled, all its data is on the first shard of the cluster. - List Shards ~~~~~~~~~~~ @@ -247,33 +241,58 @@ command: db.runCommand( { listshards : 1 } ) -.. _sharding-procedure-add-shard: - - - - - - - - - - - - - - - - - +View Cluster Details +~~~~~~~~~~~~~~~~~~~~ +To view cluster details, issue :method:`db.printShardingStatus()` or +:method:`sh.status()`. Both methods return the same output. +.. example:: In the following example output from :method:`sh.status()` + - ``sharding version`` displays the version number of the shard + metadata. + - ``shards`` displays each :program:`mongod` instance that is used as + a shard. + - ``databases`` displays all databases living in the cluster, + including databases for which sharding is not enabled. + - The ``chunks`` information for the ``foo`` database displays how + many chunks are on each shard and displays the range of each chunk. + .. code-block:: javascript + --- Sharding Status --- + sharding version: { "_id" : 1, "version" : 3 } + shards: + { "_id" : "shard0000", "host" : "example1.com:40000" } + { "_id" : "shard0001", "host" : "example2.com:50000" } + databases: + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "foo", "partitioned" : true, "primary" : "shard0000" } + foo.big chunks: + shard0001 1 + shard0000 6 + { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo + { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo + { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo + { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) + { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + foo.large chunks: + shard0001 1 + shard0000 5 + { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) + { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo + { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) + { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } +.. _sharding-procedure-add-shard: Add a Shard to a Cluster ~~~~~~~~~~~~~~~~~~~~~~~~ From f3423ef6b1b9b4d1a2a188157f3a891ff7546a6e Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 16 Oct 2012 15:29:57 -0400 Subject: [PATCH 3/3] DOCS-302 migrated shard admin page --- .../administration/sharding-architectures.txt | 20 +- source/administration/sharding.txt | 237 +++++++++--------- source/core/sharding-internals.txt | 63 +++-- source/core/sharding.txt | 79 +++--- 4 files changed, 198 insertions(+), 201 deletions(-) diff --git a/source/administration/sharding-architectures.txt b/source/administration/sharding-architectures.txt index 6ae4c81f288..1c9e89e7435 100644 --- a/source/administration/sharding-architectures.txt +++ b/source/administration/sharding-architectures.txt @@ -9,12 +9,10 @@ Sharded Cluster Architectures .. default-domain:: mongodb This document describes the organization and design of :term:`sharded -cluster` deployments. +cluster` deployments. For a list of all sharding +documentation see :doc:`/sharding`. -.. seealso:: The :doc:`/administration/sharding` document, the - ":ref:`Sharding Requirements `" section, - and the ":ref:`Sharding Tutorials `" for more - information on deploying and maintaining a :term:`sharded cluster`. +.. seealso:: :ref:`sharding-requirements`. Deploying a Test Cluster ------------------------ @@ -55,17 +53,17 @@ components: - 2 or more :term:`replica sets `, for the :term:`shards `. - .. see:: ":doc:`/administration/replication-architectures`" - and ":doc:`/replication`" for more information on replica sets. + .. see:: For more information on replica sets see + :doc:`/administration/replication-architectures` and + :doc:`/replication`. - :program:`mongos` instances. Typically, you will deploy a single :program:`mongos` instance on each application server. Alternatively, you may deploy several `mongos` nodes and let your application connect to these via a load balancer. -.. seealso:: The ":ref:`sharding-procedure-add-shard`" and - ":ref:`sharding-procedure-remove-shard`" procedures for more - information. +.. seealso:: :ref:`sharding-procedure-add-shard` and + :ref:`sharding-procedure-remove-shard`. Sharded and Non-Sharded Data ---------------------------- @@ -117,7 +115,7 @@ created subsequently, may reside on any shard in the cluster. High Availability and MongoDB ----------------------------- -A :ref:`production ` +A :ref:`production ` :term:`cluster` has no single point of failure. This section introduces the availability concerns for MongoDB deployments, and highlights potential failure scenarios and available resolutions: diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index a418f813ac5..1aa90c3622a 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -8,14 +8,7 @@ Sharded Cluster Administration .. default-domain:: mongodb This document describes common administrative tasks for sharded -clusters. For a full introduction to sharding in MongoDB see -:doc:`/core/sharding`, and for a complete overview of all sharding -documentation in the MongoDB Manual, see :doc:`/sharding`. The -:doc:`/administration/sharding-architectures` document provides an -overview of deployment possibilities to help deploy a sharded -cluster. Finally, the :doc:`/core/sharding-internals` document -provides a more detailed introduction to sharding when troubleshooting -issues or understanding your cluster's behavior. +clusters. For a list of all sharding documentation see :doc:`/sharding`. .. contents:: Sharding Procedures: :backlinks: none @@ -26,8 +19,7 @@ issues or understanding your cluster's behavior. Set up a Sharded Cluster ------------------------ -Before deploying a cluster, see the requirements listed in -:ref:`Requirements for Sharded Clusters `. +Before deploying a cluster, see :ref:`sharding-requirements`. For testing purposes, you can run all the required shard :program:`mongod` processes on a single server. For production, use the configurations described in @@ -198,99 +190,8 @@ use the following procedure as a quick starting point: Cluster Management ------------------ -Once you have a running sharded cluster, you will need to maintain it. -This section describes common maintenance procedure, including: how to -add and remove shards, how to manually split chunks, and how to disable -the balancer for backups. - -List Databases with Sharding Enabled -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To list the databases for which sharding is enabled, query the -``databases`` collection in the :ref:`config database -`. The databases with the ``partitioned`` field set to -``true`` have sharding enabled. - -.. example:: Run this sequence of commands to open the :ref:`config - database ` and query the ``databases`` collection. - - .. code-block:: javascript - - use config - db.databases.find() - - Given the following example output: - - .. code-block:: javascript - - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "foo", "partitioned" : true, "primary" : "example1.com:30001" } - { "_id" : "baz", "partitioned" : false, "primary" : "example2.com:27017" } - - Only the database `foo` has sharding enabled because only `foo` has - the ``partitioned`` field set to ``true``. - -List Shards -~~~~~~~~~~~ - -To list the current set of configured shards and verify that all shards -have been committed to the system, run the :dbcommand:`listShards` -command: - -.. code-block:: javascript - - db.runCommand( { listshards : 1 } ) - -View Cluster Details -~~~~~~~~~~~~~~~~~~~~ - -To view cluster details, issue :method:`db.printShardingStatus()` or -:method:`sh.status()`. Both methods return the same output. - -.. example:: In the following example output from :method:`sh.status()` - - - ``sharding version`` displays the version number of the shard - metadata. - - - ``shards`` displays each :program:`mongod` instance that is used as - a shard. - - - ``databases`` displays all databases living in the cluster, - including databases for which sharding is not enabled. - - - The ``chunks`` information for the ``foo`` database displays how - many chunks are on each shard and displays the range of each chunk. - - .. code-block:: javascript - - --- Sharding Status --- - sharding version: { "_id" : 1, "version" : 3 } - shards: - { "_id" : "shard0000", "host" : "example1.com:40000" } - { "_id" : "shard0001", "host" : "example2.com:50000" } - databases: - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "foo", "partitioned" : true, "primary" : "shard0000" } - foo.big chunks: - shard0001 1 - shard0000 6 - { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo - { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo - { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo - { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) - { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - foo.large chunks: - shard0001 1 - shard0000 5 - { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) - { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo - { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) - { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } +This section describes how to add and remove shards and how to view +information about your :term:`sharded cluster`. .. _sharding-procedure-add-shard: @@ -359,9 +260,8 @@ procedure: It may take some time for :term:`chunks ` to migrate to the new shard. - See the :ref:`Balancing and Distribution ` - section for an overview of the balancing operation and the :ref:`Balancing Internals - ` section for additional information. + For an introduction to balancing, see :ref:`sharding-balancing`. For + lower level information on balancing, see :ref:`sharding-balancing-internals`. .. _sharding-procedure-remove-shard: @@ -483,6 +383,95 @@ The procedure to remove a shard is as follows: Once the value of the ``stage`` field is "completed," you may safely stop the processes comprising the ``mongodb0`` shard. +List Databases with Sharding Enabled +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To list the databases for which sharding is enabled, query the +``databases`` collection in the :ref:`config-database`. +The databases with the ``partitioned`` field set to +``true`` have sharding enabled. + +.. example:: Run this sequence of commands to open the :ref:`config + database ` and query the ``databases`` collection. + + .. code-block:: javascript + + use config + db.databases.find() + + Given the following example output: + + .. code-block:: javascript + + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "foo", "partitioned" : true, "primary" : "example1.com:30001" } + { "_id" : "baz", "partitioned" : false, "primary" : "example2.com:27017" } + + Only the database `foo` has sharding enabled because only `foo` has + the ``partitioned`` field set to ``true``. + +List Shards +~~~~~~~~~~~ + +To list the current set of configured shards and verify that all shards +have been committed to the system, run the :dbcommand:`listShards` +command: + +.. code-block:: javascript + + db.runCommand( { listshards : 1 } ) + +View Cluster Details +~~~~~~~~~~~~~~~~~~~~ + +To view cluster details, issue :method:`db.printShardingStatus()` or +:method:`sh.status()`. Both methods return the same output. + +.. example:: In the following example output from :method:`sh.status()` + + - ``sharding version`` displays the version number of the shard + metadata. + + - ``shards`` displays each :program:`mongod` instance that is used as + a shard. + + - ``databases`` displays all databases living in the cluster, + including databases for which sharding is not enabled. + + - The ``chunks`` information for the ``foo`` database displays how + many chunks are on each shard and displays the range of each chunk. + + .. code-block:: javascript + + --- Sharding Status --- + sharding version: { "_id" : 1, "version" : 3 } + shards: + { "_id" : "shard0000", "host" : "example1.com:40000" } + { "_id" : "shard0001", "host" : "example2.com:50000" } + databases: + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "foo", "partitioned" : true, "primary" : "shard0000" } + foo.big chunks: + shard0001 1 + shard0000 6 + { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo + { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo + { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo + { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) + { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + foo.large chunks: + shard0001 1 + shard0000 5 + { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) + { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo + { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) + { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } + Chunk Management ---------------- @@ -635,7 +624,9 @@ To create and migrate chunks manually, use the following procedure: } You can also let the balancer automatically distribute the new - chunks. + chunks. For an introduction to balancing, see + :ref:`sharding-balancing`. For lower level information on balancing, + see :ref:`sharding-balancing-internals`. .. _sharding-balancing-modify-chunk-size: @@ -654,7 +645,7 @@ To modify the chunk size, use the following procedure: #. Connect to any :program:`mongos` in the cluster using the :program:`mongo` shell. -#. Issue the following command to switch to the config database: +#. Issue the following command to switch to the :ref:`config-database`: .. code-block:: javascript @@ -710,9 +701,9 @@ However, you may want to migrate chunks manually in a few cases: - If the balancer in an active cluster cannot distribute chunks within the balancing window, then you will have to migrate chunks manually. -See the :ref:`chunk migration ` section to -understand the internal process of how chunks move -between shards. +For more information on how chunks move between shards, see +:ref:`sharding-balancing-internals`, in particular the section +:ref:`sharding-chunk-migration`. To migrate chunks, use the :dbcommand:`moveChunk` command. @@ -773,12 +764,10 @@ to pre-splitting. Balancer Operations ------------------- -This section provides an overview of common administrative procedures -related to balancing and the balancing process. - -.. seealso:: :ref:`sharding-balancing` and the - :dbcommand:`moveChunk` that provides manual :term:`chunk` - migrations. +This section describes provides common administrative procedures related +to balancing. For an introduction to balancing, see +:ref:`sharding-balancing`. For lower level information on balancing, see +:ref:`sharding-balancing-internals`. .. _sharding-balancing-check-lock: @@ -791,7 +780,7 @@ To see if the balancer process is active in your :term:`cluster #. Connect to any :program:`mongos` in the cluster using the :program:`mongo` shell. -#. Issue the following command to switch to the config database: +#. Issue the following command to switch to the :ref:`config-database`: .. code-block:: javascript @@ -856,7 +845,7 @@ be able to migrate chunks: #. Connect to any :program:`mongos` in the cluster using the :program:`mongo` shell. -#. Issue the following command to switch to the config database: +#. Issue the following command to switch to the :ref:`config-database`: .. code-block:: javascript @@ -942,7 +931,7 @@ run this operation from a driver that does not have helper functions: #. Connect to any :program:`mongos` in the cluster using the :program:`mongo` shell. -#. Issue the following command to switch to the config database: +#. Issue the following command to switch to the :ref:`config-database`: .. code-block:: javascript @@ -1000,7 +989,7 @@ three config servers. #. Copy the entire :setting:`dbpath` file system tree from the existing config server to the two machines that will provide the additional config servers. These commands, issued on the system - with the existing config database, ``mongo-config0.example.net`` may + with the existing :ref:`config-database`, ``mongo-config0.example.net`` may resemble the following: .. code-block:: sh @@ -1062,7 +1051,7 @@ as needed. Migrate Config Servers with Different Hostnames ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Use this process when you need to migrate a config database to a new +Use this process when you need to migrate a :ref:`config-database` to a new server and it *will not* be accessible via the same hostname. If possible, avoid changing the hostname so that you can use the :ref:`previous procedure `. @@ -1088,7 +1077,7 @@ possible, avoid changing the hostname so that you can use the that provide your shards. - the :program:`mongod` instances that provide your existing - config databases. + :ref:`config databases `. - all :program:`mongos` instances in your cluster. @@ -1139,7 +1128,7 @@ Backup Cluster Metadata ~~~~~~~~~~~~~~~~~~~~~~~ The cluster will remain operational [#read-only]_ without one -of the config databases :program:`mongod` instances, creating a backup +of the config database's :program:`mongod` instances, creating a backup of the cluster metadata from the config database is straight forward: #. Shut down one of the :term:`config databases `. diff --git a/source/core/sharding-internals.txt b/source/core/sharding-internals.txt index 5d93486f127..b1b6f6c0bbb 100644 --- a/source/core/sharding-internals.txt +++ b/source/core/sharding-internals.txt @@ -7,12 +7,12 @@ Sharding Internals .. default-domain:: mongodb -This document introduces lower level sharding concepts for users who -are familiar with :term:`sharding` generally and want to learn more -about the internals of sharding in MongoDB. The ":doc:`/core/sharding`" -document provides an overview of higher level sharding concepts while -the ":doc:`/administration/sharding`" provides an overview of common -administrative tasks. +This document introduces lower level sharding concepts for users who are +familiar with :term:`sharding` generally and want to learn more about +the internals. This document provides a more detailed understanding of +your cluster's behavior. For higher level sharding concepts, see +:doc:`/core/sharding`. For a list of all sharding +documentation see :doc:`/sharding`. .. index:: shard key; internals .. _sharding-internals-shard-keys: @@ -129,8 +129,8 @@ succeeds in making all querying operational in sharded environments, the :term:`shard key` you select can have a profound affect on query performance. -.. seealso:: The ":ref:`mongos and Sharding `" and - ":ref:`config server `" sections for a more +.. seealso:: The :ref:`mongos and Sharding ` and + :ref:`config server ` sections for a more general overview of querying in sharded environments. .. index:: shard key; query isolation @@ -168,13 +168,13 @@ selective) you should add a second field to the shard key making a compound shard key. The data may become more splitable with a compound shard key. -.. see:: ":ref:`sharding-mongos`" for more information on query +.. see:: :ref:`sharding-mongos` for more information on query operations in the context of sharded clusters. .. [#shard-key-index] In many ways, you can think of the shard key a cluster-wide unique index. However, be aware that sharded systems cannot enforce cluster-wide unique indexes *unless* the unique - field is in the shard key. Consider the ":wiki:`Indexes`" wiki page + field is in the shard key. Consider the :wiki:`Indexes` wiki page for more information on indexes and compound indexes. Sorting @@ -311,24 +311,24 @@ and you want to replace this with an index on the field ``{ zipcode: Cluster Balancer ---------------- +This section contains complete documentation of the balancer process +and operations. For a higher level introduction see +the :ref:`sharding-balancing` section. + The :ref:`balancer ` sub-process is responsible for redistributing chunks evenly among the shards and ensuring that each member of the cluster is responsible for the same volume of data. -This section contains complete documentation of the balancer process -and operations. For a higher level introduction see -the :ref:`Balancing ` section. - .. _sharding-internals-balancing: Balancing Internals ~~~~~~~~~~~~~~~~~~~ A balancing round originates from an arbitrary :program:`mongos` -instance, because a cluster can have a number of +instance from among the cluster's :program:`mongos` instances. When a balancer process is active, the responsible :program:`mongos` acquires a "lock" by modifying a -document on the :term:`config database`. +document in the ``lock`` collection in the :ref:`config-database`. By default, the balancer process is always running. When the number of chunks in a collection is unevenly distributed among the shards, the @@ -460,7 +460,7 @@ command. This will prevent the :term:`balancer` from migrating chunks to the shard when the value of :status:`mem.mapped` exceeds the ``maxSize`` setting. -.. seealso:: ":doc:`/administration/monitoring`." +.. seealso:: :doc:`/administration/monitoring`. .. _sharding-chunk-migration: @@ -508,7 +508,7 @@ When the ``_secondaryThrottle`` is ``true`` for :dbcommand:`moveChunk` or the :term:`balancer`, MongoDB ensure that *one* :term:`secondary` member has replicated changes before allowing new chunk migrations. -.. _config-database: +.. _config-database: .. _sharding-internals-config-database: Config Database @@ -531,11 +531,11 @@ To start the ``config`` database, issue the following command from the The ``config`` database holds the following collections that support sharded cluster operations: -.. data:: chunks +.. data:: chunks The :data:`chunks` collection stores a document for each chunk in in the cluster. Consider the following example of a document for a - chunk named ``records.pets-animal_\"cat\"``: + chunk named ``records.pets-animal_\"cat\"``: .. code-block:: javascript @@ -606,7 +606,26 @@ sharded cluster operations: The :data:`locks` collection stores a distributed lock. This ensures that only one :program:`mongos` instance can perform - administrative tasks on the culster at once. + administrative tasks on the cluster at once. The :program:`mongos` + acting as :term:`balancer` takes a "lock" by inserting a document + resembling the following into the ``locks`` collection. + + .. code-block:: javascript + + { + "_id" : "balancer", + "process" : "example.net:40000:1350402818:16807", + "state" : 2, + "ts" : ObjectId("507daeedf40e1879df62e5f3"), + "when" : ISODate("2012-10-16T19:01:01.593Z"), + "who" : "example.net:40000:1350402818:16807:Balancer:282475249", + "why" : "doing balance round" + } + + If a lock is taken, the ``state`` field has a value of ``2`` (or a + value of ``1`` in versions prior to 2.0). This means the balancer is + active. The ``when`` field indicates when the balancer began the + current operation. .. data:: mongos @@ -645,7 +664,7 @@ sharded cluster operations: The :data:`shards` collection represents each shard in the cluster in a separate document. If the shard is a replica set, the ``host`` field displays the name of the replica set, then a slash, then - the hostname, as in the following example: + the hostname, as in the following example: .. code-block:: javascript diff --git a/source/core/sharding.txt b/source/core/sharding.txt index b2b027cb782..e6f10a5e1be 100644 --- a/source/core/sharding.txt +++ b/source/core/sharding.txt @@ -7,30 +7,19 @@ Sharding Fundamentals .. default-domain:: mongodb -MongoDB's sharding system allows users to :term:`partition` the data of a -:term:`collection` within a database to distribute documents -across a number of :program:`mongod` instances or :term:`shards `. - -Sharded clusters allow increases in write capacity, provide the ability to -support larger working sets, and raise the limits of total data size beyond -the physical resources of a single node. - This document provides an overview of the fundamental concepts and -operation of sharding with MongoDB. - -.. seealso:: The ":doc:`/sharding`" index for a list of all documents - in this manual that contain information related to the operation - and use of sharded clusters in MongoDB. This includes: +operations of sharding with MongoDB. For a list of all sharding +documentation see :doc:`/sharding`. - - :doc:`/core/sharding-internals` - - :doc:`/administration/sharding` - - :doc:`/administration/sharding-architectures` +MongoDB's sharding system allows users to :term:`partition` a +:term:`collection` within a database to distribute the collection's documents +across a number of :program:`mongod` instances or :term:`shards `. +Sharding increases write capacity, provides the ability to +support larger working sets, and raises the limits of total data size beyond +the physical resources of a single node. - If you are not yet familiar with sharding, see the :doc:`Sharding - FAQ ` document. - -Overview --------- +Sharding Overview +----------------- Features ~~~~~~~~ @@ -89,11 +78,11 @@ A typical :term:`sharded cluster` consists of: ` instances. The :program:`mongos` process routes operations to the correct shard based the cluster configuration. -Indications -~~~~~~~~~~~ +When to Use Sharding +~~~~~~~~~~~~~~~~~~~~ While sharding is a powerful and compelling feature, it comes with -significant :ref:`infrastructure requirements ` +significant :ref:`sharding-requirements-infrastructure` and some limited complexity costs. As a result, use sharding only as necessary, and when indicated by actual operational requirements. Consider the following overview of indications it may be @@ -133,13 +122,13 @@ the corresponding shard keys. .. index:: sharding; requirements .. _sharding-requirements: -Requirements ------------- +Sharding Requirements +--------------------- .. _sharding-requirements-infrastructure: -Infrastructure -~~~~~~~~~~~~~~ +Infrastructure Requirements +~~~~~~~~~~~~~~~~~~~~~~~~~~~ A :term:`sharded cluster` has the following components: @@ -190,8 +179,8 @@ A :term:`sharded cluster` has the following components: the :program:`mongos` instances, causing that :program:`mongos` to require more system resources. -Data -~~~~ +Data Requirements +~~~~~~~~~~~~~~~~~ Your cluster must manage a significant quantity of data for sharding to have an effect on your collection. The default :term:`chunk` size @@ -215,7 +204,7 @@ for your persistence layer needs. .. [#chunk-size] :term:`chunk` size is :option:`user configurable `. However, the default value is of 64 megabytes is ideal for most deployments. See the - ":ref:`sharding-chunk-size`" section in the + :ref:`sharding-chunk-size` section in the :doc:`sharding-internals` document for more information. .. index:: sharding; localhost @@ -266,21 +255,21 @@ The ideal shard key: - is easily divisible which makes it easy for MongoDB to distribute content among the shards. Shard keys that have a limited number of possible values are not ideal as they can result in some chunks that - are "unsplitable." See the ":ref:`sharding-shard-key-cardinality`" + are "unsplitable." See the :ref:`sharding-shard-key-cardinality` section for more information. - will distribute write operations among the cluster, to prevent any single shard from becoming a bottleneck. Shard keys that have a high correlation with insert time are poor choices for this reason; however, shard keys that have higher "randomness" satisfy this - requirement better. See the ":ref:`sharding-shard-key-write-scaling`" + requirement better. See the :ref:`sharding-shard-key-write-scaling` section for additional background. - will make it possible for the :program:`mongos` to return most query operations directly from a single *specific* :program:`mongod` instance. Your shard key should be the primary field used by your queries, and fields with a high degree of "randomness" are poor - choices for this reason. See the ":ref:`sharding-shard-key-query-isolation`" + choices for this reason. See the :ref:`sharding-shard-key-query-isolation` section for specific examples. The challenge when selecting a shard key is that there is not always @@ -372,8 +361,8 @@ that the config servers remain available and intact are critical. :program:`mongos` and Querying ------------------------------ -.. seealso:: ":doc:`/reference/mongos`" and the :program:`mongos`\-only - settings: ":setting:`test`" and :setting:`chunkSize`. +.. seealso:: :doc:`/reference/mongos` and the :program:`mongos`\-only + settings: :setting:`test` and :setting:`chunkSize`. Operations ~~~~~~~~~~ @@ -509,26 +498,28 @@ To route a query to a :term:`cluster `, Balancing and Distribution -------------------------- -Balancing refers to the process that MongoDB uses to redistribute data -within a :term:`sharded cluster` when some :term:`shards ` have a -greater number of :term:`chunks ` than other shards. The +Balancing is the process MongoDB uses to redistribute data within a +:term:`sharded cluster`. When a :term:`shard` has a too many: +term:`chunks ` when compared to other shards, MongoDB balances +the shards. + +The balancing process attempts to minimize the impact that balancing can have on the cluster, by: -- Only moving one chunk at a time. +- Moving only one chunk at a time. - Only initiating a balancing round when the difference in number of chunks between the shard with the greatest and the shard with the least number of chunks exceeds the :ref:`migration threshold `. -Additionally, you may disable the balancer on a temporary basis for +You may disable the balancer on a temporary basis for maintenance and limit the window during which it runs to prevent the balancing process from impacting production traffic. -.. seealso:: The ":ref:`"Balancing Internals - `" and :ref:`Balancer Operations - ` for more information on balancing. +.. seealso:: :ref:`sharding-balancing-operations` and + :ref:`sharding-balancing-internals`. .. note::