diff --git a/source/administration/sharded-clusters.txt b/source/administration/sharded-clusters.txt
new file mode 100644
index 00000000000..69840e5aa15
--- /dev/null
+++ b/source/administration/sharded-clusters.txt
@@ -0,0 +1,282 @@
+.. index:: sharded clusters
+.. _sharding-sharded-cluster:
+
+===============================
+Components in a Sharded Cluster
+===============================
+
+.. default-domain:: mongodb
+
+Sharding occurs within a :term:`sharded cluster`. A sharded cluster
+consists of the following components:
+
+- :ref:`Shards `. Each shard is a separate
+  :program:`mongod` instance or :term:`replica set` that holds a portion
+  of your database's collections.
+
+- :ref:`Config servers `. Each config server is
+  a :program:`mongod` instance that holds metadata about the cluster.
+  The metadata maps :term:`chunks ` to shards.
+
+- :ref:`mongos instances `. The :program:`mongos`
+  instances route reads and writes to the shards.
+
+.. seealso::
+
+   - For specific configurations, see :ref:`sharding-architecture`.
+
+   - To set up sharded clusters, see :ref:`sharding-procedure-setup`.
+
+.. index:: sharding; shards
+.. index:: shards
+.. _sharding-shards:
+
+Shards
+------
+
+A shard is a container that holds a subset of a collection's data. Each
+shard is either a single :program:`mongod` instance or a :term:`replica
+set`. In production, all shards should be replica sets.
+
+Applications do not access the shards directly. Instead, the
+:ref:`mongos instances ` route reads and writes from
+applications to the shards.
+
+.. index:: sharding; config servers
+.. index:: config servers
+.. _sharding-config-server:
+
+Config Servers
+--------------
+
+Config servers maintain the shard metadata in a config database. The
+:term:`config database` stores the relationship between :term:`chunks
+` and where they reside within a :term:`sharded cluster`. Without
+a config database, the :program:`mongos` instances would be unable to
+route queries or write operations within the cluster.
+
+Config servers *do not* run as replica sets. Instead, a :term:`cluster
+` operates with a group of *three* config servers that use a
+two-phase commit process to ensure immediate consistency and
+reliability.
+
+For testing purposes, you may deploy a cluster with a single
+config server, but this is not recommended for production.
+
+.. warning::
+
+   If your cluster has a single config server, this
+   :program:`mongod` is a single point of failure. If the instance is
+   inaccessible, the cluster is not accessible. If you cannot recover
+   the data on a config server, the cluster will be inoperable.
+
+   **Always** use three config servers for production deployments.
+
+The actual load on configuration servers is small because each
+:program:`mongos` instance maintains a cached copy of the configuration
+database. MongoDB only writes data to the config servers to:
+
+- create splits in existing chunks, which happens as data in
+  existing chunks exceeds the maximum chunk size.
+
+- migrate a chunk between shards.
+
+Additionally, all config servers must be available during the initial
+setup of a sharded cluster, because each :program:`mongos` instance must
+be able to write to the ``config.version`` collection.
+
+If one or two configuration instances become unavailable, the
+cluster's metadata becomes *read only*. It is still possible to read
+data from and write data to the shards, but no chunk migrations or
+splits will occur until all three servers are accessible.
+At the same time, config server data is only read in the following
+situations:
+
+- A new :program:`mongos` starts for the first time, or an existing
+  :program:`mongos` restarts.
+
+- After a chunk migration, the :program:`mongos` instances update
+  themselves with the new cluster metadata.
+
+If all three config servers are inaccessible, you can continue to use
+the cluster as long as you don't restart the :program:`mongos`
+instances until after the config servers are accessible again. If you
+restart the :program:`mongos` instances and there are no accessible
+config servers, the :program:`mongos` instances will be unable to
+direct queries or write operations to the cluster.
+
+Because the configuration data is small relative to the amount of data
+stored in a cluster, the amount of config server activity is relatively
+low, and 100% uptime is not required for a functioning sharded cluster.
+As a result, backing up the config servers is not difficult. These
+backups are nonetheless critical: a cluster becomes totally inoperable
+if you lose all configuration instances and data, so take precautions
+to ensure that the config servers remain available and intact.
+
+.. note::
+
+   Configuration servers store metadata for a single sharded cluster.
+   You must have a separate configuration server or servers for each
+   cluster you administer.
+
+.. index:: mongos
+.. _sharding-mongos:
+.. _sharding-read-operations:
+
+Mongos Instances
+----------------
+
+The :program:`mongos` provides a single unified interface to a sharded
+cluster for applications using MongoDB. Except for the selection of a
+:term:`shard key`, application developers and administrators need not
+consider any of the :ref:`internal details of sharding `.
+
+:program:`mongos` caches data from the :ref:`config server
+` and uses this to route operations from
+applications and clients to the :program:`mongod` instances.
+:program:`mongos` instances have no *persistent* state and consume
+minimal system resources.
+
+The most common practice is to run :program:`mongos` instances on the
+same systems as your application servers, but you can maintain
+:program:`mongos` instances on the shards or on other dedicated
+resources.
+
+.. note::
+
+   .. versionchanged:: 2.1
+
+   Some aggregation operations using the :dbcommand:`aggregate`
+   command (i.e. :method:`db.collection.aggregate()`) will cause
+   :program:`mongos` instances to require more CPU resources than in
+   previous versions. This modified performance profile may dictate
+   alternate architecture decisions if you use the :term:`aggregation
+   framework` extensively in a sharded environment.
+
+.. _sharding-query-routing:
+
+Mongos Routing
+~~~~~~~~~~~~~~
+
+:program:`mongos` uses information from :ref:`config servers
+` to route operations to the cluster as
+efficiently as possible. In general, operations in a sharded
+environment are either:
+
+1. Targeted at a single shard or a limited group of shards based on
+   the shard key.
+
+2. Broadcast to all shards in the cluster that hold documents in a
+   collection.
+
+You should design your operations to be as targeted as possible.
+Operations have the following targeting characteristics:
+
+- Query operations broadcast to all shards [#namespace-exception]_
+  **unless** the :program:`mongos` can determine which shard or shards
+  store this data.
+
+  For queries that include the shard key, :program:`mongos` can target
+  the query at a specific shard or set of shards, if the portion
+  of the shard key included in the query is a *prefix* of the shard
+  key. For example, if the shard key is:
+
+  .. code-block:: javascript
+
+     { a: 1, b: 1, c: 1 }
+
+  The :program:`mongos` *can* route queries that include the full
+  shard key or either of the following shard key prefixes
+  [#a/c-as-a-case-of-a]_ to a specific shard or set of shards:
+
+  .. code-block:: javascript
+
+     { a: 1 }
+     { a: 1, b: 1 }
+
+  Depending on the distribution of data in the cluster and the
+  selectivity of the query, :program:`mongos` may still have to
+  contact multiple shards [#possible-all]_ to fulfill these queries.
+
+- All :method:`insert() ` operations target
+  one shard.
+
+- All single :method:`update() ` operations
+  target one shard. This includes :term:`upsert` operations.
+
+- The :program:`mongos` broadcasts multi-update operations to every
+  shard.
+
+- The :program:`mongos` broadcasts :method:`remove()
+  ` operations to every shard unless the
+  operation specifies the shard key in full.
+
+While some operations must broadcast to all shards, you can improve
+performance by using as many targeted operations as possible, which
+means ensuring that your operations include the shard key.
+
+.. [#namespace-exception] If a shard does not store chunks from a
+   given collection, queries for documents in that collection are not
+   broadcast to that shard.
+
+.. [#a/c-as-a-case-of-a] In this example, a :program:`mongos` could
+   route a query that included ``{ a: 1, c: 1 }`` fields to a specific
+   subset of shards using the ``{ a: 1 }`` prefix. A :program:`mongos`
+   cannot route any of the following queries to specific shards
+   in the cluster:
+
+   .. code-block:: javascript
+
+      { b: 1 }
+      { c: 1 }
+      { b: 1, c: 1 }
+
+.. [#possible-all] :program:`mongos` will route some queries, even
+   some that include the shard key, to all shards, if needed.
+
+Sharded Query Response Process
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To route a query to a :term:`cluster `,
+:program:`mongos` uses the following process:
+
+#. Determine the list of :term:`shards ` that must receive the query.
+
+   In some cases, when the :term:`shard key` or a prefix of the shard
+   key is a part of the query, the :program:`mongos` can route the
+   query to a subset of the shards. Otherwise, the :program:`mongos`
+   must direct the query to *all* shards that hold documents for that
+   collection.
+
+   .. example::
+
+      Given the following shard key:
+
+      .. code-block:: javascript
+
+         { zipcode: 1, u_id: 1, c_date: 1 }
+
+      Depending on the distribution of chunks in the cluster, the
+      :program:`mongos` may be able to target the query at a subset of
+      shards, if the query contains the following fields:
+
+      .. code-block:: javascript
+
+         { zipcode: 1 }
+         { zipcode: 1, u_id: 1 }
+         { zipcode: 1, u_id: 1, c_date: 1 }
+
+#. Establish a cursor on all targeted shards.
+
+   When the first batch of results returns from the cursors:
+
+   a. For a query with sorted results (i.e. using
+      :method:`cursor.sort()`), the :program:`mongos` performs a merge
+      sort of the results from all shards.
+
+   b. For a query with unsorted results, the :program:`mongos` returns
+      a result cursor that "round robins" results from all cursors on
+      the shards.
+
+      .. versionchanged:: 2.0.5
+         Before 2.0.5, the :program:`mongos` exhausted each cursor,
+         one by one.
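+
+As a concrete illustration, consider a hypothetical collection
+``records.people`` sharded on the ``{ zipcode: 1, u_id: 1, c_date: 1 }``
+key from the example above. The following queries, issued from the
+:program:`mongo` shell, sketch the difference between a query that the
+:program:`mongos` can target and one that it must broadcast; the
+collection name and field values here are placeholders.
+
+.. code-block:: javascript
+
+   // Includes the "zipcode" prefix of the shard key, so mongos can
+   // target the query at the shard or shards holding the matching
+   // chunk ranges.
+   db.people.find( { zipcode: "63109", u_id: 42 } )
+
+   // Omits the "zipcode" prefix, so mongos must broadcast the query
+   // to every shard that holds chunks for the collection.
+   db.people.find( { c_date: { $gt: ISODate("2012-01-01") } } )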
diff --git a/source/administration/sharding-architectures.txt b/source/administration/sharding-architectures.txt index a717bf7425e..9772ca117ca 100644 --- a/source/administration/sharding-architectures.txt +++ b/source/administration/sharding-architectures.txt @@ -9,63 +9,49 @@ Sharded Cluster Architectures .. default-domain:: mongodb This document describes the organization and design of :term:`sharded -cluster` deployments. For documentation of common administrative tasks -related to sharded clusters, see :doc:`/administration/sharding`. For -complete documentation of sharded clusters see the :doc:`/sharding` -section of this manual. +cluster` deployments. -.. seealso:: :ref:`sharding-requirements`. - -Deploying a Test Cluster ------------------------- - -.. warning:: Use this architecture for testing and development only. +Test Cluster Architecture +------------------------- You can deploy a very minimal cluster for testing and development. These *non-production* clusters have the following components: -- 1 :ref:`config server `. +- One :ref:`config server `. - At least one :program:`mongod` instance (either :term:`replica sets ` or as a standalone node.) -- 1 :program:`mongos` instance. +- One :program:`mongos` instance. + +.. warning:: Use the test cluster architecture for testing and development only. .. _sharding-production-architecture: -Deploying a Production Cluster ------------------------------- +Production Cluster Architecture +------------------------------- -When deploying a production cluster, you must ensure that the -data is redundant and that your systems are highly available. To that -end, a production-level cluster must have the following -components: +In a production cluster, you must ensure that data is redundant and that +your systems are highly available. To that end, a production-level +cluster must have the following components: -- 3 :ref:`config servers `, each residing on a +- Three :ref:`config servers `, each residing on a discrete system. - .. note:: + A single :term:`sharded cluster` must have exclusive use of its + :ref:`config servers `. If you have multiple + shards, you will need to have a group of config servers for each + cluster. - A single :term:`sharded cluster` must have exclusive use of its - :ref:`config servers `. If you have - multiple shards, you will need to have a group of config servers - for each cluster. +- Two or more :term:`replica sets ` to serve as + :term:`shards `. For information on replica sets, see + :doc:`/replication`. -- 2 or more :term:`replica sets `, for the :term:`shards - `. - - .. see:: For more information on replica sets see - :doc:`/administration/replication-architectures` and - :doc:`/replication`. - -- :program:`mongos` instances. Typically, you will deploy a single - :program:`mongos` instance on each application server. Alternatively, - you may deploy several `mongos` nodes and let your application connect - to these via a load balancer. - -.. seealso:: :ref:`sharding-procedure-add-shard` and - :ref:`sharding-procedure-remove-shard`. +- Two or more :program:`mongos` instances. Typically, you deploy a + single :program:`mongos` instance on each application server. + Alternatively, you may deploy several :program:`mongos` nodes and let + your application connect to these via a load balancer. 
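+
+For reference, you register each replica set with the cluster as a
+shard from a :program:`mongo` shell connected to a :program:`mongos`,
+using :method:`sh.addShard()`. The replica set names and hostnames in
+the following sketch are placeholders:
+
+.. code-block:: javascript
+
+   // Add each replica set to the cluster as a shard, specifying the
+   // replica set name and a seed member.
+   sh.addShard( "rs0/mongodb0.example.net:27017" )
+   sh.addShard( "rs1/mongodb3.example.net:27017" )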
 Sharded and Non-Sharded Data
 ----------------------------
 
@@ -77,12 +63,10 @@ deployments some databases and collections will use sharding, while
 other databases and collections will only reside on a single database
 instance or replica set (i.e. a :term:`shard`.)
 
-.. note::
-
-   Regardless of the data architecture of your :term:`sharded cluster`,
-   ensure that all queries and operations use the :term:`mongos`
-   router to access the data cluster. Use the :program:`mongos` even
-   for operations that do not impact the sharded data.
+Regardless of the data architecture of your :term:`sharded cluster`,
+ensure that all queries and operations use the :term:`mongos` router to
+access the data cluster. Use the :program:`mongos` even for operations
+that do not impact the sharded data.
 
 Every database has a "primary" [#overloaded-primary-term]_ shard that
 holds all un-sharded collections in that database. All collections
@@ -119,7 +103,7 @@ High Availability and MongoDB
 
 A :ref:`production ` :term:`cluster` has no
 single point of failure. This section introduces the
-availability concerns for MongoDB deployments, and highlights
+availability concerns for MongoDB deployments and highlights
 potential failure scenarios and available resolutions:
 
 - Application servers or :program:`mongos` instances become unavailable.
diff --git a/source/administration/sharding-config-server.txt b/source/administration/sharding-config-server.txt
new file mode 100644
index 00000000000..ea1509f2301
--- /dev/null
+++ b/source/administration/sharding-config-server.txt
@@ -0,0 +1,224 @@
+.. index:: config servers; operations
+.. _sharding-procedure-config-server:
+
+=========================
+Manage the Config Servers
+=========================
+
+.. default-domain:: mongodb
+
+:ref:`Config servers ` store all cluster metadata, most importantly
+the mapping from :term:`chunks ` to :term:`shards `.
+This section provides an overview of the basic
+procedures to migrate, replace, and maintain these servers.
+
+This page includes the following:
+
+- :ref:`sharding-config-server-deploy-three`
+
+- :ref:`sharding-process-config-server-migrate-same-hostname`
+
+- :ref:`sharding-process-config-server-migrate-different-hostname`
+
+- :ref:`sharding-config-server-replace`
+
+- :ref:`sharding-config-server-backup`
+
+.. _sharding-config-server-deploy-three:
+
+Deploy Three Config Servers for Production Deployments
+------------------------------------------------------
+
+For redundancy, all production :term:`sharded clusters `
+should deploy three config server processes on three different
+machines.
+
+Do not use a single config server for production deployments; use a
+single config server only for testing. If you are shifting to
+production, upgrade to three config servers immediately. The following
+process shows how to convert a test deployment with one config server
+into a production deployment with three config servers.
+
+#. Shut down all existing MongoDB processes. This includes:
+
+   - all :program:`mongod` instances or :term:`replica sets `
+     that provide your shards.
+
+   - the :program:`mongod` instance that provides your existing config
+     database.
+
+   - all :program:`mongos` instances in your cluster.
+
+#. Copy the entire :setting:`dbpath` file system tree from the
+   existing config server to the two machines that will provide the
+   additional config servers.
+   These commands, issued on the system with the existing
+   :ref:`config-database` (``mongo-config0.example.net``), may resemble
+   the following:
+
+   .. code-block:: sh
+
+      rsync -az /data/configdb mongo-config1.example.net:/data/configdb
+      rsync -az /data/configdb mongo-config2.example.net:/data/configdb
+
+#. Start all three config servers, using the same invocation that you
+   used for the single config server.
+
+   .. code-block:: sh
+
+      mongod --configsvr
+
+#. Restart all shard :program:`mongod` and :program:`mongos` processes.
+
+.. _sharding-process-config-server-migrate-same-hostname:
+
+Migrate Config Servers with the Same Hostname
+---------------------------------------------
+
+Use this process when you need to migrate a config server to a new
+system but the new system will be accessible using the same
+hostname.
+
+#. Shut down the config server that you're moving.
+
+   This will render all config data for your cluster :ref:`read only
+   `.
+
+#. Change the DNS entry that points to the system that provided the old
+   config server, so that the *same* hostname points to the new
+   system.
+
+   How you do this depends on how you organize your DNS and
+   hostname resolution services.
+
+#. Move the entire :setting:`dbpath` file system tree from the old
+   config server to the new config server. This command, issued on the
+   old config server system, may resemble the following:
+
+   .. code-block:: sh
+
+      rsync -az /data/configdb mongo-config0.example.net:/data/configdb
+
+#. Start the config instance on the new system. The default invocation
+   is:
+
+   .. code-block:: sh
+
+      mongod --configsvr
+
+Once all three config servers are again up and accessible, your cluster
+will become writable and it will be able to create new splits and
+migrate chunks as needed.
+
+.. _sharding-process-config-server-migrate-different-hostname:
+
+Migrate Config Servers with Different Hostnames
+-----------------------------------------------
+
+Use this process when you need to migrate a :ref:`config-database` to a new
+server and it *will not* be accessible via the same hostname. If
+possible, avoid changing the hostname so that you can use the
+:ref:`previous procedure `.
+
+#. Shut down the :ref:`config server ` you're moving.
+
+   This will render all config data for your cluster :ref:`read only
+   `.
+
+#. Move the entire :setting:`dbpath` file system tree from the old
+   config server to the new system. This command, issued on the old
+   config server system, may resemble the following:
+
+   .. code-block:: sh
+
+      rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
+
+#. Start the config instance on the new system. The default invocation
+   is:
+
+   .. code-block:: sh
+
+      mongod --configsvr
+
+#. Shut down all existing MongoDB processes. This includes:
+
+   - all :program:`mongod` instances or :term:`replica sets `
+     that provide your shards.
+
+   - the :program:`mongod` instances that provide your existing
+     :ref:`config databases `.
+
+   - all :program:`mongos` instances in your cluster.
+
+#. Restart all :program:`mongod` processes that provide the shard
+   servers.
+
+#. Update the :option:`--configdb ` parameter (or
+   :setting:`configdb`) for all :program:`mongos` instances and
+   restart all :program:`mongos` instances.
+
+.. _sharding-config-server-replace:
+
+Replace a Config Server
+-----------------------
+
+Use this procedure only if you need to replace one of your config
+servers after it becomes inoperable (e.g. hardware failure). This
+process assumes that the hostname of the instance will not change. If
+you must change the hostname of the instance, use the process for
+:ref:`migrating a config server to a different hostname
+`.
+
+#. Provision a new system with the same hostname as the previous
+   host.
+
+   Ensure that the new system has the same IP address and hostname as
+   the system it's replacing, *or* modify the DNS records and wait for
+   them to propagate.
+
+#. Shut down *one* (and only one) of the existing config servers. Copy
+   this host's entire :setting:`dbpath` file system tree from the current
+   system to the system that will provide the new config server. This
+   command, issued on the system with the data files, may resemble the
+   following:
+
+   .. code-block:: sh
+
+      rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
+
+#. Restart the config server process that you used in the previous
+   step to copy the data files to the new config server instance.
+
+#. Start the new config server instance. The default invocation is:
+
+   .. code-block:: sh
+
+      mongod --configsvr
+
+.. note::
+
+   In the course of this procedure *never* remove a config server from
+   the :setting:`configdb` parameter on any of the :program:`mongos`
+   instances. If you need to change the name of a config server, make
+   sure that all :program:`mongos` instances have three config servers
+   specified in the :setting:`configdb` setting at all times.
+
+.. _sharding-config-server-backup:
+
+Backup Cluster Metadata
+-----------------------
+
+Because the cluster remains operational [#read-only]_ without one
+of the config database's :program:`mongod` instances, creating a backup
+of the cluster metadata from the config database is straightforward:
+
+#. Shut down one of the :term:`config databases `.
+
+#. Create a full copy of the data files (i.e. the path specified by
+   the :setting:`dbpath` option for the config instance).
+
+#. Restart the original configuration server.
+
+.. seealso:: :doc:`backups`.
+
+.. [#read-only] While one of the three config servers is unavailable,
+   the cluster cannot split any chunks, nor can it migrate chunks
+   between shards. Your application will be able to write data to the
+   cluster. The :ref:`sharding-config-server` section of the
+   documentation provides more information on this topic.
diff --git a/source/administration/sharding-troubleshooting.txt b/source/administration/sharding-troubleshooting.txt
new file mode 100644
index 00000000000..99505e99a2e
--- /dev/null
+++ b/source/administration/sharding-troubleshooting.txt
@@ -0,0 +1,152 @@
+.. index:: troubleshooting; sharding
+.. index:: sharding; troubleshooting
+.. _sharding-troubleshooting:
+
+================================
+Troubleshooting Sharded Clusters
+================================
+
+.. default-domain:: mongodb
+
+The two most important factors in maintaining a successful sharded cluster are:
+
+- :ref:`choosing an appropriate shard key ` and
+
+- :ref:`maintaining sufficient capacity to support current and future
+  operations `.
+
+You can prevent most issues encountered with sharding by ensuring that
+you choose the best possible :term:`shard key` for your deployment and
+that you add capacity to your cluster well before the current resources
+become saturated. Continue reading for specific issues you may
+encounter in a production environment.
+
+.. _sharding-troubleshooting-not-splitting:
+
+All Data Remains on One Shard
+-----------------------------
+
+Your cluster must have sufficient data for sharding to make
+sense. Sharding works by migrating chunks between the shards until
+each shard has roughly the same number of chunks.
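+
+To see how many chunks exist and how they are distributed, you can run
+the :method:`sh.status()` helper from a :program:`mongo` shell connected
+to a :program:`mongos`; its output lists the chunks on each shard for
+every sharded collection. This is only a quick check, and the exact
+output format varies by version.
+
+.. code-block:: javascript
+
+   // Print the sharding status, including per-shard chunk counts for
+   // each sharded collection.
+   sh.status()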
+ +The default chunk size is 64 megabytes. MongoDB will not begin +migrations until the imbalance of chunks in the cluster exceeds the +:ref:`migration threshold `. While the +default chunk size is configurable with the :setting:`chunkSize` +setting, these behaviors help prevent unnecessary chunk migrations, +which can degrade the performance of your cluster as a whole. + +If you have just deployed a sharded cluster, make sure that you have +enough data to make sharding effective. If you do not have sufficient +data to create more than eight 64 megabyte chunks, then all data will +remain on one shard. Either lower the :ref:`chunk size +` setting, or add more data to the cluster. + +As a related problem, the system will split chunks only on +inserts or updates, which means that if you configure sharding and do not +continue to issue insert and update operations, the database will not +create any chunks. You can either wait until your application inserts +data *or* :ref:`split chunks manually `. + +Finally, if your shard key has a low :ref:`cardinality +`, MongoDB may not be able to create +sufficient splits among the data. + +One Shard Receives Too Much Traffic +----------------------------------- + +In some situations, a single shard or a subset of the cluster will +receive a disproportionate portion of the traffic and workload. In +almost all cases this is the result of a shard key that does not +effectively allow :ref:`write scaling `. + +It's also possible that you have "hot chunks." In this case, you may +be able to solve the problem by splitting and then migrating parts of +these chunks. + +In the worst case, you may have to consider re-sharding your data +and :ref:`choosing a different shard key ` +to correct this pattern. + +The Cluster Does Not Balance +---------------------------- + +If you have just deployed your sharded cluster, you may want to +consider the :ref:`troubleshooting suggestions for a new cluster where +data remains on a single shard `. + +If the cluster was initially balanced, but later developed an uneven +distribution of data, consider the following possible causes: + +- You have deleted or removed a significant amount of data from the + cluster. If you have added additional data, it may have a + different distribution with regards to its shard key. + +- Your :term:`shard key` has low :ref:`cardinality ` + and MongoDB cannot split the chunks any further. + +- Your data set is growing faster than the balancer can distribute + data around the cluster. This is uncommon and + typically is the result of: + + - a :ref:`balancing window ` that + is too short, given the rate of data growth. + + - an uneven distribution of :ref:`write operations + ` that requires more data + migration. You may have to choose a different shard key to resolve + this issue. + + - poor network connectivity between shards, which may lead to chunk + migrations that take too long to complete. Investigate your + network configuration and interconnections between shards. + +Migrations Render Cluster Unusable +---------------------------------- + +If migrations impact your cluster or application's performance, +consider the following options, depending on the nature of the impact: + +#. If migrations only interrupt your clusters sporadically, you can + limit the :ref:`balancing window + ` to prevent balancing activity + during peak hours. Ensure that there is enough time remaining to + keep the data from becoming out of balance again. + +#. 
If the balancer is always migrating chunks to the detriment of + overall cluster performance: + + - You may want to attempt :ref:`decreasing the chunk size ` + to limit the size of the migration. + + - Your cluster may be over capacity, and you may want to attempt to + :ref:`add one or two shards ` to + the cluster to distribute load. + +It's also possible that your shard key causes your +application to direct all writes to a single shard. This kind of +activity pattern can require the balancer to migrate most data soon after writing +it. Consider redeploying your cluster with a shard key that provides +better :ref:`write scaling `. + +Disable Balancing During Backups +-------------------------------- + +If MongoDB migrates a :term:`chunk` during a :doc:`backup +`, you can end with an inconsistent snapshot +of your :term:`sharded cluster`. Never run a backup while the balancer is +active. To ensure that the balancer is inactive during your backup +operation: + +- Set the :ref:`balancing window ` + so that the balancer is inactive during the backup. Ensure that the + backup can complete while you have the balancer disabled. + +- :ref:`manually disable the balancer ` + for the duration of the backup procedure. + +Confirm that the balancer is not active using the +:method:`sh.getBalancerState()` method before starting a backup +operation. When the backup procedure is complete you can reactivate +the balancer process. diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index d218684fac8..39190dcdf1a 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -1,1398 +1,5 @@ -.. index:: administration; sharding -.. _sharding-administration: +======================= +Sharding Administration +======================= -============================== -Sharded Cluster Administration -============================== - -.. default-domain:: mongodb - -This document describes common administrative tasks for sharded -clusters. For complete documentation of sharded clusters see the -:doc:`/sharding` section of this manual. - -.. contents:: Sharding Procedures: - :backlinks: none - :local: - -.. _sharding-procedure-setup: - -Set up a Sharded Cluster ------------------------- - -Before deploying a cluster, see :ref:`sharding-requirements`. - -For testing purposes, you can run all the required shard :program:`mongod` processes on a -single server. For production, use the configurations described in -:doc:`/administration/replication-architectures`. - -.. include:: /includes/warning-sharding-hostnames.rst - -If you have an existing replica set, you can use the -:doc:`/tutorial/convert-replica-set-to-replicated-shard-cluster` -tutorial as a guide. If you're deploying a cluster from scratch, see -the :doc:`/tutorial/deploy-shard-cluster` tutorial for more detail or -use the following procedure as a quick starting point: - -1. Create data directories for each of the three (3) config server instances. - -#. Start the three config server instances. For example, to start a - config server instance running on TCP port ``27019`` with the data - stored in ``/data/configdb``, type the following: - - .. code-block:: sh - - mongod --configsvr --dbpath /data/configdb --port 27019 - - For additional command options, see :doc:`/reference/mongod` - and :doc:`/reference/configuration-options`. - - .. include:: /includes/note-config-server-startup.rst - -#. Start a :program:`mongos` instance. 
For example, to start a - :program:`mongos` that connects to config server instance running on the following hosts: - - - ``mongoc0.example.net`` - - ``mongoc1.example.net`` - - ``mongoc2.example.net`` - - You would issue the following command: - - .. code-block:: sh - - mongos --configdb mongoc0.example.net:27019,mongoc1.example.net:27019,mongoc2.example.net:27019 - -#. Connect to one of the :program:`mongos` instances. For example, if - a :program:`mongos` is accessible at ``mongos0.example.net`` on - port ``27017``, issue the following command: - - .. code-block:: sh - - mongo mongos0.example.net - -#. Add shards to the cluster. - - .. note:: In production deployments, all shards should be replica sets. - - To deploy a replica set, see the - :doc:`/tutorial/deploy-replica-set` tutorial. - - From the :program:`mongo` shell connected - to the :program:`mongos` instance, call the :method:`sh.addShard()` - method for each shard to add to the cluster. - - For example: - - .. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - - If ``mongodb0.example.net:27027`` is a member of a replica - set, call the :method:`sh.addShard()` method with an argument that - resembles the following: - - .. code-block:: javascript - - sh.addShard( "/mongodb0.example.net:27027" ) - - Replace, ```` with the name of the replica set, and - MongoDB will discover all other members of the replica set. - Repeat this step for each new shard in your cluster. - - .. optional:: - - You can specify a name for the shard and a maximum size. See - :dbcommand:`addShard`. - - .. note:: - - .. versionchanged:: 2.0.3 - - Before version 2.0.3, you must specify the shard in the following - form: - - .. code-block:: sh - - replicaSetName/,, - - For example, if the name of the replica set is ``repl0``, then - your :method:`sh.addShard()` command would be: - - .. code-block:: javascript - - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) - -#. Enable sharding for each database you want to shard. - While sharding operates on a per-collection basis, you must enable - sharding for each database that holds collections you want to shard. - This step is a meta-data change and will not redistribute your data. - - MongoDB enables sharding on a per-database basis. This is only a - meta-data change and will not redistribute your data. To enable - sharding for a given database, use the :dbcommand:`enableSharding` - command or the :method:`sh.enableSharding()` shell helper. - - .. code-block:: javascript - - db.runCommand( { enableSharding: } ) - - Or: - - .. code-block:: javascript - - sh.enableSharding() - - .. note:: - - MongoDB creates databases automatically upon their first use. - - Once you enable sharding for a database, MongoDB assigns a - :term:`primary shard` for that database, where MongoDB stores all data - before sharding begins. - -.. _sharding-administration-shard-collection: - -#. Enable sharding on a per-collection basis. - - Finally, you must explicitly specify collections to shard. The - collections must belong to a database for which you have enabled - sharding. When you shard a collection, you also choose the shard - key. To shard a collection, run the :dbcommand:`shardCollection` - command or the :method:`sh.shardCollection()` shell helper. - - .. code-block:: javascript - - db.runCommand( { shardCollection: ".", key: { : 1 } } ) - - Or: - - .. code-block:: javascript - - sh.shardCollection(".", ) - - For example: - - .. 
code-block:: javascript - - db.runCommand( { shardCollection: "myapp.users", key: { username: 1 } } ) - - Or: - - .. code-block:: javascript - - sh.shardCollection("myapp.users", { username: 1 }) - - The choice of shard key is incredibly important: it affects - everything about the cluster from the efficiency of your queries to - the distribution of data. Furthermore, you cannot change a - collection's shard key after setting it. - - See the :ref:`Shard Key Overview ` and the - more in depth documentation of :ref:`Shard Key Qualities - ` to help you select better shard - keys. - - If you do not specify a shard key, MongoDB will shard the - collection using the ``_id`` field. - -Cluster Management ------------------- - -This section outlines procedures for adding and remove shards, as well -as general monitoring and maintenance of a :term:`sharded cluster`. - -.. _sharding-procedure-add-shard: - -Add a Shard to a Cluster -~~~~~~~~~~~~~~~~~~~~~~~~ - -To add a shard to an *existing* sharded cluster, use the following -procedure: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. First, you need to tell the cluster where to find the individual - shards. You can do this using the :dbcommand:`addShard` command or - the :method:`sh.addShard()` helper: - - .. code-block:: javascript - - sh.addShard( ":" ) - - Replace ```` and ```` with the hostname and TCP - port number of where the shard is accessible. - Alternately specify a :term:`replica set` name and at least one - hostname which is a member of the replica set. - - For example: - - .. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - - .. note:: In production deployments, all shards should be replica sets. - - Repeat for each shard in your cluster. - - .. optional:: - - You may specify a "name" as an argument to the - :dbcommand:`addShard` command, as follows: - - .. code-block:: javascript - - db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) - - You cannot specify a name for a shard using the - :method:`sh.addShard()` helper in the :program:`mongo` shell. If - you use the helper or do not specify a shard name, then MongoDB - will assign a name upon creation. - - .. versionchanged:: 2.0.3 - Before version 2.0.3, you must specify the shard in the - following form: the replica set name, followed by a forward - slash, followed by a comma-separated list of seeds for the - replica set. For example, if the name of the replica set is - "myapp1", then your :method:`sh.addShard()` command might resemble: - - .. code-block:: javascript - - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) - -.. note:: - - It may take some time for :term:`chunks ` to migrate to the - new shard. - - For an introduction to balancing, see :ref:`sharding-balancing`. For - lower level information on balancing, see :ref:`sharding-balancing-internals`. - -.. _sharding-procedure-remove-shard: - -Remove a Shard from a Cluster -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To remove a :term:`shard` from a :term:`sharded cluster`, you must: - -- Migrate :term:`chunks ` to another shard or database. - -- Ensure that this shard is not the :term:`primary shard` for any databases in - the cluster. If it is, move the "primary" status for these databases - to other shards. - -- Finally, remove the shard from the cluster's configuration. - -.. note:: - - To successfully migrate data from a shard, the :term:`balancer` - process **must** be active. 
- -The procedure to remove a shard is as follows: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Determine the name of the shard you will be removing. - - You must specify the name of the shard. You may have specified this - shard name when you first ran the :dbcommand:`addShard` command. If not, - you can find out the name of the shard by running the - :dbcommand:`listShards` or :dbcommand:`printShardingStatus` - commands or the :method:`sh.status()` shell helper. - - The following examples will remove a shard named ``mongodb0`` from the cluster. - -#. Begin removing chunks from the shard. - - Start by running the :dbcommand:`removeShard` command. This will - start "draining" or migrating chunks from the shard you're removing - to another shard in the cluster. - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - This operation will return the following response immediately: - - .. code-block:: javascript - - { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } - - Depending on your network capacity and the amount of data in the - shard, this operation can take anywhere from a few minutes to several - days to complete. - -#. View progress of the migration. - - You can run the :dbcommand:`removeShard` command again at any stage of the - process to view the progress of the migration, as follows: - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - The output should look something like this: - - .. code-block:: javascript - - { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } - - In the ``remaining`` sub-document ``{ chunks: xx, dbs: y }``, a - counter displays the remaining number of chunks that MongoDB must - migrate to other shards and the number of MongoDB databases that have - "primary" status on this shard. - - Continue checking the status of the :dbcommand:`removeShard` command - until the remaining number of chunks to transfer is 0. - -#. Move any databases to other shards in the cluster as needed. - - This is only necessary when removing a shard that is also the - :term:`primary shard` for one or more databases. - - Issue the following command at the :program:`mongo` shell: - - .. code-block:: javascript - - db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) - - This command will migrate all remaining non-sharded data in the - database named ``myapp`` to the shard named ``mongodb1``. - - .. warning:: - - Do not run the :dbcommand:`movePrimary` command until you have *finished* - draining the shard. - - The command will not return until MongoDB completes moving all - data. The response from this command will resemble the following: - - .. code-block:: javascript - - { "primary" : "mongodb1", "ok" : 1 } - -#. Run :dbcommand:`removeShard` again to clean up all metadata - information and finalize the shard removal, as follows: - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - When successful, this command will return a document like this: - - .. code-block:: javascript - - { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } - -Once the value of the ``stage`` field is "completed," you may safely -stop the processes comprising the ``mongodb0`` shard. 
- -List Databases with Sharding Enabled -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To list the databases that have sharding enabled, query the -``databases`` collection in the :ref:`config-database`. -A database has sharding enabled if the value of the ``partitioned`` -field is ``true``. Connect to a :program:`mongos` instance with a -:program:`mongo` shell, and run the following operation to get a full -list of databases with sharding enabled: - -.. code-block:: javascript - - use config - db.databases.find( { "partitioned": true } ) - -.. example:: You can use the following sequence of commands when to - return a list of all databases in the cluster: - - .. code-block:: javascript - - use config - db.databases.find() - - If this returns the following result set: - - .. code-block:: javascript - - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" } - { "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" } - - Then sharding is only enabled for the ``animals`` database. - -List Shards -~~~~~~~~~~~ - -To list the current set of configured shards, use the :dbcommand:`listShards` -command, as follows: - -.. code-block:: javascript - - use admin - db.runCommand( { listShards : 1 } ) - -View Cluster Details -~~~~~~~~~~~~~~~~~~~~ - -To view cluster details, issue :method:`db.printShardingStatus()` or -:method:`sh.status()`. Both methods return the same output. - -.. example:: In the following example output from :method:`sh.status()` - - - ``sharding version`` displays the version number of the shard - metadata. - - - ``shards`` displays a list of the :program:`mongod` instances - used as shards in the cluster. - - - ``databases`` displays all databases in the cluster, - including database that do not have sharding enabled. - - - The ``chunks`` information for the ``foo`` database displays how - many chunks are on each shard and displays the range of each chunk. - - .. 
code-block:: javascript - - --- Sharding Status --- - sharding version: { "_id" : 1, "version" : 3 } - shards: - { "_id" : "shard0000", "host" : "m0.example.net:30001" } - { "_id" : "shard0001", "host" : "m3.example2.net:50000" } - databases: - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "animals", "partitioned" : true, "primary" : "shard0000" } - foo.big chunks: - shard0001 1 - shard0000 6 - { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo - { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo - { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo - { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) - { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - foo.large chunks: - shard0001 1 - shard0000 5 - { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) - { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo - { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) - { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } - -Chunk Management ----------------- - -This section describes various operations on :term:`chunks ` in -:term:`sharded clusters `. MongoDB automates most -chunk management operations. However, these chunk management -operations are accessible to administrators for use in some -situations, typically surrounding initial setup, deployment, and data -ingestion. - -.. _sharding-procedure-create-split: - -Split Chunks -~~~~~~~~~~~~ - -Normally, MongoDB splits a :term:`chunk` following inserts when a -chunk exceeds the :ref:`chunk size `. The -:term:`balancer` may migrate recently split chunks to a new shard -immediately if :program:`mongos` predicts future insertions will -benefit from the move. - -MongoDB treats all chunks the same, whether split manually or -automatically by the system. - -.. warning:: - - You cannot merge or combine chunks once you have split them. - -You may want to split chunks manually if: - -- you have a large amount of data in your cluster and very few - :term:`chunks `, - as is the case after deploying a cluster using existing data. - -- you expect to add a large amount of data that would - initially reside in a single chunk or shard. - -.. example:: - - You plan to insert a large amount of data with :term:`shard key` - values between ``300`` and ``400``, *but* all values of your shard - keys are between ``250`` and ``500`` are in a single chunk. - -Use :method:`sh.status()` to determine the current chunks ranges across -the cluster. - -To split chunks manually, use the :dbcommand:`split` command with -operators: ``middle`` and ``find``. The equivalent shell helpers are -:method:`sh.splitAt()` or :method:`sh.splitFind()`. - -.. example:: - - The following command will split the chunk that contains - the value of ``63109`` for the ``zipcode`` field in the ``people`` - collection of the ``records`` database: - - .. 
code-block:: javascript - - sh.splitFind( "records.people", { "zipcode": 63109 } ) - -:method:`sh.splitFind()` will split the chunk that contains the -*first* document returned that matches this query into two equally -sized chunks. You must specify the full namespace -(i.e. "``.``") of the sharded collection to -:method:`sh.splitFind()`. The query in :method:`sh.splitFind()` need -not contain the shard key, though it almost always makes sense to -query for the shard key in this case, and including the shard key will -expedite the operation. - -Use :method:`sh.splitAt()` to split a chunk in two using the queried -document as the partition point: - -.. code-block:: javascript - - sh.splitAt( "records.people", { "zipcode": 63109 } ) - -However, the location of the document that this query finds with -respect to the other documents in the chunk does not affect how the -chunk splits. - -.. _sharding-administration-pre-splitting: -.. _sharding-administration-create-chunks: - -Create Chunks (Pre-Splitting) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In most situations a :term:`sharded cluster` will create and distribute -chunks automatically without user intervention. However, in a limited -number of use profiles, MongoDB cannot create enough chunks or -distribute data fast enough to support required throughput. Consider -the following scenarios: - -- you must partition an existing data collection that resides on a - single shard. - -- you must ingest a large volume of data into a cluster that - isn't balanced, or where the ingestion of data will lead to an - imbalance of data. - - This can arise in an initial data loading, or in a case where you - must insert a large volume of data into a single chunk, as is the - case when you must insert at the beginning or end of the chunk - range, as is the case for monotonically increasing or decreasing - shard keys. - -Preemptively splitting chunks increases cluster throughput for these -operations, by reducing the overhead of migrating chunks that hold -data during the write operation. MongoDB only creates splits after an -insert operation, and can only migrate a single chunk at a time. Chunk -migrations are resource intensive and further complicated by large -write volume to the migrating chunk. - -.. warning:: - - You can only pre-split an empty collection. When you enable - sharding for a collection that contains data MongoDB automatically - creates splits. Subsequent attempts to create splits manually, can - lead to unpredictable chunk ranges and sizes as well as inefficient - or ineffective balancing behavior. - -To create and migrate chunks manually, use the following procedure: - -#. Split empty chunks in your collection by manually performing - :dbcommand:`split` command on chunks. - - .. example:: - - To create chunks for documents in the ``myapp.users`` - collection, using the ``email`` field as the :term:`shard key`, - use the following operation in the :program:`mongo` shell: - - .. code-block:: javascript - - for ( var x=97; x<97+26; x++ ){ - for( var y=97; y<97+26; y+=6 ) { - var prefix = String.fromCharCode(x) + String.fromCharCode(y); - db.runCommand( { split : "myapp.users" , middle : { email : prefix } } ); - } - } - - This assumes a collection size of 100 million documents. - -#. Migrate chunks manually using the :dbcommand:`moveChunk` command: - - .. example:: - - To migrate all of the manually created user profiles evenly, - putting each prefix chunk on the next shard from the other, run - the following commands in the mongo shell: - - .. 
code-block:: javascript - - var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ]; - for ( var x=97; x<97+26; x++ ){ - for( var y=97; y<97+26; y+=6 ) { - var prefix = String.fromCharCode(x) + String.fromCharCode(y); - db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97)/6]}) - } - } - - You can also let the balancer automatically distribute the new - chunks. For an introduction to balancing, see - :ref:`sharding-balancing`. For lower level information on balancing, - see :ref:`sharding-balancing-internals`. - -.. _sharding-balancing-modify-chunk-size: - -Modify Chunk Size -~~~~~~~~~~~~~~~~~ - -When you initialize a sharded cluster, the default chunk size is 64 -megabytes. This default chunk size works well for most deployments. However, if you -notice that automatic migrations are incurring a level of I/O that -your hardware cannot handle, you may want to reduce the chunk -size. For the automatic splits and migrations, a small chunk size -leads to more rapid and frequent migrations. - -To modify the chunk size, use the following procedure: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Issue the following :method:`save() ` - operation: - - .. code-block:: javascript - - db.settings.save( { _id:"chunksize", value: } ) - - Where the value of ```` reflects the new chunk size in - megabytes. Here, you're essentially writing a document whose values - store the global chunk size configuration value. - -.. note:: - - The :setting:`chunkSize` and :option:`--chunkSize ` - options, passed at runtime to the :program:`mongos` **do not** - affect the chunk size after you have initialized the cluster. - - To eliminate confusion you should *always* set chunk size using the - above procedure and never use the runtime options. - -Modifying the chunk size has several limitations: - -- Automatic splitting only occurs when inserting :term:`documents - ` or updating existing documents. - -- If you lower the chunk size it may take time for all chunks to split to - the new size. - -- Splits cannot be "undone." - -If you increase the chunk size, existing chunks must grow through -insertion or updates until they reach the new size. - -.. _sharding-balancing-manual-migration: - -Migrate Chunks -~~~~~~~~~~~~~~ - -In most circumstances, you should let the automatic balancer -migrate :term:`chunks ` between :term:`shards `. -However, you may want to migrate chunks manually in a few cases: - -- If you create chunks by :term:`pre-splitting` the data in your - collection, you will have to migrate chunks manually to distribute - chunks evenly across the shards. Use pre-splitting in limited - situations, to support bulk data ingestion. - -- If the balancer in an active cluster cannot distribute chunks within - the balancing window, then you will have to migrate chunks manually. - -For more information on how chunks move between shards, see -:ref:`sharding-balancing-internals`, in particular the section -:ref:`sharding-chunk-migration`. - -To migrate chunks, use the :dbcommand:`moveChunk` command. - -.. note:: - - To return a list of shards, use the :dbcommand:`listShards` - command. - - Specify shard names using the :dbcommand:`addShard` command - using the ``name`` argument. 
If you do not specify a name in the - :dbcommand:`addShard` command, MongoDB will assign a name - automatically. - -The following example assumes that the field ``username`` is the -:term:`shard key` for a collection named ``users`` in the ``myapp`` -database, and that the value ``smith`` exists within the :term:`chunk` -you want to migrate. - -To move this chunk, you would issue the following command from a :program:`mongo` -shell connected to any :program:`mongos` instance. - -.. code-block:: javascript - - db.adminCommand({moveChunk : "myapp.users", find : {username : "smith"}, to : "mongodb-shard3.example.net"}) - -This command moves the chunk that includes the shard key value "smith" to the -:term:`shard` named ``mongodb-shard3.example.net``. The command will -block until the migration is complete. - -See :ref:`sharding-administration-create-chunks` for an introduction -to pre-splitting. - -.. versionadded:: 2.2 - :dbcommand:`moveChunk` command has the: ``_secondaryThrottle`` - parameter. When set to ``true``, MongoDB ensures that - :term:`secondary` members have replicated operations before allowing - new chunk migrations. - -.. warning:: - - The :dbcommand:`moveChunk` command may produce the following error - message: - - .. code-block:: none - - The collection's metadata lock is already taken. - - These errors occur when clients have too many open :term:`cursors - ` that access the chunk you are migrating. You can either - wait until the cursors complete their operation or close the - cursors manually. - - .. todo:: insert link to killing a cursor. - -.. index:: bulk insert -.. _sharding-bulk-inserts: - -Strategies for Bulk Inserts in Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. todo:: Consider moving to the administrative guide as it's of an - applied nature, or create an applications document for sharding - -.. todo:: link the words "bulk insert" to the bulk insert topic when - it's published - - -Large bulk insert operations including initial data ingestion or -routine data import, can have a significant impact on a :term:`sharded -cluster`. Consider the following strategies and possibilities for -bulk insert operations: - -- If the collection does not have data, then there is only one - :term:`chunk`, which must reside on a single shard. MongoDB must - receive data, create splits, and distribute chunks to the available - shards. To avoid this performance cost, you can pre-split the - collection, as described in :ref:`sharding-administration-pre-splitting`. - -- You can parallelize import processes by sending insert operations to - more than one :program:`mongos` instance. If the collection is - empty, pre-split first, as described in - :ref:`sharding-administration-pre-splitting`. - -- If your shard key increases monotonically during an insert then all - the inserts will go to the last chunk in the collection, which will - always end up on a single shard. Therefore, the insert capacity of the - cluster will never exceed the insert capacity of a single shard. - - If your insert volume is never larger than what a single shard can - process, then there is no problem; however, if the insert volume - exceeds that range, and you cannot avoid a monotonically - increasing shard key, then consider the following modifications to - your application: - - - Reverse all the bits of the shard key to preserve the information - while avoiding the correlation of insertion order and increasing - sequence of values. 
- - - Swap the first and last 16-bit words to "shuffle" the inserts. - - .. example:: The following example, in C++, swaps the leading and - trailing 16-bit word of :term:`BSON` :term:`ObjectIds ` - generated so that they are no longer monotonically increasing. - - .. code-block:: cpp - - using namespace mongo; - OID make_an_id() { - OID x = OID::gen(); - const unsigned char *p = x.getData(); - swap( (unsigned short&) p[0], (unsigned short&) p[10] ); - return x; - } - - void foo() { - // create an object - BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); - // now we might insert o into a sharded collection... - } - - For information on choosing a shard key, see :ref:`sharding-shard-key` - and see :ref:`Shard Key Internals ` (in - particular, :ref:`sharding-internals-operations-and-reliability` and - :ref:`sharding-internals-choose-shard-key`). - -.. index:: balancing; operations -.. _sharding-balancing-operations: - -Balancer Operations -------------------- - -This section describes provides common administrative procedures related -to balancing. For an introduction to balancing, see -:ref:`sharding-balancing`. For lower level information on balancing, see -:ref:`sharding-balancing-internals`. - -.. _sharding-balancing-check-lock: - -Check the Balancer Lock -~~~~~~~~~~~~~~~~~~~~~~~ - -To see if the balancer process is active in your :term:`cluster -`, do the following: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Use the following query to return the balancer lock: - - .. code-block:: javascript - - db.locks.find( { _id : "balancer" } ).pretty() - -When this command returns, you will see output like the following: - -.. code-block:: javascript - - { "_id" : "balancer", - "process" : "mongos0.example.net:1292810611:1804289383", - "state" : 2, - "ts" : ObjectId("4d0f872630c42d1978be8a2e"), - "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)", - "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886", - "why" : "doing balance round" } - - -This output confirms that: - -- The balancer originates from the :program:`mongos` running on the - system with the hostname ``mongos0.example.net``. - -- The value in the ``state`` field indicates that a :program:`mongos` - has the lock. For version 2.0 and later, the value of an active lock - is ``2``; for earlier versions the value is ``1``. - -.. optional:: - - You can also use the following shell helper, which returns a - boolean to report if the balancer is active: - - .. code-block:: javascript - - sh.getBalancerState() - -.. _sharding-schedule-balancing-window: - -Schedule the Balancing Window -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In some situations, particularly when your data set grows slowly and a -migration can impact performance, it's useful to be able to ensure -that the balancer is active only at certain times. Use the following -procedure to specify a window during which the :term:`balancer` will -be able to migrate chunks: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Use an operation modeled on the following example :method:`update() - ` operation to modify the balancer's - window: - - .. 
code-block:: javascript - - db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "", stop : "" } } }, true ) - - Replace ```` and ```` with time values using - two digit hour and minute values (e.g ``HH:MM``) that describe the - beginning and end boundaries of the balancing window. - These times will be evaluated relative to the time zone of each individual - :program:`mongos` instance in the sharded cluster. - For instance, running the following - will force the balancer to run between 11PM and 6AM local time only: - - .. code-block:: javascript - - db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "6:00" } } }, true ) - -.. note:: - - The balancer window must be sufficient to *complete* the migration - of all data inserted during the day. - - As data insert rates can change based on activity and usage - patterns, it is important to ensure that the balancing window you - select will be sufficient to support the needs of your deployment. - -Remove a Balancing Window Schedule -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If you have :ref:`set the balancing window -` and wish to remove the schedule -so that the balancer is always running, issue the following sequence -of operations: - -.. code-block:: javascript - - use config - db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true }) - -.. _sharding-balancing-disable-temporally: - -Disable the Balancer -~~~~~~~~~~~~~~~~~~~~ - -By default the balancer may run at any time and only moves chunks as -needed. To disable the balancer for a short period of time and prevent -all migration, use the following procedure: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue *one* of the following operations to disable the balancer: - - .. code-block:: javascript - - sh.stopBalancer() - -#. Later, issue *one* the following operations to enable the balancer: - - .. code-block:: javascript - - sh.startBalancer() - -.. note:: - - If a migration is in progress, the system will complete - the in-progress migration. After disabling, you can use the - following operation in the :program:`mongo` shell to determine if - there are no migrations in progress: - - .. code-block:: javascript - - use config - while( db.locks.findOne({_id: "balancer"}).state ) { - print("waiting..."); sleep(1000); - } - - -The above process and the :method:`sh.setBalancerState()`, -:method:`sh.startBalancer()`, and :method:`sh.stopBalancer()` helpers provide -wrappers on the following process, which may be useful if you need to -run this operation from a driver that does not have helper functions: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Issue the following update to disable the balancer: - - .. code-block:: javascript - - db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true ); - -#. To enable the balancer again, alter the value of "stopped" as follows: - - .. code-block:: javascript - - db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true ); - -.. index:: config servers; operations -.. _sharding-procedure-config-server: - -Config Server Maintenance -------------------------- - -Config servers store all cluster metadata, most importantly, -the mapping from :term:`chunks ` to :term:`shards `. 
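You can read this mapping yourself through any :program:`mongos`. The
following is a minimal sketch; ``records.people`` is a hypothetical
sharded collection, and the exact fields stored may vary between
versions:

.. code-block:: javascript

   use config
   db.chunks.find( { ns: "records.people" }, { min: 1, max: 1, shard: 1 } ).limit( 5 ).pretty()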
-This section provides an overview of the basic -procedures to migrate, replace, and maintain these servers. - -.. seealso:: :ref:`sharding-config-server` - -Deploy Three Config Servers for Production Deployments -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -For redundancy, all production :term:`sharded clusters ` -should deploy three config servers processes on three different -machines. - -Do not use only a single config server for production deployments. -Only use a single config server deployments for testing. You should -upgrade to three config servers immediately if you are shifting to -production. The following process shows how to convert a test -deployment with only one config server to production deployment with -three config servers. - -#. Shut down all existing MongoDB processes. This includes: - - - all :program:`mongod` instances or :term:`replica sets ` - that provide your shards. - - - the :program:`mongod` instance that provides your existing config - database. - - - all :program:`mongos` instances in your cluster. - -#. Copy the entire :setting:`dbpath` file system tree from the - existing config server to the two machines that will provide the - additional config servers. These commands, issued on the system - with the existing :ref:`config-database`, ``mongo-config0.example.net`` may - resemble the following: - - .. code-block:: sh - - rsync -az /data/configdb mongo-config1.example.net:/data/configdb - rsync -az /data/configdb mongo-config2.example.net:/data/configdb - -#. Start all three config servers, using the same invocation that you - used for the single config server. - - .. code-block:: sh - - mongod --configsvr - -#. Restart all shard :program:`mongod` and :program:`mongos` processes. - -.. _sharding-process-config-server-migrate-same-hostname: - -Migrate Config Servers with the Same Hostname -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Use this process when you need to migrate a config server to a new -system but the new system will be accessible using the same host -name. - -#. Shut down the config server that you're moving. - - This will render all config data for your cluster :ref:`read only - `. - -#. Change the DNS entry that points to the system that provided the old - config server, so that the *same* hostname points to the new - system. - - How you do this depends on how you organize your DNS and - hostname resolution services. - -#. Move the entire :setting:`dbpath` file system tree from the old - config server to the new config server. This command, issued on the - old config server system, may resemble the following: - - .. code-block:: sh - - rsync -az /data/configdb mongo-config0.example.net:/data/configdb - -#. Start the config instance on the new system. The default invocation - is: - - .. code-block:: sh - - mongod --configsvr - -When you start the third config server, your cluster will become -writable and it will be able to create new splits and migrate chunks -as needed. - -.. _sharding-process-config-server-migrate-different-hostname: - -Migrate Config Servers with Different Hostnames -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Use this process when you need to migrate a :ref:`config-database` to a new -server and it *will not* be accessible via the same hostname. If -possible, avoid changing the hostname so that you can use the -:ref:`previous procedure `. - -#. Shut down the :ref:`config server ` you're moving. - - This will render all config data for your cluster "read only:" - - .. 
code-block:: sh - - rsync -az /data/configdb mongodb.config2.example.net:/data/configdb - -#. Start the config instance on the new system. The default invocation - is: - - .. code-block:: sh - - mongod --configsvr - -#. Shut down all existing MongoDB processes. This includes: - - - all :program:`mongod` instances or :term:`replica sets ` - that provide your shards. - - - the :program:`mongod` instances that provide your existing - :ref:`config databases `. - - - all :program:`mongos` instances in your cluster. - -#. Restart all :program:`mongod` processes that provide the shard - servers. - -#. Update the :option:`--configdb ` parameter (or - :setting:`configdb`) for all :program:`mongos` instances and - restart all :program:`mongos` instances. - -Replace a Config Server -~~~~~~~~~~~~~~~~~~~~~~~ - -Use this procedure only if you need to replace one of your config -servers after it becomes inoperable (e.g. hardware failure.) This -process assumes that the hostname of the instance will not change. If -you must change the hostname of the instance, use the process for -:ref:`migrating a config server to a different hostname -`. - -#. Provision a new system, with the same hostname as the previous - host. - - You will have to ensure that the new system has the same IP address - and hostname as the system it's replacing *or* you will need to - modify the DNS records and wait for them to propagate. - -#. Shut down *one* (and only one) of the existing config servers. Copy - all this host's :setting:`dbpath` file system tree from the current system - to the system that will provide the new config server. This - command, issued on the system with the data files, may resemble the - following: - - .. code-block:: sh - - rsync -az /data/configdb mongodb.config2.example.net:/data/configdb - -#. Restart the config server process that you used in the previous - step to copy the data files to the new config server instance. - -#. Start the new config server instance. The default invocation is: - - .. code-block:: sh - - mongod --configsvr - -.. note:: - - In the course of this procedure *never* remove a config server from - the :setting:`configdb` parameter on any of the :program:`mongos` - instances. If you need to change the name of a config server, - always make sure that all :program:`mongos` instances have three - config servers specified in the :setting:`configdb` setting at all - times. - -Backup Cluster Metadata -~~~~~~~~~~~~~~~~~~~~~~~ - -The cluster will remain operational [#read-only]_ without one -of the config database's :program:`mongod` instances, creating a backup -of the cluster metadata from the config database is straight forward: - -#. Shut down one of the :term:`config databases `. - -#. Create a full copy of the data files (i.e. the path specified by - the :setting:`dbpath` option for the config instance.) - -#. Restart the original configuration server. - -.. seealso:: :doc:`backups`. - -.. [#read-only] While one of the three config servers is unavailable, - the cluster cannot split any chunks nor can it migrate chunks - between shards. Your application will be able to write data to the - cluster. The :ref:`sharding-config-server` section of the - documentation provides more information on this topic. - -.. index:: troubleshooting; sharding -.. index:: sharding; troubleshooting -.. 
_sharding-troubleshooting: - -Troubleshooting ---------------- - -The two most important factors in maintaining a successful sharded cluster are: - -- :ref:`choosing an appropriate shard key ` and - -- :ref:`sufficient capacity to support current and future operations - `. - -You can prevent most issues encountered with sharding by ensuring that -you choose the best possible :term:`shard key` for your deployment and -ensure that you are always adding additional capacity to your cluster -well before the current resources become saturated. Continue reading -for specific issues you may encounter in a production environment. - -.. _sharding-troubleshooting-not-splitting: - -All Data Remains on One Shard -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Your cluster must have sufficient data for sharding to make -sense. Sharding works by migrating chunks between the shards until -each shard has roughly the same number of chunks. - -The default chunk size is 64 megabytes. MongoDB will not begin -migrations until the imbalance of chunks in the cluster exceeds the -:ref:`migration threshold `. While the -default chunk size is configurable with the :setting:`chunkSize` -setting, these behaviors help prevent unnecessary chunk migrations, -which can degrade the performance of your cluster as a whole. - -If you have just deployed a sharded cluster, make sure that you have -enough data to make sharding effective. If you do not have sufficient -data to create more than eight 64 megabyte chunks, then all data will -remain on one shard. Either lower the :ref:`chunk size -` setting, or add more data to the cluster. - -As a related problem, the system will split chunks only on -inserts or updates, which means that if you configure sharding and do not -continue to issue insert and update operations, the database will not -create any chunks. You can either wait until your application inserts -data *or* :ref:`split chunks manually `. - -Finally, if your shard key has a low :ref:`cardinality -`, MongoDB may not be able to create -sufficient splits among the data. - -One Shard Receives Too Much Traffic -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In some situations, a single shard or a subset of the cluster will -receive a disproportionate portion of the traffic and workload. In -almost all cases this is the result of a shard key that does not -effectively allow :ref:`write scaling `. - -It's also possible that you have "hot chunks." In this case, you may -be able to solve the problem by splitting and then migrating parts of -these chunks. - -In the worst case, you may have to consider re-sharding your data -and :ref:`choosing a different shard key ` -to correct this pattern. - -The Cluster Does Not Balance -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If you have just deployed your sharded cluster, you may want to -consider the :ref:`troubleshooting suggestions for a new cluster where -data remains on a single shard `. - -If the cluster was initially balanced, but later developed an uneven -distribution of data, consider the following possible causes: - -- You have deleted or removed a significant amount of data from the - cluster. If you have added additional data, it may have a - different distribution with regards to its shard key. - -- Your :term:`shard key` has low :ref:`cardinality ` - and MongoDB cannot split the chunks any further. - -- Your data set is growing faster than the balancer can distribute - data around the cluster. 
This is uncommon and - typically is the result of: - - - a :ref:`balancing window ` that - is too short, given the rate of data growth. - - - an uneven distribution of :ref:`write operations - ` that requires more data - migration. You may have to choose a different shard key to resolve - this issue. - - - poor network connectivity between shards, which may lead to chunk - migrations that take too long to complete. Investigate your - network configuration and interconnections between shards. - -Migrations Render Cluster Unusable -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If migrations impact your cluster or application's performance, -consider the following options, depending on the nature of the impact: - -#. If migrations only interrupt your clusters sporadically, you can - limit the :ref:`balancing window - ` to prevent balancing activity - during peak hours. Ensure that there is enough time remaining to - keep the data from becoming out of balance again. - -#. If the balancer is always migrating chunks to the detriment of - overall cluster performance: - - - You may want to attempt :ref:`decreasing the chunk size ` - to limit the size of the migration. - - - Your cluster may be over capacity, and you may want to attempt to - :ref:`add one or two shards ` to - the cluster to distribute load. - -It's also possible that your shard key causes your -application to direct all writes to a single shard. This kind of -activity pattern can require the balancer to migrate most data soon after writing -it. Consider redeploying your cluster with a shard key that provides -better :ref:`write scaling `. - -Disable Balancing During Backups -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If MongoDB migrates a :term:`chunk` during a :doc:`backup -`, you can end with an inconsistent snapshot -of your :term:`sharded cluster`. Never run a backup while the balancer is -active. To ensure that the balancer is inactive during your backup -operation: - -- Set the :ref:`balancing window ` - so that the balancer is inactive during the backup. Ensure that the - backup can complete while you have the balancer disabled. - -- :ref:`manually disable the balancer ` - for the duration of the backup procedure. - -Confirm that the balancer is not active using the -:method:`sh.getBalancerState()` method before starting a backup -operation. When the backup procedure is complete you can reactivate -the balancer process. +This page is no longer used. \ No newline at end of file diff --git a/source/core/sharding-requirements.txt b/source/core/sharding-requirements.txt new file mode 100644 index 00000000000..b36bb67aec1 --- /dev/null +++ b/source/core/sharding-requirements.txt @@ -0,0 +1,162 @@ +.. index:: sharding; requirements +.. _sharding-requirements: + +============================ +Sharded Cluster Requirements +============================ + +.. default-domain:: mongodb + +This page includes the following: + +- :ref:`sharding-requirements-when-to-use-sharding` + +- :ref:`sharding-requirements-infrastructure` + +- :ref:`sharding-requirements-data` + +- :ref:`sharding-localhost` + +.. _sharding-requirements-when-to-use-sharding: + +When to Use Sharding +-------------------- + +While sharding is a powerful and compelling feature, it comes with +significant :ref:`sharding-requirements-infrastructure` +and some limited complexity costs. As a result, use +sharding only as necessary, and when indicated by actual operational +requirements. Consider the following overview of indications it may be +time to consider sharding. 
+ +You should consider deploying a :term:`sharded cluster`, if: + +- your data set approaches or exceeds the storage capacity of a single + node in your system. + +- the size of your system's active :term:`working set` *will soon* + exceed the capacity of the *maximum* amount of RAM for your system. + +- your system has a large amount of write activity, a single + MongoDB instance cannot write data fast enough to meet demand, and + all other approaches have not reduced contention. + +If these attributes are not present in your system, sharding will only +add additional complexity to your system without providing much +benefit. When designing your data model, if you will eventually need a +sharded cluster, consider which collections you will want to shard and +the corresponding shard keys. + +.. _sharding-capacity-planning: + +.. warning:: + + It takes time and resources to deploy sharding, and if your system + has *already* reached or exceeded its capacity, you will have a + difficult time deploying sharding without impacting your + application. + + As a result, if you think you will need to partition your database + in the future, **do not** wait until your system is overcapacity to + enable sharding. + +.. _sharding-requirements-infrastructure: + +Infrastructure Requirements +--------------------------- + +A :term:`sharded cluster` has the following components: + +- Three :term:`config servers `. + + These special :program:`mongod` instances store the metadata for the + cluster. The :program:`mongos` instances cache this data and use it + to determine which :term:`shard` is responsible for which + :term:`chunk`. + + For development and testing purposes you may deploy a cluster with a single + configuration server process, but always use exactly three config + servers for redundancy and safety in production. + +- Two or more shards. Each shard consists of one or more :program:`mongod` + instances that store the data for the shard. + + These "normal" :program:`mongod` instances hold all of the + actual data for the cluster. + + Typically each shard is a :term:`replica sets `. Each + replica set consists of multiple :program:`mongod` instances. The members + of the replica set provide redundancy and high available for the data in each shard. + + .. warning:: + + MongoDB enables data :term:`partitioning `, or + sharding, on a *per collection* basis. You *must* access all data + in a sharded cluster via the :program:`mongos` instances as below. + If you connect directly to a :program:`mongod` in a sharded cluster + you will see its fraction of the cluster's data. The data on any + given shard may be somewhat random: MongoDB provides no guarantee + that any two contiguous chunks will reside on a single shard. + +- One or more :program:`mongos` instances. + + These instance direct queries from the application layer to the + shards that hold the data. The :program:`mongos` instances have no + persistent state or data files and only cache metadata in RAM from + the config servers. + + .. note:: + + In most situations :program:`mongos` instances use minimal + resources, and you can run them on your application servers + without impacting application performance. However, if you use + the :term:`aggregation framework` some processing may occur on + the :program:`mongos` instances, causing that :program:`mongos` + to require more system resources. + +.. 
_sharding-requirements-data: + +Data Requirements +----------------- + +Your cluster must manage a significant quantity of data for sharding +to have an effect on your collection. The default :term:`chunk` size +is 64 megabytes, and the :ref:`balancer +` will not begin moving data until the imbalance +of chunks in the cluster exceeds the :ref:`migration threshold +`. + +Practically, this means that unless your cluster has many hundreds of +megabytes of data, chunks will remain on a single shard. + +While there are some exceptional situations where you may need to +shard a small collection of data, most of the time the additional +complexity added by sharding the small collection is not worth the additional +complexity and overhead unless +you need additional concurrency or capacity for some reason. If you +have a small data set, usually a properly configured +single MongoDB instance or replica set will be more than sufficient +for your persistence layer needs. + +:term:`Chunk ` size is :option:`user configurable `. +However, the default value is of 64 megabytes is ideal +for most deployments. See the :ref:`sharding-chunk-size` section in the +:doc:`sharding-internals` document for more information. + +.. index:: sharding; localhost +.. _sharding-localhost: + +Restrictions on the Use of "localhost" +-------------------------------------- + +Because all components of a :term:`sharded cluster` must communicate +with each other over the network, there are special restrictions +regarding the use of localhost addresses: + +If you use either "localhost" or "``127.0.0.1``" as the host +identifier, then you must use "localhost" or "``127.0.0.1``" for *all* +host settings for any MongoDB instances in the cluster. This applies +to both the ``host`` argument to :dbcommand:`addShard` and the value +to the :option:`mongos --configdb` run time option. If you mix +localhost addresses with remote host address, MongoDB will produce +errors. diff --git a/source/core/sharding-security.txt b/source/core/sharding-security.txt new file mode 100644 index 00000000000..836047ef863 --- /dev/null +++ b/source/core/sharding-security.txt @@ -0,0 +1,63 @@ +.. index:: sharding; security +.. _sharding-security: + +======================== +Sharded Cluster Security +======================== + +.. default-domain:: mongodb + +.. todo:: migrate this content to /administration/security.txt when that + document exists. See DOCS-79 for tracking this document. + +.. note:: + + You should always run all :program:`mongod` components in trusted + networking environments that control access to the cluster using + network rules and restrictions to ensure that only known traffic + reaches your :program:`mongod` and :program:`mongos` instances. + +.. warning:: Limitations + + .. versionchanged:: 2.2 + Read only authentication is fully supported in shard + clusters. Previously, in version 2.0, sharded clusters would not + enforce read-only limitations. + + .. versionchanged:: 2.0 + Sharded clusters support authentication. Previously, in version + 1.8, sharded clusters will not support authentication and access + control. You must run your sharded systems in trusted + environments. + +To control access to a sharded cluster, you must set the +:setting:`keyFile` option on all components of the sharded cluster. Use +the :option:`--keyFile ` run-time option or the +:setting:`keyFile` configuration option for all :program:`mongos`, +configuration instances, and shard :program:`mongod` instances. 
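For illustration, a minimal sketch of the invocations follows. The
hostnames and the ``/srv/mongodb/keyfile`` path are hypothetical; the
important point is that every component references a copy of the *same*
key file:

.. code-block:: sh

   # Each config server and each shard mongod loads the shared key file.
   mongod --configsvr --keyFile /srv/mongodb/keyfile --dbpath /data/configdb
   mongod --shardsvr --keyFile /srv/mongodb/keyfile --dbpath /data/db1

   # Every mongos uses the same key file as well.
   mongos --keyFile /srv/mongodb/keyfile --configdb cfg0.example.net,cfg1.example.net,cfg2.example.net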
+ +There are two classes of security credentials in a sharded cluster: +credentials for "admin" users (i.e. for the :term:`admin database`) and +credentials for all other databases. These credentials reside in +different locations within the cluster and have different roles: + +- Admin database credentials reside on the config servers, to receive + admin access to the cluster you *must* authenticate a session while + connected to a :program:`mongos` instance using the :term:`admin + database`. + +- Other database credentials reside on the *primary* shard for the + database. + +This means that you *can* authenticate to these users and databases +while connected directly to the primary shard for a database. However, +for clarity and consistency all interactions between the client and +the database should use a :program:`mongos` instance. + +.. note:: + + Individual shards can store administrative credentials to their + instance, which only permit access to a single shard. MongoDB + stores these credentials in the shards' :term:`admin databases ` and these + credentials are *completely* distinct from the cluster-wide + administrative credentials. diff --git a/source/core/sharding.txt b/source/core/sharding.txt index a06e4cba27d..6d93837af13 100644 --- a/source/core/sharding.txt +++ b/source/core/sharding.txt @@ -1,254 +1,61 @@ .. index:: fundamentals; sharding .. _sharding-fundamentals: -===================== -Sharding Fundamentals -===================== - -.. default-domain:: mongodb - -This document provides an overview of the fundamental concepts and -operations of sharding with MongoDB. For a list of all sharding -documentation see :doc:`/sharding`. - -MongoDB's sharding system allows users to :term:`partition` a -:term:`collection` within a database to distribute the collection's documents -across a number of :program:`mongod` instances or :term:`shards `. -Sharding increases write capacity, provides the ability to -support larger working sets, and raises the limits of total data size beyond -the physical resources of a single node. - +================= Sharding Overview ------------------ - -Features -~~~~~~~~ - -With sharding MongoDB automatically distributes data among a -collection of :program:`mongod` instances. Sharding, as implemented in -MongoDB has the following features: - -.. glossary:: - - Range-based Data Partitioning - MongoDB distributes documents among :term:`shards ` based - on the value of the :ref:`shard key `. Each - :term:`chunk` represents a block of :term:`documents ` - with values that fall within a specific range. When chunks grow - beyond the :ref:`chunk size `, MongoDB - divides the chunks into smaller chunks (i.e. :term:`splitting - `) based on the shard key. - - Automatic Data Volume Distribution - The sharding system automatically balances data across the - cluster without intervention from the application - layer. Effective automatic sharding depends on a well chosen - :ref:`shard key `, but requires no - additional complexity, modifications, or intervention from - developers. - - Transparent Query Routing - Sharding is completely transparent to the application layer, - because all connections to a cluster go through - :program:`mongos`. Sharding in MongoDB requires some - :ref:`basic initial configuration `, - but ongoing function is entirely transparent to the application. - - Horizontal Capacity - Sharding increases capacity in two ways: - - #. 
Effective partitioning of data can provide additional write - capacity by distributing the write load over a number of - :program:`mongod` instances. - - #. Given a shard key with sufficient :ref:`cardinality - `, partitioning data allows - users to increase the potential amount of data to manage - with MongoDB and expand the :term:`working set`. - -A typical :term:`sharded cluster` consists of: - -- 3 config servers that store metadata. The metadata maps :term:`chunks - ` to shards. - -- More than one :term:`replica sets ` that hold - data. These are the :term:`shards `. - -- A number of lightweight routing processes, called :doc:`mongos - ` instances. The :program:`mongos` process routes - operations to the correct shard based the cluster configuration. - -When to Use Sharding -~~~~~~~~~~~~~~~~~~~~ - -While sharding is a powerful and compelling feature, it comes with -significant :ref:`sharding-requirements-infrastructure` -and some limited complexity costs. As a result, use -sharding only as necessary, and when indicated by actual operational -requirements. Consider the following overview of indications it may be -time to consider sharding. - -You should consider deploying a :term:`sharded cluster`, if: - -- your data set approaches or exceeds the storage capacity of a single - node in your system. - -- the size of your system's active :term:`working set` *will soon* - exceed the capacity of the *maximum* amount of RAM for your system. - -- your system has a large amount of write activity, a single - MongoDB instance cannot write data fast enough to meet demand, and - all other approaches have not reduced contention. - -If these attributes are not present in your system, sharding will only -add additional complexity to your system without providing much -benefit. When designing your data model, if you will eventually need a -sharded cluster, consider which collections you will want to shard and -the corresponding shard keys. - -.. _sharding-capacity-planning: - -.. warning:: - - It takes time and resources to deploy sharding, and if your system - has *already* reached or exceeded its capacity, you will have a - difficult time deploying sharding without impacting your - application. - - As a result, if you think you will need to partition your database - in the future, **do not** wait until your system is overcapacity to - enable sharding. - -.. index:: sharding; requirements -.. _sharding-requirements: - -Sharding Requirements ---------------------- - -.. _sharding-requirements-infrastructure: - -Infrastructure Requirements -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -A :term:`sharded cluster` has the following components: - -- Three :term:`config servers `. - - These special :program:`mongod` instances store the metadata for the - cluster. The :program:`mongos` instances cache this data and use it - to determine which :term:`shard` is responsible for which - :term:`chunk`. +================= - For development and testing purposes you may deploy a cluster with a single - configuration server process, but always use exactly three config - servers for redundancy and safety in production. - -- Two or more shards. Each shard consists of one or more :program:`mongod` - instances that store the data for the shard. - - These "normal" :program:`mongod` instances hold all of the - actual data for the cluster. - - Typically each shard is a :term:`replica sets `. Each - replica set consists of multiple :program:`mongod` instances. 
The members - of the replica set provide redundancy and high available for the data in each shard. - - .. warning:: - - MongoDB enables data :term:`partitioning `, or - sharding, on a *per collection* basis. You *must* access all data - in a sharded cluster via the :program:`mongos` instances as below. - If you connect directly to a :program:`mongod` in a sharded cluster - you will see its fraction of the cluster's data. The data on any - given shard may be somewhat random: MongoDB provides no guarantee - that any two contiguous chunks will reside on a single shard. - -- One or more :program:`mongos` instances. - - These instance direct queries from the application layer to the - shards that hold the data. The :program:`mongos` instances have no - persistent state or data files and only cache metadata in RAM from - the config servers. - - .. note:: - - In most situations :program:`mongos` instances use minimal - resources, and you can run them on your application servers - without impacting application performance. However, if you use - the :term:`aggregation framework` some processing may occur on - the :program:`mongos` instances, causing that :program:`mongos` - to require more system resources. - -Data Requirements -~~~~~~~~~~~~~~~~~ +.. default-domain:: mongodb -Your cluster must manage a significant quantity of data for sharding -to have an effect on your collection. The default :term:`chunk` size -is 64 megabytes, [#chunk-size]_ and the :ref:`balancer -` will not begin moving data until the imbalance -of chunks in the cluster exceeds the :ref:`migration threshold -`. +Sharding is MongoDB’s approach to scaling out. Sharding partitions a +collection and stores the different portions on different machines. When +a database's collections become too large for existing storage, you need only add a +new machine. Sharding automatically distributes collection data to +the new server. -Practically, this means that unless your cluster has many hundreds of -megabytes of data, chunks will remain on a single shard. +Sharding automatically balances data and load across machines. Sharding +provides additional write capacity by distributing the write load over a +number of :program:`mongod` instances. Sharding allows users to increase +the potential amount of data in the :term:`working set`. -While there are some exceptional situations where you may need to -shard a small collection of data, most of the time the additional -complexity added by sharding the small collection is not worth the additional -complexity and overhead unless -you need additional concurrency or capacity for some reason. If you -have a small data set, usually a properly configured -single MongoDB instance or replica set will be more than sufficient -for your persistence layer needs. +.. index:: shard key + single: sharding; shard key -.. [#chunk-size] :term:`chunk` size is :option:`user configurable - `. However, the default value is of 64 - megabytes is ideal for most deployments. See the - :ref:`sharding-chunk-size` section in the - :doc:`sharding-internals` document for more information. +How Sharding Works +------------------ -.. index:: sharding; localhost -.. _sharding-localhost: +To run sharding, you set up a sharded cluster. For a description of +sharded clusters, see :doc:`/administration/sharded-clusters`. -Sharding and "localhost" Addresses -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Within a sharded cluster, you enable sharding on a per-database basis. 
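As a sketch of what this looks like in the :program:`mongo` shell, the
following enables sharding for a hypothetical ``records`` database and
then shards its ``people`` collection on a ``zipcode`` field:

.. code-block:: javascript

   sh.enableSharding( "records" )
   sh.shardCollection( "records.people", { zipcode: 1 } )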
+Once a database is enabled for sharding, you choose which collections to +shard. For each sharded collection, you specify a :term:`shard key`. -Because all components of a :term:`sharded cluster` must communicate -with each other over the network, there are special restrictions -regarding the use of localhost addresses: +The shard key determines the distribution of the collection's +:term:`documents ` among the cluster's :term:`shards `. +The shard key is a :term:`field` that exists in every document in the +collection. MongoDB distributes documents according to ranges of values +in the shard key. A given shard holds documents for which the shard key +falls within a specific range of values. Shard keys, like :term:`indexes +`, can be either a single field or multiple fields. -If you use either "localhost" or "``127.0.0.1``" as the host -identifier, then you must use "localhost" or "``127.0.0.1``" for *all* -host settings for any MongoDB instances in the cluster. This applies -to both the ``host`` argument to :dbcommand:`addShard` and the value -to the :option:`mongos --configdb` run time option. If you mix -localhost addresses with remote host address, MongoDB will produce -errors. - -.. index:: shard key - single: sharding; shard key +Within a shard, MongoDB further partitions documents into :term:`chunks +`. Each chunk represents a smaller range of values within the +shard's range. When a chunk grows beyond the :ref:`chunk size +`, MongoDB :term:`splits ` the chunk into +smaller chunks, always based on ranges in the shard key. +.. _sharding-shard-key-selection: .. _sharding-shard-key: .. _shard-key: -Shard Keys ----------- - -.. todo:: link this section to - -"Shard keys" refer to the :term:`field` that exists in every -:term:`document` in a collection that MongoDB uses to distribute -documents among the :term:`shards `. Shard keys, like -:term:`indexes `, can be either a single field, or may be a -compound key, consisting of multiple fields. +Shard Key Selection +------------------- -Remember, MongoDB's sharding is range-based: each :term:`chunk` holds -documents having specific range of values for the "shard key". Thus, -choosing the correct shard key can have a great impact on the +Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster. - -Appropriate shard key choice depends on the schema of your data and -the way that your application queries and writes data to the database. +Appropriate shard key choice depends on the schema of your data and the +way that your application queries and writes data to the database. The ideal shard key: @@ -278,265 +85,19 @@ the optimal key. In those situations, computing a special purpose shard key into an additional field or using a compound shard key may help produce one that is more ideal. -.. index:: sharding; config servers -.. index:: config servers -.. _sharding-config-server: - -Config Servers --------------- - -Config servers maintain the shard metadata in a config -database. The :term:`config database ` stores -the relationship between :term:`chunks ` and where they reside -within a :term:`sharded cluster`. Without a config database, the -:program:`mongos` instances would be unable to route queries or write -operations within the cluster. - -Config servers *do not* run as replica sets. Instead, a :term:`cluster -` operates with a group of *three* config servers that use a -two-phase commit process that ensures immediate consistency and -reliability. 
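In practice this means you start three separate config server processes
and list all three, in the same order, on every :program:`mongos`. A
minimal sketch, with hypothetical hostnames, follows:

.. code-block:: sh

   # Run one of these on each of three different machines.
   mongod --configsvr --dbpath /data/configdb --port 27019

   # Point every mongos at all three config servers.
   mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019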
- -For testing purposes you may deploy a cluster with a single -config server, but this is not recommended for production. - -.. warning:: - - If your cluster has a single config server, this - :program:`mongod` is a single point of failure. If the instance is - inaccessible the cluster is not accessible. If you cannot recover - the data on a config server, the cluster will be inoperable. - - **Always** use three config servers for production deployments. - -The actual load on configuration servers is small because each -:program:`mongos` instances maintains a cached copy of the configuration -database. MongoDB only writes data to the config server to: - -- create splits in existing chunks, which happens as data in - existing chunks exceeds the maximum chunk size. - -- migrate a chunk between shards. - -Additionally, all config servers must be available on initial setup -of a sharded cluster, each :program:`mongos` instance must be able -to write to the ``config.version`` collection. - -If one or two configuration instances become unavailable, the -cluster's metadata becomes *read only*. It is still possible to read -and write data from the shards, but no chunk migrations or splits will -occur until all three servers are accessible. At the same time, config -server data is only read in the following situations: - -- A new :program:`mongos` starts for the first time, or an existing - :program:`mongos` restarts. - -- After a chunk migration, the :program:`mongos` instances update - themselves with the new cluster metadata. - -If all three config servers are inaccessible, you can continue to use -the cluster as long as you don't restart the :program:`mongos` -instances until after config servers are accessible again. If you -restart the :program:`mongos` instances and there are no accessible -config servers, the :program:`mongos` would be unable to direct -queries or write operations to the cluster. - -Because the configuration data is small relative to the amount of data -stored in a cluster, the amount of activity is relatively low, and 100% -up time is not required for a functioning sharded cluster. As a result, -backing up the config servers is not difficult. Backups of config -servers are critical as clusters become totally inoperable when -you lose all configuration instances and data. Precautions to ensure -that the config servers remain available and intact are critical. - -.. note:: - - Configuration servers store metadata for a single sharded cluster. - You must have a separate configuration server or servers for each - cluster you administer. - -.. index:: mongos -.. _sharding-mongos: -.. _sharding-read-operations: - -:program:`mongos` and Querying ------------------------------- - -.. seealso:: :doc:`/reference/mongos` and the :program:`mongos`\-only - settings: :setting:`test` and :setting:`chunkSize`. - -Operations -~~~~~~~~~~ - -The :program:`mongos` provides a single unified interface to a sharded -cluster for applications using MongoDB. Except for the selection of a -:term:`shard key`, application developers and administrators need not -consider any of the :doc:`internal details of sharding `. - -:program:`mongos` caches data from the :ref:`config server -`, and uses this to route operations from -applications and clients to the :program:`mongod` instances. -:program:`mongos` have no *persistent* state and consume -minimal system resources. 
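Because :program:`mongos` handles all routing, clients and the
:program:`mongo` shell connect to a :program:`mongos` rather than to the
shards directly. For example, assuming a hypothetical :program:`mongos`
running at ``mongos0.example.net`` on the default port:

.. code-block:: sh

   mongo --host mongos0.example.net --port 27017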
- -The most common practice is to run :program:`mongos` instances on the -same systems as your application servers, but you can maintain -:program:`mongos` instances on the shards or on other dedicated -resources. - -.. note:: - - .. versionchanged:: 2.1 - - Some aggregation operations using the :dbcommand:`aggregate` - command (i.e. :method:`db.collection.aggregate()`,) will cause - :program:`mongos` instances to require more CPU resources than in - previous versions. This modified performance profile may dictate - alternate architecture decisions if you use the :term:`aggregation - framework` extensively in a sharded environment. - -.. _sharding-query-routing: - -Routing -~~~~~~~ - -:program:`mongos` uses information from :ref:`config servers -` to route operations to the cluster as -efficiently as possible. In general, operations in a sharded -environment are either: - -1. Targeted at a single shard or a limited group of shards based on - the shard key. - -2. Broadcast to all shards in the cluster that hold documents in a - collection. - -When possible you should design your operations to be as targeted as -possible. Operations have the following targeting characteristics: - -- Query operations broadcast to all shards [#namespace-exception]_ - **unless** the :program:`mongos` can determine which shard or shard - stores this data. - - For queries that include the shard key, :program:`mongos` can target - the query at a specific shard or set of shards, if the portion - of the shard key included in the query is a *prefix* of the shard - key. For example, if the shard key is: - - .. code-block:: javascript - - { a: 1, b: 1, c: 1 } - - The :program:`mongos` *can* route queries that include the full - shard key or either of the following shard key prefixes at a - specific shard or set of shards: - - .. code-block:: javascript - - { a: 1 } - { a: 1, b: 1 } - - Depending on the distribution of data in the cluster and the - selectivity of the query, :program:`mongos` may still have to - contact multiple shards [#possible-all]_ to fulfill these queries. - -- All :method:`insert() ` operations target to - one shard. - -- All single :method:`update() ` operations - target to one shard. This includes :term:`upsert` operations. - -- The :program:`mongos` broadcasts multi-update operations to every - shard. - -- The :program:`mongos` broadcasts :method:`remove() - ` operations to every shard unless the - operation specifies the shard key in full. - -While some operations must broadcast to all shards, you can improve -performance by using as many targeted operations as possible by -ensuring that your operations include the shard key. - -.. [#namespace-exception] If a shard does not store chunks from a - given collection, queries for documents in that collection are not - broadcast to that shard. - -.. [#a/c-as-a-case-of-a] In this example, a :program:`mongos` could - route a query that included ``{ a: 1, c: 1 }`` fields at a specific - subset of shards using the ``{ a: 1 }`` prefix. A :program:`mongos` - cannot route any of the following queries to specific shards - in the cluster: - - .. code-block:: javascript - - { b: 1 } - { c: 1 } - { b: 1, c: 1 } - -.. [#possible-all] :program:`mongos` will route some queries, even - some that include the shard key, to all shards, if needed. - -Sharded Query Response Process -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To route a query to a :term:`cluster `, -:program:`mongos` uses the following process: - -#. 
Determine the list of :term:`shards ` that must receive the query. - - In some cases, when the :term:`shard key` or a prefix of the shard - key is a part of the query, the :program:`mongos` can route the - query to a subset of the shards. Otherwise, the :program:`mongos` - must direct the query to *all* shards that hold documents for that - collection. - - .. example:: - - Given the following shard key: - - .. code-block:: javascript - - { zipcode: 1, u_id: 1, c_date: 1 } - - Depending on the distribution of chunks in the cluster, the - :program:`mongos` may be able to target the query at a subset of - shards, if the query contains the following fields: - - .. code-block:: javascript - - { zipcode: 1 } - { zipcode: 1, u_id: 1 } - { zipcode: 1, u_id: 1, c_date: 1 } - -#. Establish a cursor on all targeted shards. - - When the first batch of results returns from the cursors: - - a. For query with sorted results (i.e. using - :method:`cursor.sort()`) the :program:`mongos` performs a merge - sort of all queries. - - b. For a query with unsorted results, the :program:`mongos` returns - a result cursor that "round robins" results from all cursors on - the shards. - - .. versionchanged:: 2.0.5 - Before 2.0.5, the :program:`mongos` exhausted each cursor, - one by one. - .. index:: balancing .. _sharding-balancing: -Balancing and Distribution --------------------------- +Shard Balancing +--------------- Balancing is the process MongoDB uses to redistribute data within a :term:`sharded cluster`. When a :term:`shard` has a too many: -term:`chunks ` when compared to other shards, MongoDB balances -the shards. +term:`chunks ` when compared to other shards, MongoDB +automatically balances the shards. MongoDB balances the shards without +intervention from the application layer. -The -balancing process attempts to minimize the impact that balancing can +The balancing process attempts to minimize the impact that balancing can have on the cluster, by: - Moving only one chunk at a time. @@ -559,64 +120,3 @@ balancing process from impacting production traffic. is entirely transparent to the user and application layer. This documentation is only included for your edification and possible troubleshooting purposes. - -.. index:: sharding; security -.. _sharding-security: - -Security Considerations for Sharded Clusters --------------------------------------------- - -.. todo:: migrate this content to /administration/security.txt when that - document exists. See DOCS-79 for tracking this document. - -.. note:: - - You should always run all :program:`mongod` components in trusted - networking environments that control access to the cluster using - network rules and restrictions to ensure that only known traffic - reaches your :program:`mongod` and :program:`mongos` instances. - -.. warning:: Limitations - - .. versionchanged:: 2.2 - Read only authentication is fully supported in shard - clusters. Previously, in version 2.0, sharded clusters would not - enforce read-only limitations. - - .. versionchanged:: 2.0 - Sharded clusters support authentication. Previously, in version - 1.8, sharded clusters will not support authentication and access - control. You must run your sharded systems in trusted - environments. - -To control access to a sharded cluster, you must set the -:setting:`keyFile` option on all components of the sharded cluster. 
Use -the :option:`--keyFile ` run-time option or the -:setting:`keyFile` configuration option for all :program:`mongos`, -configuration instances, and shard :program:`mongod` instances. - -There are two classes of security credentials in a sharded cluster: -credentials for "admin" users (i.e. for the :term:`admin database`) and -credentials for all other databases. These credentials reside in -different locations within the cluster and have different roles: - -#. Admin database credentials reside on the config servers, to receive - admin access to the cluster you *must* authenticate a session while - connected to a :program:`mongos` instance using the - :term:`admin database`. - -#. Other database credentials reside on the *primary* shard for the - database. - -This means that you *can* authenticate to these users and databases -while connected directly to the primary shard for a database. However, -for clarity and consistency all interactions between the client and -the database should use a :program:`mongos` instance. - -.. note:: - - Individual shards can store administrative credentials to their - instance, which only permit access to a single shard. MongoDB - stores these credentials in the shards' :term:`admin databases ` and these - credentials are *completely* distinct from the cluster-wide - administrative credentials. diff --git a/source/faq/sharding.txt b/source/faq/sharding.txt index 6592cea0370..16b4d7a2e9b 100644 --- a/source/faq/sharding.txt +++ b/source/faq/sharding.txt @@ -44,8 +44,7 @@ Can I change the shard key after sharding a collection? No. There is no automatic support in MongoDB for changing a shard key -after :ref:`sharding a collection -`. This reality underscores +after sharding a collection. This reality underscores the important of choosing a good :ref:`shard key `. If you *must* change a shard key after sharding a collection, the best option is to: diff --git a/source/reference/commands.txt b/source/reference/commands.txt index 1e1617f175a..dbfd092a486 100644 --- a/source/reference/commands.txt +++ b/source/reference/commands.txt @@ -77,35 +77,6 @@ includes the relevant :program:`mongo` shell helpers. See User Commands ------------- -.. _sharding-commands: - -Sharding Commands -~~~~~~~~~~~~~~~~~ - -.. seealso:: :doc:`/sharding` for more information about MongoDB's - sharding functionality. - -.. include:: command/addShard.txt - :start-after: mongodb - -.. include:: command/listShards.txt - :start-after: mongodb - -.. include:: command/enableSharding.txt - :start-after: mongodb - -.. include:: command/shardCollection.txt - :start-after: mongodb - -.. include:: command/shardingState.txt - :start-after: mongodb - -.. include:: command/removeShard.txt - :start-after: mongodb - -.. include:: command/printShardingStatus.txt - :start-after: mongodb - Aggregation Commands ~~~~~~~~~~~~~~~~~~~~ diff --git a/source/reference/sharding-commands.txt b/source/reference/sharding-commands.txt new file mode 100644 index 00000000000..68e3a984322 --- /dev/null +++ b/source/reference/sharding-commands.txt @@ -0,0 +1,33 @@ +.. _sharding-commands: + +================= +Sharding Commands +================= + +.. default-domain:: mongodb + +The following database commands support :term:`sharded clusters `. + +.. seealso:: :doc:`/sharding` for more information about MongoDB's + sharding functionality. + +.. include:: command/addShard.txt + :start-after: mongodb + +.. include:: command/listShards.txt + :start-after: mongodb + +.. 
include:: command/enableSharding.txt + :start-after: mongodb + +.. include:: command/shardCollection.txt + :start-after: mongodb + +.. include:: command/shardingState.txt + :start-after: mongodb + +.. include:: command/removeShard.txt + :start-after: mongodb + +.. include:: command/printShardingStatus.txt + :start-after: mongodb diff --git a/source/sharding.txt b/source/sharding.txt index e4c58e87a62..5e14823ea88 100644 --- a/source/sharding.txt +++ b/source/sharding.txt @@ -4,44 +4,28 @@ Sharding .. _sharding-background: -Sharding distributes a -single logical database system across a cluster of machines. Sharding -uses range-based portioning to distribute :term:`documents ` -based on a specific :term:`shard key`. +Sharding distributes a single logical database system across a cluster +of machines. Sharding uses range-based portioning to distribute +:term:`documents ` based on a specific :term:`shard key`. -This page lists the documents, tutorials, and reference pages that -describe sharding. +Sharding Concepts +----------------- -For an overview, see :doc:`/core/sharding`. To configure, maintain, and -troubleshoot sharded clusters, see :doc:`/administration/sharding`. -For deployment architectures, see :doc:`/administration/sharding-architectures`. -For details on the internal operations of sharding, see :doc:`/core/sharding-internals`. -For procedures for performing certain sharding tasks, see the -:ref:`Tutorials ` list. - -Documentation -------------- - -The following is the outline of the main documentation: +The following pages describe how sharding works and how to leverage sharding. .. toctree:: - :maxdepth: 2 + :maxdepth: 1 - core/sharding - administration/sharding + core/sharding + administration/sharded-clusters + core/sharding-requirements administration/sharding-architectures - core/sharding-internals + core/sharding-security -.. index:: tutorials; sharding -.. _sharding-tutorials: +Set Up and Manage Sharded Clusters +---------------------------------- -Tutorials ---------- - -The following tutorials describe specific sharding procedures: - -Deploying Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~ +The following pages give procedures for setting up and maintaining sharded clusters. .. toctree:: :maxdepth: 1 @@ -49,11 +33,17 @@ Deploying Sharded Clusters tutorial/deploy-shard-cluster tutorial/add-shards-to-shard-cluster tutorial/remove-shards-from-cluster + tutorial/view-cluster-configuration + tutorial/manage-chunks + tutorial/manage-balancer + administration/sharding-config-server tutorial/enforce-unique-keys-for-sharded-collections tutorial/convert-replica-set-to-replicated-shard-cluster -Backups for Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Backup Sharded Clusters +----------------------- + +The following pages describe different approaches to backing up sharded clusters. .. toctree:: :maxdepth: 1 @@ -65,13 +55,27 @@ Backups for Sharded Clusters tutorial/restore-sharded-cluster tutorial/schedule-backup-window-for-sharded-clusters -.. _sharding-reference: +FAQs, Advanced Concepts, and Troubleshooting +-------------------------------------------- + +The following pages provide information for troubleshooting sharded clusters and for a deeper understanding of sharding. + +.. toctree:: + :maxdepth: 1 + + faq/sharding + core/sharding-internals + administration/sharding-troubleshooting Reference --------- -- :ref:`sharding-commands` -- :doc:`/faq/sharding` +The following pages list the methods and commands used in sharding. + +.. 
toctree:: + :maxdepth: 1 + + reference/sharding-commands .. STUB tutorial/replace-one-configuration-server-in-a-shard-cluster .. STUB tutorial/replace-all-configuration-servers-in-a-shard-cluster diff --git a/source/tutorial/add-shards-to-shard-cluster.txt b/source/tutorial/add-shards-to-shard-cluster.txt index 8ed6bbc1e6e..a89e1b7df04 100644 --- a/source/tutorial/add-shards-to-shard-cluster.txt +++ b/source/tutorial/add-shards-to-shard-cluster.txt @@ -1,119 +1,79 @@ -================================= -Add Shards to an Existing Cluster -================================= +======================= +Add Shards to a Cluster +======================= .. default-domain:: mongodb -Synopsis --------- +You add shards to a :term:`sharded cluster` after you create the cluster +or anytime that you need to add capacity to the cluster. If you have not +created a sharded cluster, see :ref:`sharding-procedure-setup`. -This document describes how to add a :term:`shard` to an -existing :term:`sharded cluster`. As your -data set grows you must add additional shards to a cluster to provide -additional capacity. For additional sharding -procedures, see :doc:`/administration/sharding`. - -Concerns --------- - -Distributing :term:`chunks ` among your cluster requires some -capacity to support the migration process. When adding a shard to your -cluster, you should always ensure that your cluster has enough -capacity to support the migration without affecting legitimate -production traffic. +When adding a shard to a cluster, you should always ensure that the +cluster has enough capacity to support the migration without affecting +legitimate production traffic. In production environments, all shards should be :term:`replica sets -`. Furthermore, *all* interaction with your sharded -cluster should pass through a :program:`mongos` instance. This -tutorial assumes that you already have a :program:`mongo` shell -connection to a :program:`mongos` instance. - -Process -------- - -Tell the cluster where to find the individual -shards. You can do this using the :dbcommand:`addShard` command: - -.. code-block:: javascript - - db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) - -Or you can use the :method:`sh.addShard()` helper in the -:program:`mongo` shell: - -.. code-block:: javascript - - sh.addShard( "[hostname]:[port]" ) +`. -Replace ``[hostname]`` and ``[port]`` with the hostname and TCP -port number of where the shard is accessible. +Add a Shard to a Cluster +------------------------ -.. warning:: +You interact with a sharded cluster by connecting to a :program:`mongos` +instance. - Do not use ``localhost`` for the hostname unless your - :term:`configuration server ` - is also running on ``localhost``. +1. From a :program:`mongo` shell, connect to the :program:`mongos` + instance. Issue a command using the following syntax: -For example: - -.. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - -If ``mongodb0.example.net:27027`` is a member of a replica -set, call the :method:`sh.addShard()` method with an argument that -resembles the following: - -.. code-block:: javascript - - sh.addShard( "/mongodb0.example.net:27027" ) - -Replace, ```` with the name of the replica set, and -MongoDB will discover all other members of the replica set. - -.. note:: In production deployments, all shards should be replica sets. + .. code-block:: sh - .. 
versionchanged:: 2.0.3 + mongo --host --port - Before version 2.0.3, you must specify the shard in the following - form: + For example, if a :program:`mongos` is accessible at + ``mongos0.example.net`` on port ``27017``, issue the following + command: .. code-block:: sh - replicaSetName/,, + mongo --host mongos0.example.net --port 27017 + +#. Add each shard to the cluster using the :method:`sh.addShard()` + method, as shown in the examples below. Issue :method:`sh.addShard()` + separately for each shard. If the shard is a replica set, specify the + name of the replica set and specify a member of the set. In + production deployments, all shards should be replica sets. - For example, if the name of the replica set is ``repl0``, then - your :method:`sh.addShard` command would be: + .. optional:: You can instead use the :dbcommand:`addShard` database + command, which lets you specify a name and maximum size for the + shard. If you do not specify these, MongoDB automatically assigns + a name and maximum size. To use the database command, see + :dbcommand:`addShard`. - .. code-block:: javascript + The following are examples of adding a shard with + :method:`sh.addShard()`: - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) + - To add a shard for a replica set named ``rs1`` with a member + running on port ``27017`` on ``mongodb0.example.net``, issue the + following command: -Repeat this step for each shard in your cluster. + .. code-block:: javascript -.. optional:: + sh.addShard( "rs1/mongodb0.example.net:27017" ) - You may specify a "name" as an argument to the - :dbcommand:`addShard`, follows: + .. versionchanged:: 2.0.3 - .. code-block:: javascript + For MongoDB versions prior to 2.0.3, you must specify all members of the replica set. For + example: - db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) + .. code-block:: javascript - You cannot specify a name for a shard using the - :method:`sh.addShard()` helper in the :program:`mongo` shell. If - you use the helper or do not specify a shard name, then MongoDB - will assign a name upon creation. + sh.addShard( "rs1/mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net:27017" ) -.. note:: + - To add a shard for a standalone :program:`mongod` on port ``27017`` + of ``mongodb0.example.net``, issue the following command: - It may take some time for :term:`chunks ` to migrate to the new - shard because the system must copy data from one :program:`mongod` - instance to another while maintaining data consistency. + .. code-block:: javascript - For an overview of the balancing operation, - see the :ref:`Balancing and Distribution ` - section. + sh.addShard( "mongodb0.example.net:27017" ) - For additional information on balancing, see the - :ref:`Balancing Internals ` section. + .. note:: It might take some time for :term:`chunks ` to + migrate to the new shard. diff --git a/source/tutorial/deploy-shard-cluster.txt b/source/tutorial/deploy-shard-cluster.txt index 335a5e48371..8cfecc7473a 100644 --- a/source/tutorial/deploy-shard-cluster.txt +++ b/source/tutorial/deploy-shard-cluster.txt @@ -1,248 +1,290 @@ +.. _sharding-procedure-setup: + ======================== -Deploy a Sharded Cluster +Set Up a Sharded Cluster ======================== .. default-domain:: mongodb -This document describes how to deploy a :term:`sharded cluster` for a -standalone :program:`mongod` instance. 
To deploy a cluster for an -existing replica set, see -:doc:`/tutorial/convert-replica-set-to-replicated-shard-cluster`. +The topics on this page are an ordered sequence of the tasks you perform +to set up a :term:`sharded cluster`. + +Before setting up a cluster, see the following: + +- :ref:`sharding-requirements`. + +- :doc:`/administration/replication-architectures` + +To set up a sharded cluster, follow ordered sequence of the tasks on +this page: -Procedure ---------- +1. :ref:`sharding-setup-start-cfgsrvr` -Before deploying a sharded cluster, see the requirements listed in -:ref:`Requirements for Sharded Clusters `. +#. :ref:`sharding-setup-start-mongos` + +#. :ref:`sharding-setup-add-shards` + +#. :ref:`sharding-setup-enable-sharding` + +#. :ref:`sharding-setup-shard-collection` .. include:: /includes/warning-sharding-hostnames.rst +.. _sharding-setup-start-cfgsrvr: + Start the Config Server Database Instances -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------------------ -The config server database processes are small -:program:`mongod` instances that store the cluster's metadata. You must -have exactly *three* instances in production deployments. Each stores a complete copy -of the cluster's metadata. These instances should run on different servers -to assure good uptime and data safety. +The config server processes are :program:`mongod` instances that store +the cluster's metadata. You designate a :program:`mongod` as a config +server using the :option:`--configsvr ` option. Each +config server stores a complete copy of the cluster's metadata. -Since config database :program:`mongod` instances receive relatively -little traffic and demand only a small portion of system resources, you -can run the instances on systems that run other cluster components. +In production deployments, you must deploy exactly three config server +instances, each running on different servers to assure good uptime and +data safety. In test environments, you can run all three instances on a +single server. -By default a :program:`mongod` :option:`--configsvr ` process stores its data files -in the `/data/configdb` directory. You can specify a different -location using the :setting:`dbpath` run-time option. The config :program:`mongod` instance -is accessible via port ``27019``. In addition to :setting:`configsvr`, -use other :program:`mongod` -:doc:`runtime options ` as needed. +Config server instances receive relatively little traffic and demand +only a small portion of system resources. Therefore, you can run an +instance on a system that runs other cluster components. -To create a data directory for each config server, issue a command -similar to the following for each: +1. Create data directories for each of the three config server + instances. By default, a config server stores its data files in the + `/data/configdb` directory. You can choose a different location. To + create a data directory, issue a command similar to the following: -.. code-block:: sh + .. code-block:: sh - mkdir /data/db/config + mkdir /data/db/config -To start each config server, issue a command similar to the following -for each: +#. Start the three config server instances. Start each by issuing a + command using the following syntax: -.. code-block:: sh + .. code-block:: sh + + mongod --configsvr --dbpath --port + + The default port for config servers is ``27019``. You can specify a + different port. 
The following example starts a config server using + the default port and default data directory: + + .. code-block:: sh + + mongod --configsvr --dbpath /data/configdb --port 27019 - mongod --configsvr --dbpath --port + For additional command options, see :doc:`/reference/mongod` or + :doc:`/reference/configuration-options`. + + .. include:: /includes/note-config-server-startup.rst + +.. _sharding-setup-start-mongos: Start the ``mongos`` Instances -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------ -The :program:`mongos` instance routes queries and operations -to the appropriate shards and interacts with the config server instances. -All client operations targeting a cluster go through :program:`mongos` -instances. +The :program:`mongos` instances are lightweight and do not require data +directories. You can run a :program:`mongos` instance on a system that +runs other cluster components, such as on an application server or a +server running a :program:`mongod` process. By default, a +:program:`mongos` instance runs on port ``27017``. -:program:`mongos` instances are lightweight and do not require data directories. -A cluster typically -has several instances. For example, you might run one :program:`mongos` -instance on each of your application servers, or you might run a :program:`mongos` instance -on each of the servers running a :program:`mongod` process. +When you start the :program:`mongos` instance, specify the hostnames of +the three config servers, either in the configuration file or as command +line parameters. For operational flexibility, use DNS names for the +config servers rather than explicit IP addresses. If you're not using +resolvable hostname, you cannot change the config server names or IP +addresses without a restarting *every* :program:`mongos` and +:program:`mongod` instance. -You must the specify resolvable hostnames [#names]_ for the *3* config servers -when starting the :program:`mongos` instance. You specify the hostnames either in the -configuration file or as command line parameters. +To start a :program:`mongos` instance, issue a command using the following syntax: + +.. code-block:: sh -The :program:`mongos` instance runs on the default MongoDB TCP port: -``27017``. + mongos --configdb -To start :program:`mongos` instance running on the -``mongos0.example.net`` host, that connects to the config server -instances running on the following hosts: +For example, to start a :program:`mongos` that connects to config server +instance running on the following hosts and on the default ports: -- ``mongoc0.example.net`` -- ``mongoc1.example.net`` -- ``mongoc2.example.net`` +- ``cfg0.example.net`` +- ``cfg1.example.net`` +- ``cfg2.example.net`` You would issue the following command: .. code-block:: sh - mongos --configdb mongoc0.example.net,mongoc1.example.net,mongoc2.example.net + mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019 -.. [#names] Use DNS names for the config servers rather than explicit - IP addresses for operational flexibility. If you're not using resolvable - hostname, - you cannot change the config server names or IP addresses - without a restarting *every* :program:`mongos` and - :program:`mongod` instance. +.. _sharding-setup-add-shards: Add Shards to the Cluster -~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------- -You must deploy at least one :term:`shard` or one :term:`replica set` to -begin. In a production cluster, each shard is a replica set. 
You -may add additional shards to a running cluster later. For instructions -on deploying replica sets, see :doc:`/tutorial/deploy-replica-set`. +A :term:`shard` can be a standalone :program:`mongod` or or a +:term:`replica set`. In a production environment, each shard +should be a replica set. -This procedure assumes you have two active and initiated replica sets -and describes how to add the first two shards to the cluster. +1. From a :program:`mongo` shell, connect to the :program:`mongos` + instance. Issue a command using the following syntax: -First, connect to one of the :program:`mongos` instances. For example, -if a :program:`mongos` is accessible at ``mongos0.example.net`` on -port ``27017``, issue the following command: + .. code-block:: sh -.. code-block:: sh + mongo --host --port - mongo mongos0.example.net + For example, if a :program:`mongos` is accessible at + ``mongos0.example.net`` on port ``27017``, issue the following + command: -Then, from a :program:`mongo` shell connected to the :program:`mongos` -instance, call the :method:`sh.addShard()` method for each shard that -you want to add to the cluster: + .. code-block:: sh -.. code-block:: javascript + mongo --host mongos0.example.net --port 27017 - sh.addShard( "s0/sfo30.example.net" ) - sh.addShard( "s1/sfo40.example.net" ) +#. Add each shard to the cluster using the :method:`sh.addShard()` + method, as shown in the examples below. Issue :method:`sh.addShard()` + separately for each shard. If the shard is a replica set, specify the + name of the replica set and specify a member of the set. In + production deployments, all shards should be replica sets. -If the host you are adding is a member of a replica set, you -*must* specify the name of the replica set. :program:`mongos` -will discover the names of other members of the replica set based on -the name and the hostname you provide. + .. optional:: You can instead use the :dbcommand:`addShard` database + command, which lets you specify a name and maximum size for the + shard. If you do not specify these, MongoDB automatically assigns + a name and maximum size. To use the database command, see + :dbcommand:`addShard`. -These operations add two shards, provided by: + The following are examples of adding a shard with + :method:`sh.addShard()`: -- the replica set named ``s0``, that includes the - ``sfo30.example.net`` host. + - To add a shard for a replica set named ``rs1`` with a member + running on port ``27017`` on ``mongodb0.example.net``, issue the + following command: -- the replica set name and ``s1``, that includes the - ``sfo40.example.net`` host. + .. code-block:: javascript -.. admonition:: All shards should be replica sets + sh.addShard( "rs1/mongodb0.example.net:27017" ) - .. versionchanged:: 2.0.3 + .. versionchanged:: 2.0.3 - After version 2.0.3, you may use the above form to add replica - sets to a cluster. The cluster will automatically discover - the other members of the replica set and note their names - accordingly. + For MongoDB versions prior to 2.0.3, you must specify all members of the replica set. For + example: - Before version 2.0.3, you must specify the shard in the - following form: the replica set name, followed by a forward - slash, followed by a comma-separated list of seeds for the - replica set. For example, if the name of the replica set is - ``sh0``, and the replica set were to have three members, then your :method:`sh.addShard` command might resemble: + .. code-block:: javascript - .. 
code-block:: javascript + sh.addShard( "rs1/mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net:27017" ) - sh.addShard( "sh0/sfo30.example.net,sfo31.example.net,sfo32.example.net" ) + - To add a shard for a standalone :program:`mongod` on port ``27017`` + of ``mongodb0.example.net``, issue the following command: -The :method:`sh.addShard()` helper in the :program:`mongo` shell is a wrapper for -the :dbcommand:`addShard` :term:`database command`. + .. code-block:: javascript -Enable Sharding for Databases -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + sh.addShard( "mongodb0.example.net:27017" ) -While sharding operates on a per-collection basis, you must enable -sharding for each database that holds collections you want to shard. A -single cluster may have many databases, with each database housing -collections. + .. note:: It might take some time for :term:`chunks ` to + migrate to the new shard. -Use the following operation in a :program:`mongo` shell session -connected to a :program:`mongos` instance in your cluster: +.. _sharding-setup-enable-sharding: -.. code-block:: javascript +Enable Sharding for a Database +------------------------------ - sh.enableSharding("records") +Before you can shard a collection, you must enable sharding for the +collection's database. Enabling sharding for a database does not +redistribute data but make it possible to shard the collections in that +database. -Where ``records`` is the name of the database that holds the collection -you want to shard. :method:`sh.enableSharding()` is a wrapper -around the :dbcommand:`enableSharding` :term:`database command`. You -can enable sharding for multiple databases in the cluster. +Once you enable sharding for a database, MongoDB assigns a +:term:`primary shard` for that database where MongoDB stores all data +before sharding begins. -Enable Sharding for Collections -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +1. From a :program:`mongo` shell, connect to the :program:`mongos` + instance. Issue a command using the following syntax: -You can enable sharding on a per-collection basis. Because -MongoDB uses "range based sharding," you must specify the :term:`shard -key` MongoDB uses to distribute your documents among the -shards. For more information, see the -:ref:`overview of shard keys `. + .. code-block:: sh -To enable sharding for a collection, use the -:method:`sh.shardCollection()` helper in the :program:`mongo` shell. -The helper provides a wrapper around the :dbcommand:`shardCollection` -:term:`database command` and has the following prototype form: + mongo --host --port -.. code-block:: javascript +#. Issue the :method:`sh.enableSharding()` method, specifying the name + of the database for which to enable sharding. Use the following syntax: - sh.shardCollection(".", shard-key-pattern) + .. code-block:: javascript -Replace the ``.`` string with the full namespace -of your database, which consists of the name of your database, a dot -(e.g. ``.``), and the full name of the collection. The ``shard-key-pattern`` -represents your shard key, which you specify in the same form as you -would an :method:`index ` key pattern. + sh.enableSharding("") -Consider the following example invocations of -:method:`sh.shardCollection()`: +Optionally, you can enable sharding for a database using the +:dbcommand:`enableSharding` command, which uses the following syntax: .. 
code-block:: javascript - sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } ) - sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } ) - sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } ) - sh.shardCollection("events.alerts", { "hashed_id": 1 } ) + db.runCommand( { enableSharding: } ) + +.. _sharding-setup-shard-collection: + +Enable Sharding for a Collection +-------------------------------- + +You enable sharding on a per-collection basis. + +1. Determine what you will use for the :term:`shard key`. Your selection + of the shard key affects the efficiency of sharding. See the + selection considerations listed in the :ref:`sharding-shard-key-selection`. + +#. Enable sharding for a collection by issuing the + :method:`sh.shardCollection()` method in the :program:`mongo` shell. + The method uses the following syntax: + + .. code-block:: javascript + + sh.shardCollection(".", shard-key-pattern) + + Replace the ``.`` string with the full + namespace of your database, which consists of the name of your + database, a dot (e.g. ``.``), and the full name of the collection. + The ``shard-key-pattern`` represents your shard key, which you + specify in the same form as you would an :method:`index + ` key pattern. + +.. example:: The following sequence of commands shards four collections: + + .. code-block:: javascript + + sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } ) + sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } ) + sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } ) + sh.shardCollection("events.alerts", { "hashed_id": 1 } ) -In order, these operations shard: + In order, these operations shard: -#. The ``people`` collection in the ``records`` database using the shard key - ``{ "zipcode": 1, "name": 1 }``. + #. The ``people`` collection in the ``records`` database using the + shard key ``{ "zipcode": 1, "name": 1 }``. - This shard key distributes documents by the value of the - ``zipcode`` field. If a number of documents have the same value for - this field, then that :term:`chunk` will be :ref:`splitable - ` by the values of the ``name`` - field. + This shard key distributes documents by the value of the + ``zipcode`` field. If a number of documents have the same value + for this field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``name`` + field. -#. The ``addresses`` collection in the ``people`` database using the shard key - ``{ "state": 1, "_id": 1 }``. + #. The ``addresses`` collection in the ``people`` database using the + shard key ``{ "state": 1, "_id": 1 }``. - This shard key distributes documents by the value of the ``state`` - field. If a number of documents have the same value for this field, - then that :term:`chunk` will be :ref:`splitable - ` by the values of the ``_id`` - field. + This shard key distributes documents by the value of the ``state`` + field. If a number of documents have the same value for this + field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``_id`` + field. -#. The ``chairs`` collection in the ``assets`` database using the shard key - ``{ "type": 1, "_id": 1 }``. + #. The ``chairs`` collection in the ``assets`` database using the shard key + ``{ "type": 1, "_id": 1 }``. - This shard key distributes documents by the value of the ``type`` - field. If a number of documents have the same value for this field, - then that :term:`chunk` will be :ref:`splitable - ` by the values of the ``_id`` - field. 
+ This shard key distributes documents by the value of the ``type`` + field. If a number of documents have the same value for this + field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``_id`` + field. -#. The ``alerts`` collection in the ``events`` database using the shard key - ``{ "hashed_id": 1 }``. + #. The ``alerts`` collection in the ``events`` database using the shard key + ``{ "hashed_id": 1 }``. - This shard key distributes documents by the value of the - ``hashed_id`` field. Presumably this is a computed value that - holds the hash of some value in your documents and is able to - evenly distribute documents throughout your cluster. + This shard key distributes documents by the value of the + ``hashed_id`` field. Presumably this is a computed value that + holds the hash of some value in your documents and is able to + evenly distribute documents throughout your cluster. diff --git a/source/tutorial/manage-balancer.txt b/source/tutorial/manage-balancer.txt new file mode 100644 index 00000000000..99e0f52bc5e --- /dev/null +++ b/source/tutorial/manage-balancer.txt @@ -0,0 +1,204 @@ +.. index:: balancing; operations +.. _sharding-balancing-operations: + +=================== +Manage the Balancer +=================== + +.. default-domain:: mongodb + +This page describes provides common administrative procedures related +to balancing. For an introduction to balancing, see +:ref:`sharding-balancing`. For lower level information on balancing, see +:ref:`sharding-balancing-internals`. + +This page includes the following: + +- :ref:`sharding-balancing-check-lock` + +- :ref:`sharding-schedule-balancing-window` + +- :ref:`sharding-balancing-remove-window` + +- :ref:`sharding-balancing-disable-temporally` + +.. _sharding-balancing-check-lock: + +Check the Balancer Lock +----------------------- + +To see if the balancer process is active in your :term:`cluster +`, do the following: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Use the following query to return the balancer lock: + + .. code-block:: javascript + + db.locks.find( { _id : "balancer" } ).pretty() + +When this command returns, you will see output like the following: + +.. code-block:: javascript + + { "_id" : "balancer", + "process" : "mongos0.example.net:1292810611:1804289383", + "state" : 2, + "ts" : ObjectId("4d0f872630c42d1978be8a2e"), + "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)", + "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886", + "why" : "doing balance round" } + +This output confirms that: + +- The balancer originates from the :program:`mongos` running on the + system with the hostname ``mongos0.example.net``. + +- The value in the ``state`` field indicates that a :program:`mongos` + has the lock. For version 2.0 and later, the value of an active lock + is ``2``; for earlier versions the value is ``1``. + +.. optional:: + + You can also use the following shell helper, which returns a + boolean to report if the balancer is active: + + .. code-block:: javascript + + sh.getBalancerState() + +.. _sharding-schedule-balancing-window: + +Schedule the Balancing Window +----------------------------- + +In some situations, particularly when your data set grows slowly and a +migration can impact performance, it's useful to be able to ensure +that the balancer is active only at certain times. 
Use the following +procedure to specify a window during which the :term:`balancer` will +be able to migrate chunks: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Use an operation modeled on the following example :method:`update() + ` operation to modify the balancer's + window: + + .. code-block:: javascript + + db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "", stop : "" } } }, true ) + + Replace ```` and ```` with time values using + two digit hour and minute values (e.g ``HH:MM``) that describe the + beginning and end boundaries of the balancing window. + These times will be evaluated relative to the time zone of each individual + :program:`mongos` instance in the sharded cluster. + For instance, running the following + will force the balancer to run between 11PM and 6AM local time only: + + .. code-block:: javascript + + db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "6:00" } } }, true ) + +.. note:: + + The balancer window must be sufficient to *complete* the migration + of all data inserted during the day. + + As data insert rates can change based on activity and usage + patterns, it is important to ensure that the balancing window you + select will be sufficient to support the needs of your deployment. + +.. _sharding-balancing-remove-window: + +Remove a Balancing Window Schedule +---------------------------------- + +If you have :ref:`set the balancing window +` and wish to remove the schedule +so that the balancer is always running, issue the following sequence +of operations: + +.. code-block:: javascript + + use config + db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true }) + +.. _sharding-balancing-disable-temporally: + +Disable the Balancer +-------------------- + +By default the balancer may run at any time and only moves chunks as +needed. To disable the balancer for a short period of time and prevent +all migration, use the following procedure: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue *one* of the following operations to disable the balancer: + + .. code-block:: javascript + + sh.stopBalancer() + +#. Later, issue *one* the following operations to enable the balancer: + + .. code-block:: javascript + + sh.startBalancer() + +.. note:: + + If a migration is in progress, the system will complete + the in-progress migration. After disabling, you can use the + following operation in the :program:`mongo` shell to determine if + there are no migrations in progress: + + .. code-block:: javascript + + use config + while( db.locks.findOne({_id: "balancer"}).state ) { + print("waiting..."); sleep(1000); + } + +The above process and the :method:`sh.setBalancerState()`, +:method:`sh.startBalancer()`, and :method:`sh.stopBalancer()` helpers provide +wrappers on the following process, which may be useful if you need to +run this operation from a driver that does not have helper functions: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Issue the following update to disable the balancer: + + .. 
code-block:: javascript + + db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true ); + +#. To enable the balancer again, alter the value of "stopped" as follows: + + .. code-block:: javascript + + db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true ); diff --git a/source/tutorial/manage-chunks.txt b/source/tutorial/manage-chunks.txt new file mode 100644 index 00000000000..ad4d775384c --- /dev/null +++ b/source/tutorial/manage-chunks.txt @@ -0,0 +1,383 @@ +.. _sharding-manage-chunks: + +================================== +Manage Chunks in a Sharded Cluster +================================== + +.. default-domain:: mongodb + +This page describes various operations on :term:`chunks ` in +:term:`sharded clusters `. MongoDB automates most +chunk management operations. However, these chunk management +operations are accessible to administrators for use in some +situations, typically surrounding initial setup, deployment, and data +ingestion. + +This page includes the following: + +- :ref:`sharding-administration-create-chunks` + +- :ref:`sharding-balancing-modify-chunk-size` + +- :ref:`sharding-balancing-manual-migration` + +- :ref:`sharding-bulk-inserts` + +- :ref:`sharding-procedure-create-split` + +.. _sharding-procedure-create-split: + +Split Chunks +------------ + +Normally, MongoDB splits a :term:`chunk` following inserts when a +chunk exceeds the :ref:`chunk size `. The +:term:`balancer` may migrate recently split chunks to a new shard +immediately if :program:`mongos` predicts future insertions will +benefit from the move. + +MongoDB treats all chunks the same, whether split manually or +automatically by the system. + +.. warning:: + + You cannot merge or combine chunks once you have split them. + +You may want to split chunks manually if: + +- you have a large amount of data in your cluster and very few + :term:`chunks `, + as is the case after deploying a cluster using existing data. + +- you expect to add a large amount of data that would + initially reside in a single chunk or shard. + +.. example:: + + You plan to insert a large amount of data with :term:`shard key` + values between ``300`` and ``400``, *but* all values of your shard + keys are between ``250`` and ``500`` are in a single chunk. + +Use :method:`sh.status()` to determine the current chunks ranges across +the cluster. + +To split chunks manually, use the :dbcommand:`split` command with +operators: ``middle`` and ``find``. The equivalent shell helpers are +:method:`sh.splitAt()` or :method:`sh.splitFind()`. + +.. example:: + + The following command will split the chunk that contains + the value of ``63109`` for the ``zipcode`` field in the ``people`` + collection of the ``records`` database: + + .. code-block:: javascript + + sh.splitFind( "records.people", { "zipcode": 63109 } ) + +:method:`sh.splitFind()` will split the chunk that contains the +*first* document returned that matches this query into two equally +sized chunks. You must specify the full namespace +(i.e. "``.``") of the sharded collection to +:method:`sh.splitFind()`. The query in :method:`sh.splitFind()` need +not contain the shard key, though it almost always makes sense to +query for the shard key in this case, and including the shard key will +expedite the operation. + +Use :method:`sh.splitAt()` to split a chunk in two using the queried +document as the partition point: + +.. 
code-block:: javascript + + sh.splitAt( "records.people", { "zipcode": 63109 } ) + +However, the location of the document that this query finds with +respect to the other documents in the chunk does not affect how the +chunk splits. + +.. _sharding-administration-pre-splitting: +.. _sharding-administration-create-chunks: + +Create Chunks (Pre-Splitting) +----------------------------- + +In most situations a :term:`sharded cluster` will create and distribute +chunks automatically without user intervention. However, in a limited +number of use profiles, MongoDB cannot create enough chunks or +distribute data fast enough to support required throughput. Consider +the following scenarios: + +- you must partition an existing data collection that resides on a + single shard. + +- you must ingest a large volume of data into a cluster that + isn't balanced, or where the ingestion of data will lead to an + imbalance of data. + + This can arise in an initial data loading, or in a case where you + must insert a large volume of data into a single chunk, as is the + case when you must insert at the beginning or end of the chunk + range, as is the case for monotonically increasing or decreasing + shard keys. + +Preemptively splitting chunks increases cluster throughput for these +operations, by reducing the overhead of migrating chunks that hold +data during the write operation. MongoDB only creates splits after an +insert operation, and can only migrate a single chunk at a time. Chunk +migrations are resource intensive and further complicated by large +write volume to the migrating chunk. + +.. warning:: + + You can only pre-split an empty collection. When you enable + sharding for a collection that contains data MongoDB automatically + creates splits. Subsequent attempts to create splits manually, can + lead to unpredictable chunk ranges and sizes as well as inefficient + or ineffective balancing behavior. + +To create and migrate chunks manually, use the following procedure: + +#. Split empty chunks in your collection by manually performing + :dbcommand:`split` command on chunks. + + .. example:: + + To create chunks for documents in the ``myapp.users`` + collection, using the ``email`` field as the :term:`shard key`, + use the following operation in the :program:`mongo` shell: + + .. code-block:: javascript + + for ( var x=97; x<97+26; x++ ){ + for( var y=97; y<97+26; y+=6 ) { + var prefix = String.fromCharCode(x) + String.fromCharCode(y); + db.runCommand( { split : "myapp.users" , middle : { email : prefix } } ); + } + } + + This assumes a collection size of 100 million documents. + +#. Migrate chunks manually using the :dbcommand:`moveChunk` command: + + .. example:: + + To migrate all of the manually created user profiles evenly, + putting each prefix chunk on the next shard from the other, run + the following commands in the mongo shell: + + .. code-block:: javascript + + var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ]; + for ( var x=97; x<97+26; x++ ){ + for( var y=97; y<97+26; y+=6 ) { + var prefix = String.fromCharCode(x) + String.fromCharCode(y); + db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97)/6]}) + } + } + + You can also let the balancer automatically distribute the new + chunks. For an introduction to balancing, see + :ref:`sharding-balancing`. For lower level information on balancing, + see :ref:`sharding-balancing-internals`. + +.. 
_sharding-balancing-modify-chunk-size: + +Modify Chunk Size +----------------- + +When you initialize a sharded cluster, the default chunk size is 64 +megabytes. This default chunk size works well for most deployments. However, if you +notice that automatic migrations are incurring a level of I/O that +your hardware cannot handle, you may want to reduce the chunk +size. For the automatic splits and migrations, a small chunk size +leads to more rapid and frequent migrations. + +To modify the chunk size, use the following procedure: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Issue the following :method:`save() ` + operation: + + .. code-block:: javascript + + db.settings.save( { _id:"chunksize", value: } ) + + Where the value of ```` reflects the new chunk size in + megabytes. Here, you're essentially writing a document whose values + store the global chunk size configuration value. + +.. note:: + + The :setting:`chunkSize` and :option:`--chunkSize ` + options, passed at runtime to the :program:`mongos` **do not** + affect the chunk size after you have initialized the cluster. + + To eliminate confusion you should *always* set chunk size using the + above procedure and never use the runtime options. + +Modifying the chunk size has several limitations: + +- Automatic splitting only occurs when inserting :term:`documents + ` or updating existing documents. + +- If you lower the chunk size it may take time for all chunks to split to + the new size. + +- Splits cannot be "undone." + +If you increase the chunk size, existing chunks must grow through +insertion or updates until they reach the new size. + +.. _sharding-balancing-manual-migration: + +Migrate Chunks +-------------- + +In most circumstances, you should let the automatic balancer +migrate :term:`chunks ` between :term:`shards `. +However, you may want to migrate chunks manually in a few cases: + +- If you create chunks by :term:`pre-splitting` the data in your + collection, you will have to migrate chunks manually to distribute + chunks evenly across the shards. Use pre-splitting in limited + situations, to support bulk data ingestion. + +- If the balancer in an active cluster cannot distribute chunks within + the balancing window, then you will have to migrate chunks manually. + +For more information on how chunks move between shards, see +:ref:`sharding-balancing-internals`, in particular the section +:ref:`sharding-chunk-migration`. + +To migrate chunks, use the :dbcommand:`moveChunk` command. + +.. note:: + + To return a list of shards, use the :dbcommand:`listShards` + command. + + Specify shard names using the :dbcommand:`addShard` command + using the ``name`` argument. If you do not specify a name in the + :dbcommand:`addShard` command, MongoDB will assign a name + automatically. + +The following example assumes that the field ``username`` is the +:term:`shard key` for a collection named ``users`` in the ``myapp`` +database, and that the value ``smith`` exists within the :term:`chunk` +you want to migrate. + +To move this chunk, you would issue the following command from a :program:`mongo` +shell connected to any :program:`mongos` instance. + +.. 
code-block:: javascript + + db.adminCommand({moveChunk : "myapp.users", find : {username : "smith"}, to : "mongodb-shard3.example.net"}) + +This command moves the chunk that includes the shard key value "smith" to the +:term:`shard` named ``mongodb-shard3.example.net``. The command will +block until the migration is complete. + +See :ref:`sharding-administration-create-chunks` for an introduction +to pre-splitting. + +.. versionadded:: 2.2 + :dbcommand:`moveChunk` command has the: ``_secondaryThrottle`` + parameter. When set to ``true``, MongoDB ensures that + :term:`secondary` members have replicated operations before allowing + new chunk migrations. + +.. warning:: + + The :dbcommand:`moveChunk` command may produce the following error + message: + + .. code-block:: none + + The collection's metadata lock is already taken. + + These errors occur when clients have too many open :term:`cursors + ` that access the chunk you are migrating. You can either + wait until the cursors complete their operation or close the + cursors manually. + + .. todo:: insert link to killing a cursor. + +.. index:: bulk insert +.. _sharding-bulk-inserts: + +Strategies for Bulk Inserts in Sharded Clusters +----------------------------------------------- + +.. todo:: Consider moving to the administrative guide as it's of an + applied nature, or create an applications document for sharding + +.. todo:: link the words "bulk insert" to the bulk insert topic when + it's published + + +Large bulk insert operations including initial data ingestion or +routine data import, can have a significant impact on a :term:`sharded +cluster`. Consider the following strategies and possibilities for +bulk insert operations: + +- If the collection does not have data, then there is only one + :term:`chunk`, which must reside on a single shard. MongoDB must + receive data, create splits, and distribute chunks to the available + shards. To avoid this performance cost, you can pre-split the + collection, as described in :ref:`sharding-administration-pre-splitting`. + +- You can parallelize import processes by sending insert operations to + more than one :program:`mongos` instance. If the collection is + empty, pre-split first, as described in + :ref:`sharding-administration-pre-splitting`. + +- If your shard key increases monotonically during an insert then all + the inserts will go to the last chunk in the collection, which will + always end up on a single shard. Therefore, the insert capacity of the + cluster will never exceed the insert capacity of a single shard. + + If your insert volume is never larger than what a single shard can + process, then there is no problem; however, if the insert volume + exceeds that range, and you cannot avoid a monotonically + increasing shard key, then consider the following modifications to + your application: + + - Reverse all the bits of the shard key to preserve the information + while avoiding the correlation of insertion order and increasing + sequence of values. + + - Swap the first and last 16-bit words to "shuffle" the inserts. + + .. example:: The following example, in C++, swaps the leading and + trailing 16-bit word of :term:`BSON` :term:`ObjectIds ` + generated so that they are no longer monotonically increasing. + + .. 
code-block:: cpp + + using namespace mongo; + OID make_an_id() { + OID x = OID::gen(); + const unsigned char *p = x.getData(); + swap( (unsigned short&) p[0], (unsigned short&) p[10] ); + return x; + } + + void foo() { + // create an object + BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); + // now we might insert o into a sharded collection... + } + + For information on choosing a shard key, see :ref:`sharding-shard-key` + and see :ref:`Shard Key Internals ` (in + particular, :ref:`sharding-internals-operations-and-reliability` and + :ref:`sharding-internals-choose-shard-key`). + diff --git a/source/tutorial/remove-shards-from-cluster.txt b/source/tutorial/remove-shards-from-cluster.txt index 72098d758d5..05a15614096 100644 --- a/source/tutorial/remove-shards-from-cluster.txt +++ b/source/tutorial/remove-shards-from-cluster.txt @@ -4,56 +4,67 @@ Remove Shards from an Existing Sharded Cluster .. default-domain:: mongodb -Synopsis --------- +To remove a :term:`shard` you must ensure the shard's data is migrated +to the remaining shards in the cluster. This procedure describes how to +safely migrate data and how to remove a shard. -This procedure describes the procedure for migrating data from a -shard safely, when you need to decommission a shard. You may also need -to remove shards as part of hardware reorganization and data -migration. - -*Do not* use this procedure to migrate an entire cluster to new -hardware. To migrate an entire shard to new hardware, migrate -individual shards as if they were independent replica sets. +This procedure describes how to safely remove a *single* shard. *Do not* +use this procedure to migrate an entire cluster to new hardware. To +migrate an entire shard to new hardware, migrate individual shards as if +they were independent replica sets. .. DOCS-94 will lead to a tutorial about cluster migrations. In the mean time the above section will necessarily lack links. -To remove a shard, you will: +To remove a shard, first connect to one of the cluster's +:program:`mongos` instances using :program:`mongo` shell. Then follow +the ordered sequence of tasks on this page: + +1. :ref:`remove-shard-ensure-balancer-is-active` + +#. :ref:`remove-shard-determine-name-shard` + +#. :ref:`remove-shard-remove-chunks` + +#. :ref:`remove-shard-check-migration-status` + +#. :ref:`remove-shard-move-unsharded-databases` -- Move :term:`chunks ` off of the shard. +#. :ref:`remove-shard-finalize-migration` -- Ensure that this shard is not the :term:`primary shard` for any databases - in the cluster. If it is, move the "primary" status for these - databases to other shards. +.. _remove-shard-ensure-balancer-is-active: -- Remove the shard from the cluster's configuration. +Ensure the Balancer Process is Active +------------------------------------- -Procedure ---------- +To successfully migrate data from a shard, the :term:`balancer` process +**must** be active. Check the balancer state using the +:method:`sh.getBalancerState()` helper in the :program:`mongo` shell. +For more information, see the section on :ref:`balancer operations +`. -Complete this procedure by connecting to any :program:`mongos` in the -cluster using the :program:`mongo` shell. +.. _remove-shard-determine-name-shard: -You can only remove a shard by its shard name. To discover or -confirm the name of a shard, use the :dbcommand:`listShards` command, -:dbcommand:`printShardingStatus` command, or :method:`sh.status()` shell helper. 
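+
+For example, the following is a minimal check you can run from a
+:program:`mongo` shell connected to any :program:`mongos` in the cluster;
+a return value of ``true`` indicates that the balancer is enabled:
+
+.. code-block:: javascript
+
+   // confirm the balancer is enabled before draining the shard
+   sh.getBalancerState()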
+Determine the Name of the Shard to Remove +----------------------------------------- -The example commands in this document remove a shard named ``mongodb0``. +To determine the name of the shard, do one of the following: -.. note:: +- From the ``admin`` database, run the :dbcommand:`listShards` command. - To successfully migrate data from a shard, the :term:`balancer` - process **must** be active. Check the balancer state using the - :method:`sh.getBalancerState()` helper in the :program:`mongo` - shell. For more information, see the section on :ref:`balancer operations - `. +- Run either the :method:`sh.status()` method or the + :method:`sh.printShardingStatus()` method. + +The ``shards._id`` field lists the name of each shard. + +.. _remove-shard-remove-chunks: Remove Chunks from the Shard -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +---------------------------- -Start by running the :dbcommand:`removeShard` command. This -begins "draining" chunks from the shard you are removing. +Run the :dbcommand:`removeShard` command. This begins "draining" chunks +from the shard you are removing to other shards in the cluster. For +example, for a shard named ``mongodb0``, run: .. code-block:: javascript @@ -65,69 +76,94 @@ This operation returns immediately, with the following response: { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } -Depending on your network capacity and the amount of data in your -cluster, this operation can take from a few minutes to -several days to complete. +Depending on your network capacity and the amount of data, this +operation can take from a few minutes to several days to complete. + +.. _remove-shard-check-migration-status: Check the Status of the Migration -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------------------------- -To check the progress of the migration, -run :dbcommand:`removeShard` again at any stage of the -process, as follows: +To check the progress of the migration at any stage in the process, run +:dbcommand:`removeShard`. For example, for a shard named ``mongodb0``, run: .. code-block:: javascript db.runCommand( { removeShard: "mongodb0" } ) -The output resembles the following document: +The command returns output similar to the following: .. code-block:: javascript { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } -In the ``remaining`` sub document, a counter displays the remaining number +In the output, the ``remaining`` document displays the remaining number of chunks that MongoDB must migrate to other shards and the number of MongoDB databases that have "primary" status on this shard. Continue checking the status of the `removeShard` command until the -number of chunks remaining is 0. Then proceed to the next step. +number of chunks remaining is ``0``. Then proceed to the next step. -Move Unsharded Databases -~~~~~~~~~~~~~~~~~~~~~~~~ +.. _remove-shard-move-unsharded-databases: -Databases with non-sharded collections store those collections on a -single shard known as the :term:`primary shard` for that database. The -following step is necessary only when the shard to remove is -also the primary shard for one or more databases. +Move Unsharded Data +------------------- -Issue the following command at the :program:`mongo` shell: +If the shard is the :term:`primary shard` for one or more databases in +the cluster, then the shard will have unsharded data. If the shard is +not the primary shard for any databases, skip to the next task, +:ref:`remove-shard-finalize-migration`. -.. 
code-block:: javascript +In a cluster, a database with unsharded collections stores those +collections only on a single shard. That shard becomes the primary shard +for that database. (Different databases in a cluster can have different +primary shards.) - db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) +.. warning:: -This command migrates all remaining non-sharded data in the -database named ``myapp`` to the shard named ``mongodb1``. + Do not perform this procedure until you have finished draining the + shard. -.. warning:: +1. To determine if the shard you are removing is the primary shard for + any of the cluster's databases, issue one of the following methods: - Do not run the :dbcommand:`movePrimary` until you have *finished* - draining the shard. + - :method:`sh.status()` -This command will not return until MongoDB completes moving all data, -which may take a long time. The response from this command will -resemble the following: + - :method:`sh.printShardingStatus()` -.. code-block:: javascript + In the resulting document, the ``databases`` field lists each + database and its primary shard. For example, the following + ``database`` field shows that the ``products`` database uses + ``mongodb0`` as the primary shard: + + .. code-block:: javascript + + { "_id" : "products", "partitioned" : true, "primary" : "mongodb0" } + +#. To move a database to another shard, use the :dbcommand:`movePrimary` + command. For example, to migrate all remaining unsharded data from + ``mongodb0`` to ``mongodb1``, issue the following command: + + .. code-block:: javascript + + db.runCommand( { movePrimary: "products", to: "mongodb1" }) + + This command does not return until MongoDB completes moving all data, + which may take a long time. The response from this command will + resemble the following: + + .. code-block:: javascript + + { "primary" : "mongodb1", "ok" : 1 } - { "primary" : "mongodb1", "ok" : 1 } +.. _remove-shard-finalize-migration: Finalize the Migration -~~~~~~~~~~~~~~~~~~~~~~ +---------------------- -Run :dbcommand:`removeShard` again to clean up all metadata -information and finalize the removal, as follows: +To clean up all metadata information and finalize the removal, run +:dbcommand:`removeShard` again. For example, for a shard named +``mongodb0``, run: .. code-block:: javascript @@ -139,5 +175,5 @@ A success message appears at completion: { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } -When the value of "state" is "completed", you may safely stop the -``mongodb0`` shard. +Once the value of the ``stage`` field is "completed", you may safely +stop the processes comprising the ``mongodb0`` shard. diff --git a/source/tutorial/view-cluster-configuration.txt b/source/tutorial/view-cluster-configuration.txt new file mode 100644 index 00000000000..81540462a43 --- /dev/null +++ b/source/tutorial/view-cluster-configuration.txt @@ -0,0 +1,108 @@ +.. _sharding-manage-shards: + +========================== +View Cluster Configuration +========================== + +.. default-domain:: mongodb + +.. _sharding-procedure-list-databases: + +List Databases with Sharding Enabled +------------------------------------ + +To list the databases that have sharding enabled, query the +``databases`` collection in the :ref:`config-database`. +A database has sharding enabled if the value of the ``partitioned`` +field is ``true``. 
Connect to a :program:`mongos` instance with a +:program:`mongo` shell, and run the following operation to get a full +list of databases with sharding enabled: + +.. code-block:: javascript + + use config + db.databases.find( { "partitioned": true } ) + +.. example:: You can use the following sequence of commands when to + return a list of all databases in the cluster: + + .. code-block:: javascript + + use config + db.databases.find() + + If this returns the following result set: + + .. code-block:: javascript + + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" } + { "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" } + + Then sharding is only enabled for the ``animals`` database. + +.. _sharding-procedure-list-shards: + +List Shards +----------- + +To list the current set of configured shards, use the :dbcommand:`listShards` +command, as follows: + +.. code-block:: javascript + + use admin + db.runCommand( { listShards : 1 } ) + +.. _sharding-procedure-view-clusters: + +View Cluster Details +-------------------- + +To view cluster details, issue :method:`db.printShardingStatus()` or +:method:`sh.status()`. Both methods return the same output. + +.. example:: In the following example output from :method:`sh.status()` + + - ``sharding version`` displays the version number of the shard + metadata. + + - ``shards`` displays a list of the :program:`mongod` instances + used as shards in the cluster. + + - ``databases`` displays all databases in the cluster, + including database that do not have sharding enabled. + + - The ``chunks`` information for the ``foo`` database displays how + many chunks are on each shard and displays the range of each chunk. + + .. code-block:: javascript + + --- Sharding Status --- + sharding version: { "_id" : 1, "version" : 3 } + shards: + { "_id" : "shard0000", "host" : "m0.example.net:30001" } + { "_id" : "shard0001", "host" : "m3.example2.net:50000" } + databases: + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "animals", "partitioned" : true, "primary" : "shard0000" } + foo.big chunks: + shard0001 1 + shard0000 6 + { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo + { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo + { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo + { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) + { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + foo.large chunks: + shard0001 1 + shard0000 5 + { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) + { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo + { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) + { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + { "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
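+
+As an additional sketch, assuming the ``foo.big`` collection and the
+``shard0000`` shard shown in the example output above, you can also
+query the ``chunks`` collection in the :ref:`config-database` directly
+to count the chunks of a collection that reside on a particular shard:
+
+.. code-block:: javascript
+
+   use config
+   // count the chunks of foo.big currently assigned to shard0000
+   db.chunks.find( { "ns" : "foo.big", "shard" : "shard0000" } ).count()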