@@ -9,38 +9,19 @@ Shard Keys
outline the ways that insert operations work given shard keys of the following types

- Monotonically Increasing Values
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- .. what will happen if you try to do inserts.
-
- Documents with monotonically increasing shard keys, such as the BSON ObjectID, will always
- be inserted into the last chunk in a collection. To illustrate why, consider a sharded
- collection with two chunks, the second of which has an unbounded upper limit.
-
- [-∞, 100)
- [100,+∞)
-
- If the data being inserted has an increasing key, at any given time writes will always hit
- the shard containing the chunk with the unbounded upper limit, a problem that is not
- alleviated by splitting the "hot" chunk. High volume inserts, therefore, could hinder the
- cluster's performance by placing a significant load on a single shard.
-
- If, however, a single shard can handle the write volume, an increasing shard key may have
- some advantages. For example, if you need to do queries based on document insertion time,
- sharding on the ObjectID ensures that documents created around the same time exist on the
- same shard. Data locality helps to improve query performance.
-
- If you decide to use an monotonically increasing shard key and anticipate large inserts,
- one solution may be to store the hash of the shard key as a separate field. Hashing may
- prevent the need to balance chunks by distributing data equally around the cluster. You can
- create a hash client-side. In the future, MongoDB may support automatic hashing:
- https://jira.mongodb.org/browse/SERVER-2001

Even Distribution
~~~~~~~~~~~~~~~~~

+ If the distribution of keys in the data is even, MongoDB should be able to distribute writes
+ evenly around the cluster once the chunk key ranges are established. MongoDB will
+ automatically split chunks when they grow to a certain size (~64 MB by default; see the
+ sketch below for adjusting this setting) and will balance the number of chunks across
+ shards.
+
+ When inserting data into a new collection, it may be important to pre-split the key ranges.
+ See the section below on pre-splitting and manually moving chunks.
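+
+ For illustration only: the default split threshold is configurable. A minimal sketch,
+ assuming the standard "config" database layout, of lowering it from the mongo shell:
+
+ > use config
+ > db.settings.save({ _id : "chunksize", value : 32 })   // chunk size in MB
+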
Uneven Distribution
~~~~~~~~~~~~~~~~~~~
@@ -64,6 +45,34 @@ more granular in this portion of the alphabet may improve write performance.
["Smith", "Tyler"]
["Tyler",+∞)

+ Monotonically Increasing Values
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ .. what will happen if you try to do inserts.
+
+ Documents with monotonically increasing shard keys, such as the BSON ObjectID, will always
+ be inserted into the last chunk in a collection. To illustrate why, consider a sharded
+ collection with two chunks, the second of which has an unbounded upper limit.
+
+ [-∞, 100)
+ [100,+∞)
+
+ If the data being inserted has an increasing key, at any given time writes will always hit
+ the shard containing the chunk with the unbounded upper limit, a problem that is not
+ alleviated by splitting the "hot" chunk. High volume inserts, therefore, could hinder the
+ cluster's performance by placing a significant load on a single shard.
+
+ If, however, a single shard can handle the write volume, an increasing shard key may have
+ some advantages. For example, if you need to do queries based on document insertion time,
+ sharding on the ObjectID ensures that documents created around the same time exist on the
+ same shard. Data locality helps to improve query performance.
+
+ If you decide to use a monotonically increasing shard key and anticipate large inserts,
+ one solution may be to store the hash of the shard key as a separate field. Hashing may
+ prevent the need to balance chunks by distributing data equally around the cluster. You can
+ create a hash client-side, as sketched below. In the future, MongoDB may support automatic
+ hashing: https://jira.mongodb.org/browse/SERVER-2001
+
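+ For illustration only, a minimal sketch of creating such a hash client-side in the mongo
+ shell. The collection name "docs" and the field name "hash" are assumptions made for this
+ example; hex_md5() is a helper built into the shell.
+
+ > var id = new ObjectId()
+ > db.docs.insert({ _id : id, hash : hex_md5(id.str) })
+
+ The collection would then be sharded on the "hash" field rather than on _id.
+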
Operations
----------
@@ -88,6 +97,20 @@ In the example below the pre-split command splits the chunk where the _id 99 wou
using that key as the split point. Note that a key need not exist for a chunk to use it in
its range. The chunk may even be empty.

+ The first task is to create a sharded collection to contain the data, which can be done in
+ three steps:
+
+ > use admin
+ > db.runCommand({ enableSharding : "foo" })
+
+ Next, we add a unique index to the collection "foo.bar", which is required for the shard
+ key.
+
+ > use foo
+ > db.bar.ensureIndex({ _id : 1 }, { unique : true })
+
+ Finally, we shard the collection (which contains no data) using the _id value, as sketched
+ below.
+
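+ A minimal sketch of that final step; the shardCollection invocation shown here is an
+ assumption based on the collection and key named above, not part of the original text:
+
+ > use admin
+ > db.runCommand({ shardCollection : "foo.bar", key : { _id : 1 } })
+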
> use admin
switched to db admin
> db.runCommand( { split : "test.foo" , middle : { _id : 99 } } )