Skip to content

Commit 6e380bf

Browse files
committed
edited tutorial on inserting docs into sharded collection
1 parent d1cc2bf commit 6e380bf

File tree

1 file changed

+49
-26
lines changed

1 file changed

+49
-26
lines changed

source/tutorial/inserting-documents-into-a-sharded-collection.txt

Lines changed: 49 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -9,38 +9,19 @@ Shard Keys
99

1010
outline the ways that insert operations work given shard keys of the following types
1111

12-
Monotonically Increasing Values
13-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
14-
15-
.. what will happen if you try to do inserts.
16-
17-
Documents with monotonically increasing shard keys, such as the BSON ObjectID, will always
18-
be inserted into the last chunk in a collection. To illustrate why, consider a sharded
19-
collection with two chunks, the second of which has an unbounded upper limit.
2012

21-
[-∞, 100)
22-
[100,+∞)
23-
24-
If the data being inserted has an increasing key, at any given time writes will always hit
25-
the shard containing the chunk with the unbounded upper limit, a problem that is not
26-
alleviated by splitting the "hot" chunk. High volume inserts, therefore, could hinder the
27-
cluster's performance by placing a significant load on a single shard.
28-
29-
If, however, a single shard can handle the write volume, an increasing shard key may have
30-
some advantages. For example, if you need to do queries based on document insertion time,
31-
sharding on the ObjectID ensures that documents created around the same time exist on the
32-
same shard. Data locality helps to improve query performance.
33-
34-
If you decide to use an monotonically increasing shard key and anticipate large inserts,
35-
one solution may be to store the hash of the shard key as a separate field. Hashing may
36-
prevent the need to balance chunks by distributing data equally around the cluster. You can
37-
create a hash client-side. In the future, MongoDB may support automatic hashing:
38-
https://jira.mongodb.org/browse/SERVER-2001
3913

4014

4115
Even Distribution
4216
~~~~~~~~~~~~~~~~~
4317

18+
If the data's distribution of keys is evenly, MongoDB should be able to distribute writes
19+
evenly around a the cluster once the chunk key ranges are established. MongoDB will
20+
automatically split chunks when they grow to a certain size (~64 MB by default) and will
21+
balance the number of chunks across shards.
22+
23+
When inserting data into a new collection, it may be important to pre-split the key ranges.
24+
See the section below on pre-splitting and manually moving chunks.
4425

4526
Uneven Distribution
4627
~~~~~~~~~~~~~~~~~~~
@@ -64,6 +45,34 @@ more granular in this portion of the alphabet may improve write performance.
6445
["Smith", "Tyler"]
6546
["Tyler",+∞)
6647

48+
Monotonically Increasing Values
49+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
50+
51+
.. what will happen if you try to do inserts.
52+
53+
Documents with monotonically increasing shard keys, such as the BSON ObjectID, will always
54+
be inserted into the last chunk in a collection. To illustrate why, consider a sharded
55+
collection with two chunks, the second of which has an unbounded upper limit.
56+
57+
[-∞, 100)
58+
[100,+∞)
59+
60+
If the data being inserted has an increasing key, at any given time writes will always hit
61+
the shard containing the chunk with the unbounded upper limit, a problem that is not
62+
alleviated by splitting the "hot" chunk. High volume inserts, therefore, could hinder the
63+
cluster's performance by placing a significant load on a single shard.
64+
65+
If, however, a single shard can handle the write volume, an increasing shard key may have
66+
some advantages. For example, if you need to do queries based on document insertion time,
67+
sharding on the ObjectID ensures that documents created around the same time exist on the
68+
same shard. Data locality helps to improve query performance.
69+
70+
If you decide to use an monotonically increasing shard key and anticipate large inserts,
71+
one solution may be to store the hash of the shard key as a separate field. Hashing may
72+
prevent the need to balance chunks by distributing data equally around the cluster. You can
73+
create a hash client-side. In the future, MongoDB may support automatic hashing:
74+
https://jira.mongodb.org/browse/SERVER-2001
75+
6776
Operations
6877
----------
6978

@@ -88,6 +97,20 @@ In the example below the pre-split command splits the chunk where the _id 99 wou
8897
using that key as the split point. Note that a key need not exist for a chunk to use it in
8998
its range. The chunk may even be empty.
9099

100+
The first step is to create a sharded collection to contain the data, which can be done in
101+
three steps:
102+
103+
> use admin
104+
> db.runCommand({ enableSharding : "foo" })
105+
106+
Next, we add a unique index to the collection "foo.bar" which is required for the shard
107+
key.
108+
109+
> use foo
110+
> db.bar.ensureIndex({ _id : 1 }, { unique : true })
111+
112+
Finally we shard the collection (which contains no data) using the _id value.
113+
91114
> use admin
92115
switched to db admin
93116
> db.runCommand( { split : "test.foo" , middle : { _id : 99 } } )

0 commit comments

Comments
 (0)