@@ -9,38 +9,19 @@ Shard Keys
outline the ways that insert operations work given shard keys of the following types

- Monotonically Increasing Values
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- .. what will happen if you try to do inserts.
-
- Documents with monotonically increasing shard keys, such as the BSON ObjectID, will always
- be inserted into the last chunk in a collection. To illustrate why, consider a sharded
- collection with two chunks, the second of which has an unbounded upper limit.
-
- [-∞, 100)
- [100,+∞)
-
- If the data being inserted has an increasing key, at any given time writes will always hit
- the shard containing the chunk with the unbounded upper limit, a problem that is not
- alleviated by splitting the "hot" chunk. High volume inserts, therefore, could hinder the
- cluster's performance by placing a significant load on a single shard.
-
- If, however, a single shard can handle the write volume, an increasing shard key may have
- some advantages. For example, if you need to do queries based on document insertion time,
- sharding on the ObjectID ensures that documents created around the same time exist on the
- same shard. Data locality helps to improve query performance.
-
- If you decide to use an monotonically increasing shard key and anticipate large inserts,
- one solution may be to store the hash of the shard key as a separate field. Hashing may
- prevent the need to balance chunks by distributing data equally around the cluster. You can
- create a hash client-side. In the future, MongoDB may support automatic hashing:
- https://jira.mongodb.org/browse/SERVER-2001

Even Distribution
~~~~~~~~~~~~~~~~~

+ If the distribution of keys in the data is even, MongoDB should be able to distribute writes
+ evenly around the cluster once the chunk key ranges are established. MongoDB will
+ automatically split chunks when they grow to a certain size (~64 MB by default; see the
+ sketch below for adjusting this setting) and will balance the number of chunks across
+ shards.
+
+ When inserting data into a new collection, it may be important to pre-split the key ranges.
+ See the section below on pre-splitting and manually moving chunks.
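+
+ For illustration only: the default split threshold is configurable. A minimal sketch,
+ assuming the standard "config" database layout, of lowering it from the mongo shell:
+
+ > use config
+ > db.settings.save({ _id : "chunksize", value : 32 })   // chunk size in MB
+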
Uneven Distribution
~~~~~~~~~~~~~~~~~~~
@@ -64,6 +45,34 @@ more granular in this portion of the alphabet may improve write performance.
["Smith", "Tyler"]
["Tyler",+∞)

+ Monotonically Increasing Values
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ .. what will happen if you try to do inserts.
+
+ Documents with monotonically increasing shard keys, such as the BSON ObjectID, will always
+ be inserted into the last chunk in a collection. To illustrate why, consider a sharded
+ collection with two chunks, the second of which has an unbounded upper limit.
+
+ [-∞, 100)
+ [100,+∞)
+
+ If the data being inserted has an increasing key, at any given time writes will always hit
+ the shard containing the chunk with the unbounded upper limit, a problem that is not
+ alleviated by splitting the "hot" chunk. High volume inserts, therefore, could hinder the
+ cluster's performance by placing a significant load on a single shard.
+
+ If, however, a single shard can handle the write volume, an increasing shard key may have
+ some advantages. For example, if you need to do queries based on document insertion time,
+ sharding on the ObjectID ensures that documents created around the same time exist on the
+ same shard. Data locality helps to improve query performance.
+
+ If you decide to use a monotonically increasing shard key and anticipate large inserts,
+ one solution may be to store the hash of the shard key as a separate field. Hashing may
+ prevent the need to balance chunks by distributing data equally around the cluster. You can
+ create a hash client-side, as sketched below. In the future, MongoDB may support automatic
+ hashing: https://jira.mongodb.org/browse/SERVER-2001
+
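+ For illustration only, a minimal sketch of creating such a hash client-side in the mongo
+ shell. The collection name "docs" and the field name "hash" are assumptions made for this
+ example; hex_md5() is a helper built into the shell.
+
+ > var id = new ObjectId()
+ > db.docs.insert({ _id : id, hash : hex_md5(id.str) })
+
+ The collection would then be sharded on the "hash" field rather than on _id.
+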
Operations
----------
@@ -88,6 +97,20 @@ In the example below the pre-split command splits the chunk where the _id 99 wou
using that key as the split point. Note that a key need not exist for a chunk to use it in
its range. The chunk may even be empty.

+ The first task is to create a sharded collection to contain the data, which can be done in
+ three steps:
+
+ > use admin
+ > db.runCommand({ enableSharding : "foo" })
+
+ Next, we add a unique index to the collection "foo.bar", which is required for the shard
+ key.
+
+ > use foo
+ > db.bar.ensureIndex({ _id : 1 }, { unique : true })
+
+ Finally, we shard the collection (which contains no data) using the _id value, as sketched
+ below.
+
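+ A minimal sketch of that final step; the shardCollection invocation shown here is an
+ assumption based on the collection and key named above, not part of the original text:
+
+ > use admin
+ > db.runCommand({ shardCollection : "foo.bar", key : { _id : 1 } })
+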
> use admin
switched to db admin
> db.runCommand( { split : "test.foo" , middle : { _id : 99 } } )