|
1 | | -What are the best ways to successfully insert larger volumes of data into as sharded collection? |
2 | | ------------------------------------------------------------------------------------------------- |
| 1 | +For high-volume inserts, when is it necessary to first pre-split data? |
| 2 | +---------------------------------------------------------------------- |
3 | 3 |
|
4 | | -- what is pre-splitting |
| 4 | +Whether to pre-split before a high-volume insert depends on the |
| 5 | +:term:`shard key`, the existing distribution of :term:`chunks <chunk>`, |
| 6 | +and how evenly distributed the insert operation is. |
5 | 7 |
|
6 | | - In sharded environments, MongoDB distributes data into :term:`chunks |
7 | | - <chunk>`, each defined by a range of shard key values. Pre-splitting is a command run |
8 | | - prior to data insertion that specifies the shard key values on which to split up chunks. |
| 8 | +In the following cases, we recommend pre-splitting before a large insert: |
9 | 9 |
|
10 | | -- Pre-splitting is useful before large inserts into a sharded collection when: |
| 10 | +- Inserting data into an empty collection |
11 | 11 |
|
12 | | -1. inserting data into an empty collection |
| 12 | + If a collection is empty, the database takes time to determine the |
| 13 | + optimal key distribution. If you insert many documents in rapid |
| 14 | + succession, MongoDB initially directs writes to a single chunk, which |
| 15 | + can affect performance. Predefining splits improves write performance |
| 16 | + in the early stages of a bulk import by eliminating the database's |
| 17 | + "learning" period. |
13 | 18 |
|
14 | | -If a collection is empty, the database takes time to determine the optimal key |
15 | | -distribution. If you insert many documents in rapid succession, MongoDB will initially |
16 | | -direct writes to a single chunk, potentially having significant impacts on performance. |
17 | | -Predefining splits may improve write performance in the early stages of a bulk import by |
18 | | -eliminating the database's "learning" period. |
| 19 | +- Data is not evenly distributed |
19 | 20 |
|
20 | | -2. data is not evenly distributed |
| 21 | + Even if the sharded collection contains existing documents balanced |
| 22 | + over multiple chunks, :term:`pre-splitting` is beneficial if the write |
| 23 | + operation itself isn't evenly distributed, that is, if the inserts |
| 24 | + include shard key values that fall within only a small number of |
| 25 | + chunks. With an increasing shard key, for example, pre-splitting can |
| 26 | + keep writes from monopolizing a single :term:`shard`. |
21 | 27 |
|
22 | | -Even if the sharded collection was previously populated with documents and contains multiple |
23 | | -chunks, pre-splitting may be beneficial if the write operation isn't evenly distributed, in |
24 | | -other words, if the inserts have shard keys values contained on a small number of chunks. |
| 28 | +- Monotonically increasing shard key |
25 | 29 |
|
26 | | -3. monotomically increasing shard key |
| 30 | + If you attempt to insert data with monotonically increasing shard |
| 31 | + keys, the writes will always occur on the last chunk in the |
| 32 | + collection. Predefining splits helps to cycle a large write operation |
| 33 | + around the cluster; however, pre-splitting in this instance will not |
| 34 | + prevent consecutive inserts from hitting a single shard. |
27 | 35 |
|
28 | | -If you attempt to insert data with monotonically increasing shard keys, the writes will |
29 | | -always hit the last chunk in the collection. Predefining splits may help to cycle a large |
30 | | -write operation around the cluster; however, pre-splitting in this instance will not |
31 | | -prevent consecutive inserts from hitting a single shard. |
| 36 | +Pre-splitting might *not* be necessary in the following cases: |
32 | 37 |
|
33 | | -- when does it not matter |
| 38 | +- If data insertion is not rapid, MongoDB may have enough time to split |
| 39 | + and migrate chunks without affecting performance. |
34 | 40 |
|
35 | | -If data insertion is not rapid, MongoDB may have enough time to split and migrate chunks without |
36 | | -impacts on performance. In addition, if the collection already has chunks with an even key |
37 | | -distribution, pre-splitting may not be necessary. |
| 41 | +- If the collection already has chunks with an even key distribution, |
| 42 | + pre-splitting may not be necessary. |
38 | 43 |
|
39 | | -See the ":doc:`/tutorial/inserting-documents-into-a-sharded-collection`" tutorial for more |
40 | | -information. |
| 44 | +For more information, see :doc:`/tutorial/inserting-documents-into-a-sharded-collection`. |
41 | 45 |
|
42 | | - |
43 | | -Is it necessary to pre-split data before high volume inserts into a sharded collection? |
44 | | ---------------------------------------------------------------------------------------- |
45 | | - |
46 | | -The answer depends on the shard key, the existing distribution of chunks, and how |
47 | | -evenly distributed the insert operation is. If a collection is empty prior to a |
48 | | -bulk insert, the database will take time to determine the optimal key |
49 | | -distribution. Predefining splits improves write performance in the early stages |
50 | | -of a bulk import. |
51 | | - |
52 | | -Pre-splitting is also important if the write operation isn't evenly distributed. |
53 | | -When using an increasing shard key, for example, pre-splitting data can prevent |
54 | | -writes from hammering a single shard. |
| 46 | +.. SK, I flipped the above sentence, which could instead read: |
| 47 | +.. See :doc:`/tutorial/inserting-documents-into-a-sharded-collection` for more information. |
| 48 | +.. I prefer the former, but I think you prefer the latter. Let me know. -BG |
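
The empty-collection case in the new text could be illustrated with a short example. The following is a minimal sketch, not the tutorial's method: it assumes an integer shard key with a known value range, and the helper names, the ``records.users`` namespace, and the ``user_id`` field are all hypothetical. It only computes evenly spaced split points and formats them as ``sh.splitAt()`` shell commands; it does not connect to a cluster.

```python
def even_split_points(low, high, num_chunks):
    """Return the boundaries that divide [low, high) into num_chunks ranges.

    Hypothetical helper: assumes an integer shard key whose values are
    spread roughly uniformly between low and high.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low
    # Integer arithmetic keeps every boundary exact.
    points = [low + (span * i) // num_chunks for i in range(1, num_chunks)]
    # Drop duplicates and boundaries equal to low, which appear when the
    # range is narrower than the requested number of chunks.
    return sorted({p for p in points if low < p < high})


def split_commands(namespace, field, points):
    """Format each split point as a mongo shell sh.splitAt() call."""
    return [
        'sh.splitAt("%s", { "%s": %d })' % (namespace, field, point)
        for point in points
    ]


if __name__ == "__main__":
    # Hypothetical namespace and shard key field, for illustration only.
    for command in split_commands(
        "records.users", "user_id", even_split_points(0, 100, 4)
    ):
        print(command)
```

Running the generated commands against an empty sharded collection before the import would create the chunk boundaries up front. Even spacing is only appropriate when the inserted keys are themselves spread evenly; for skewed data, the split points would need to follow the actual key distribution.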