Commit b618f0d

Author: Sam Kleinman
merge: DOCS-458
2 parents f27b8f2 + 183d102, commit b618f0d

2 files changed, +236 -142 lines changed

draft/faq-sharding-addition.txt

Lines changed: 35 additions & 41 deletions
@@ -1,54 +1,48 @@
-What are the best ways to successfully insert larger volumes of data into as sharded collection?
-------------------------------------------------------------------------------------------------
+For high-volume inserts, when is it necessary to first pre-split data?
+----------------------------------------------------------------------

-- what is pre-splitting
+Whether to pre-split before a high-volume insert depends on the
+:term:`shard key`, the existing distribution of :term:`chunks <chunk>`,
+and how evenly distributed the insert operation is.

-In sharded environments, MongoDB distributes data into :term:`chunks
-<chunk>`, each defined by a range of shard key values. Pre-splitting is a command run
-prior to data insertion that specifies the shard key values on which to split up chunks.
+In the following cases, we recommend pre-splitting before a large insert:

-- Pre-splitting is useful before large inserts into a sharded collection when:
+- Inserting data into an empty collection

-1. inserting data into an empty collection
+If a collection is empty, the database takes time to determine the
+optimal key distribution. If you insert many documents in rapid
+succession, MongoDB initially directs writes to a single chunk, which
+can affect performance. Predefining splits improves write performance
+in the early stages of a bulk import by eliminating the database's
+"learning" period.

-If a collection is empty, the database takes time to determine the optimal key
-distribution. If you insert many documents in rapid succession, MongoDB will initially
-direct writes to a single chunk, potentially having significant impacts on performance.
-Predefining splits may improve write performance in the early stages of a bulk import by
-eliminating the database's "learning" period.
+- Data is not evenly distributed

-2. data is not evenly distributed
+Even if the sharded collection contains existing documents balanced
+over multiple chunks, :term:`pre-splitting` is beneficial if the write
+operation itself isn't evenly distributed, i.e., if the inserts
+include shard-key values that are contained on only a small number of
+chunks. By pre-splitting and using an increasing shard key, you can
+prevent writes from monopolizing a single :term:`shard`.

-Even if the sharded collection was previously populated with documents and contains multiple
-chunks, pre-splitting may be beneficial if the write operation isn't evenly distributed, in
-other words, if the inserts have shard keys values contained on a small number of chunks.
+- Monotomically increasing shard key.

-3. monotomically increasing shard key
+If you attempt to insert data with monotonically increasing shard
+keys, the writes will always occur on the last chunk in the
+collection. Predefining splits helps to cycle a large write operation
+around the cluster; however, pre-splitting in this instance will not
+prevent consecutive inserts from hitting a single shard.

-If you attempt to insert data with monotonically increasing shard keys, the writes will
-always hit the last chunk in the collection. Predefining splits may help to cycle a large
-write operation around the cluster; however, pre-splitting in this instance will not
-prevent consecutive inserts from hitting a single shard.
+Pre-splitting might *not* be necessary in the following cases:

-- when does it not matter
+- If data insertion is not rapid, MongoDB may have enough time to split
+and migrate chunks without affecting performance.

-If data insertion is not rapid, MongoDB may have enough time to split and migrate chunks without
-impacts on performance. In addition, if the collection already has chunks with an even key
-distribution, pre-splitting may not be necessary.
+- If the collection already has chunks with an even key distribution,
+pre-splitting may not be necessary.

-See the ":doc:`/tutorial/inserting-documents-into-a-sharded-collection`" tutorial for more
-information.
+For more information, see :doc:`/tutorial/inserting-documents-into-a-sharded-collection`.

-
-Is it necessary to pre-split data before high volume inserts into a sharded collection?
----------------------------------------------------------------------------------------
-
-The answer depends on the shard key, the existing distribution of chunks, and how
-evenly distributed the insert operation is. If a collection is empty prior to a
-bulk insert, the database will take time to determine the optimal key
-distribution. Predefining splits improves write performance in the early stages
-of a bulk import.
-
-Pre-splitting is also important if the write operation isn't evenly distributed.
-When using an increasing shard key, for example, pre-splitting data can prevent
-writes from hammering a single shard.
+.. SK, I flipped the above sentence, which could instead read:
+.. See :doc:`/tutorial/inserting-documents-into-a-sharded-collection` for more information.
+.. I prefer the former, but I think you prefer the latter. Let me know. -BG
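
Note on the empty-collection case in the added text: a minimal sketch of how split points might be chosen before a bulk import, assuming an integer shard key with a known value range. This is an illustration, not part of the commit. The namespace `records.users`, the key name `user_id`, and both helper functions are hypothetical. The only MongoDB-specific piece is the shape of the `split` admin command with a `middle` document; the sketch builds those command documents but does not contact a cluster.

```python
from typing import Any, Dict, List


def even_split_points(low: int, high: int, chunks: int) -> List[int]:
    """Return chunks - 1 boundaries dividing [low, high) into near-equal ranges.

    Hypothetical helper: assumes an integer shard key whose values are
    expected to be spread roughly evenly over [low, high).
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    width = high - low
    if width < chunks:
        raise ValueError("range is too narrow for the requested chunk count")
    # Integer arithmetic keeps every boundary an exact, strictly increasing key value.
    return [low + (width * i) // chunks for i in range(1, chunks)]


def split_commands(namespace: str, key: str, points: List[int]) -> List[Dict[str, Any]]:
    """Build one `split` admin command document per boundary (hypothetical helper)."""
    return [{"split": namespace, "middle": {key: point}} for point in points]


if __name__ == "__main__":
    # Assumed example values: four chunks over user_id in [0, 1000).
    for command in split_commands("records.users", "user_id",
                                  even_split_points(0, 1000, 4)):
        print(command)
```

Each generated document would be sent to the `admin` database through a `mongos` before the import starts. As the added text notes, this does not help with a monotonically increasing key, where consecutive inserts still land on a single shard.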
