@@ -10,23 +10,26 @@ Overview
  Collections in MongoDB have a flexible schema; they neither define nor
  enforce the fields of their documents. Each document can have only the
  fields that are relevant to that entity, although in practice, you
- would generally choose to store similar documents in each collection.
- With this flexible schema, you can model your data to reflect more
- closely the actual entity rather than enforce a rigid data structure.
-
- In MongoDB, data modeling takes into consideration not only how data
- relates to each other, but also how the data is used, how the data will
- grow and be maintained. These considerations involve decisions about
- whether to embed data within a single document or reference data among
- different documents, which fields to index, and whether to use special
- features.
-
- Choosing the correct data model can provide both performance and
- maintenance gains for your applications.
-
- This document provide some general guidelines for data modeling and
- possible options. These guidelines and options may not be appropriate
- for your situation.
+ would generally choose to maintain a consistent structure across
+ documents in each collection. With this flexible schema, you can model
+ your data to reflect more closely the actual application-level entity
+ rather than enforce a rigid data structure.
+
+ In MongoDB, data modeling takes into consideration not only the
+ inherent properties of the data entities themselves and how they relate
+ to each other, but also how the data is used, how the data will grow
+ and possibly change over time, and how the data will be maintained.
+ These considerations involve decisions about whether to embed data
+ within a single document or to reference data in different documents,
+ which fields to index, and whether to take advantage of rich document
+ features, such as arrays.
+
+ Choosing the best data model for your application can bring significant
+ performance and maintenance advantages.
+
+ This document provides some general guidelines and principles for
+ schema design and highlights possible data modeling options. Not all
+ guidelines and options may be appropriate for your specific situation.

  .. _data-modeling-decisions:

@@ -46,28 +49,29 @@ Embedding
  De-normalization of data involves embedding documents within other
  documents.

- Operations within a document are easy for the server to handle.
+ Operations within a document are less expensive for the server than
+ operations that involve multiple documents.

  In general, choose the embedded data model when:

  - you have "contains" relationships between entities. See
  :ref:`data-modeling-example-one-to-one`.

  - you have one-to-many relationships where the "many" objects always
- appear with or are viewed in the context of their parents. See
- :ref:`data-modeling-example-one-to-many`.
+ appear with or are viewed in the context of their parent documents.
+ See :ref:`data-modeling-example-one-to-many`.

  Embedding provides the following benefits:

  - Great for read performance

  - Single roundtrip to database to retrieve the complete object

- However, with embedding, write operations can be slow if you are adding
- objects frequently. Additionally, you cannot embed documents that will
- cause the containing document to exceed the :limit:`maximum BSON
- document size <BSON Document Size>`. For documents that exceed the
- maximum BSON document size, see :doc:`/applications/gridfs`.
+ Keep in mind that embedding documents that grow without bound over
+ time may slow write operations. Additionally, such documents may cause
+ their containing documents to exceed the :limit:`maximum BSON document
+ size <BSON Document Size>`. For documents that exceed the maximum BSON
+ document size, see :doc:`/applications/gridfs`.

  For examples of accessing embedded documents, see
  :ref:`read-operations-subdocuments`.
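
The embedding guidance in this hunk can be sketched with plain JavaScript objects. This is only an illustration of the document shape, not MongoDB driver code: the ``patrons`` array stands in for a collection, and the ``name`` value, the ``addresses`` field name, and the ``findPatron`` helper are assumptions made for the example.

```javascript
// Hypothetical in-memory stand-in for a "patrons" collection. No MongoDB
// driver is involved; this only illustrates the embedded document shape.
const patrons = [
  {
    _id: "joe",
    name: "Joe Example", // assumed value, for illustration only
    // Embedded "contains" relationship: the address lives inside its parent.
    addresses: [
      { street: "123 Fake Street", city: "Faketon", state: "MA" }
    ]
  }
];

// A single lookup returns the patron together with its embedded addresses.
function findPatron(id) {
  return patrons.find((doc) => doc._id === id);
}

const joe = findPatron("joe");
console.log(joe.addresses[0].city);
```

Because the address travels with its parent, one lookup is enough to read both. The cost is that every address added to ``addresses`` makes the parent document larger.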
@@ -92,17 +96,19 @@ Normalization of data requires storing :doc:`references

  In general, choose the referenced data model when:

- - embedding would result in duplication of data.
-
+ - embedding would result in duplication of data but would not
+ provide sufficient read performance advantages to outweigh the
+ implications of the duplication.
+
  - you have many-to-many relationships.

  - you are modeling large hierarchical data. See
  :ref:`data-modeling-trees`.

  Referencing provides more flexibility than embedding; however, to
  resolve the references, client-side applications must issue follow-up
- queries. Additionally, the referencing data model involves performing
- many seeks and random reads.
+ queries. In other words, using references requires more roundtrips to
+ the server.

  See :ref:`data-modeling-publisher-and-books` for an example of
  referencing.
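
The extra roundtrips that referencing requires can be sketched in the same in-memory style. The arrays below stand in for two collections, and every name in the block (``publisher_id``, ``findOne``, ``bookWithPublisher``, the sample titles and publisher) is hypothetical. The ``lookups`` counter only simulates roundtrips to a server.

```javascript
// Hypothetical in-memory stand-ins for two collections; all field names
// and values here are assumptions made for this sketch.
const publishers = [{ _id: "example-press", name: "Example Press" }];
const books = [
  { _id: 1, title: "Book One", publisher_id: "example-press" },
  { _id: 2, title: "Book Two", publisher_id: "example-press" }
];

let lookups = 0; // counts simulated roundtrips

// Each call models one query against a collection.
function findOne(collection, predicate) {
  lookups += 1;
  return collection.find(predicate);
}

// Resolving the reference takes a follow-up lookup after fetching the book.
function bookWithPublisher(bookId) {
  const book = findOne(books, (b) => b._id === bookId);
  const publisher = findOne(publishers, (p) => p._id === book.publisher_id);
  return { ...book, publisher };
}

const resolved = bookWithPublisher(2);
console.log(resolved.publisher.name, lookups);
```

The publisher data is stored once and shared by both books, which avoids duplication. The cost is that reading a book together with its publisher takes two lookups instead of one.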
@@ -131,8 +137,8 @@ maintenance efforts.
  Data Lifecycle Management
  ~~~~~~~~~~~~~~~~~~~~~~~~~

- Data lifecycle management concerns contribute to the decision making
- process around data modeling.
+ Data modeling decisions should also take data lifecycle management into
+ consideration.

  The :doc:`Time to Live or TTL feature </tutorial/expire-data>` of
  collections expires documents after a period of time. Consider using
@@ -148,7 +154,7 @@ documents based on insertion order.
  Large Number of Collections
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~

- In certain situation, you might choose to store information in several
+ In certain situations, you might choose to store information in several
  collections instead of a single collection.

  Consider a sample collection ``logs`` that stores log documents for
@@ -208,7 +214,7 @@ you want an index in MongoDB. Indexes in MongoDB are needed for
  efficient query processing, and as such, you may want to think about
  the queries first and then build indexes based upon them. Generally,
  you would index the fields that you query by and the fields that you
- sort by. The ``_id`` field is automatically indexed.
+ sort by. A unique index is automatically created on the ``_id`` field.

  As you create indexes, consider the following behaviors of indexes:

@@ -217,11 +223,11 @@ As you create indexes, consider the following behaviors of indexes:
  - Adding an index has some negative performance impact for write
  operations. For collections with a high write-to-read ratio, indexes
  are expensive as each insert must add keys to each index.
-
- - Read operations supported by the index perform better, and read
- operations not supported by the index have no performance impact from
- the index. This allows for for collections with high read-to-write
- ratio to have many indexes.
+
+ - Collections with a high read-to-write ratio benefit from having many
+ indexes. Read operations supported by an index perform well, and
+ read operations not supported by an index are unaffected by
+ it.

  See :doc:`/applications/indexes` for more information on determining
  indexes. Additionally, MongoDB :wiki:`Database Profiler` provides
@@ -337,7 +343,7 @@ the ``parent``.
  }

  {
- patron_id = "joe",
+ patron_id: "joe",
  street: "123 Fake Street",
  city: "Faketon",
  state: "MA",
@@ -354,7 +360,7 @@ the ``parent``.

  If your application frequently retrieves the ``address`` data with the
  ``name`` information, then your application needs to issue multiple
- queries to resolve the references. The better data model would be to
+ queries to resolve the references. A better schema would be to
  embed the ``address`` data entities in the ``patron`` data, as in the
  following document:

@@ -389,7 +395,7 @@ One-to-Many: Referencing

  Consider the following example that maps publisher and book
  relationships. The example illustrates the advantage of referencing
- over embedding to prevent the repetition of the publisher information.
+ over embedding to avoid repetition of the publisher information.

  Embedding the publisher document inside the book document would lead to
  **repetition** of the publisher data, as the following documents show: