diff --git a/source/applications.txt b/source/applications.txt
index 3dfd2f93165..3ad56429d05 100644
--- a/source/applications.txt
+++ b/source/applications.txt
@@ -31,6 +31,7 @@ The following documents outline basic application development topics:
    :maxdepth: 2
 
    applications/drivers
+   core/data-modeling
    applications/database-references
    applications/gridfs
 
diff --git a/source/core/data-modeling.txt b/source/core/data-modeling.txt
new file mode 100644
index 00000000000..59635d476de
--- /dev/null
+++ b/source/core/data-modeling.txt
@@ -0,0 +1,797 @@
+=============
+Data Modeling
+=============
+
+.. default-domain:: mongodb
+
+Overview
+--------
+
+Collections in MongoDB have a flexible schema; a collection neither
+defines nor enforces the fields of its documents. Each document needs
+only the fields that are relevant to that entity, although in practice
+you would generally choose to maintain a consistent structure across
+documents in each collection. With this flexible schema, you can model
+your data to reflect the actual application-level entities more closely
+rather than conform to a rigid data structure.
+
+In MongoDB, data modeling takes into consideration not only the
+inherent properties of the data entities themselves and how they relate
+to each other, but also how the data is used, how the data will grow
+and possibly change over time, and how the data will be maintained.
+These considerations involve decisions about whether to embed data
+within a single document or to reference data in different documents,
+which fields to index, and whether to take advantage of rich document
+features, such as arrays.
+
+Choosing the best data model for your application can provide
+significant performance and maintenance advantages.
+
+This document provides some general guidelines and principles for schema
+design and highlights possible data modeling options. Not all guidelines
+and options may be appropriate for your specific situation.
+
+.. _data-modeling-decisions:
+
+Data Modeling Decisions
+-----------------------
+
+Data modeling decisions involve determining how to structure the
+documents to model the data effectively. The primary decision is
+whether to :ref:`embed <data-modeling-embedding>` or to :ref:`use
+references <data-modeling-referencing>`.
+
+.. _data-modeling-embedding:
+
+Embedding
+~~~~~~~~~
+
+De-normalization of data involves embedding documents within other
+documents.
+
+Operations within a document are less expensive for the server than
+operations that involve multiple documents.
+
+In general, choose the embedded data model when:
+
+- you have "contains" relationships between entities. See
+  :ref:`data-modeling-example-one-to-one`.
+
+- you have one-to-many relationships where the "many" objects always
+  appear with or are viewed in the context of their parent documents.
+  See :ref:`data-modeling-example-one-to-many`.
+
+Embedding provides the following benefits:
+
+- better performance for read operations
+
+- the ability to retrieve the complete object in a single round trip
+  to the database
+
+Keep in mind that embedding documents that have unbounded growth over
+time may slow write operations. Additionally, such documents may cause
+their containing documents to exceed the :limit:`maximum BSON document
+size `. For documents that exceed the maximum BSON
+document size, see :doc:`/applications/gridfs`.
+
+For examples of accessing embedded documents, see
+:ref:`read-operations-subdocuments`.
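+
+For instance, the following minimal sketch shows a query that "reaches
+into" an embedded sub-document with dot notation; the ``patrons``
+collection and its fields are illustrative only:
+
+.. code-block:: javascript
+
+   // An illustrative patron document with an embedded address sub-document.
+   db.patrons.insert( {
+      _id: "joe",
+      name: "Joe Bookreader",
+      address: { street: "123 Fake Street", city: "Faketon", state: "MA", zip: 12345 }
+   } )
+
+   // Dot notation "reaches into" the embedded sub-document.
+   db.patrons.find( { "address.city": "Faketon" } )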
+
+.. seealso::
+
+   - :term:`dot notation` for information on "reaching into" embedded
+     sub-documents.
+
+   - :ref:`read-operations-arrays` for more examples on accessing arrays
+
+   - :ref:`read-operations-subdocuments` for more examples on accessing
+     subdocuments
+
+.. _data-modeling-referencing:
+
+Referencing
+~~~~~~~~~~~
+
+Normalization of data requires storing :doc:`references ` from one
+document to another.
+
+In general, choose the referenced data model when:
+
+- embedding would result in duplication of data but would not
+  provide sufficient read performance advantages to outweigh the
+  implications of the duplication.
+
+- you have many-to-many relationships.
+
+- you are modeling large hierarchical data sets. See
+  :ref:`data-modeling-trees`.
+
+Referencing provides more flexibility than embedding; however, to
+resolve the references, client-side applications must issue follow-up
+queries. In other words, using references requires more round trips to
+the server.
+
+See :ref:`data-modeling-publisher-and-books` for an example of
+referencing.
+
+.. _data-modeling-atomicity:
+
+Atomicity
+~~~~~~~~~
+
+Atomicity influences the decision to embed or reference. The modification of
+a single document is atomic, even if the write operation modifies
+multiple sub-documents *within* the single document.
+
+Embed fields that need to be modified together atomically in the same
+document. See :ref:`data-modeling-atomic-operation` for an example of
+atomic updates within a single document.
+
+Operational Considerations
+--------------------------
+
+Operational considerations involve decisions related to data lifecycle
+management, number of collections, indexing, sharding, and managing
+document growth. These decisions can improve performance and facilitate
+maintenance efforts.
+
+Data Lifecycle Management
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Data modeling decisions should also take data lifecycle management into
+consideration.
+
+The :doc:`Time to Live or TTL feature ` of
+collections expires documents after a period of time. Consider using
+the TTL feature if your application requires some data to persist in
+the database for a limited period of time.
+
+Additionally, if your application is concerned only with the most
+recent documents, you might consider :doc:`/core/capped-collections`.
+Capped collections provide *first-in-first-out* (FIFO) management of
+inserted documents and support operations that insert, read, and delete
+documents based on insertion order.
+
+Large Number of Collections
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In certain situations, you might choose to store information in several
+collections instead of a single collection.
+
+Consider a sample collection ``logs`` that stores log documents for
+various environments and applications. The ``logs`` collection contains
+documents of the following form:
+
+.. code-block:: javascript
+
+   { log: "dev", ts: ..., info: ... }
+   { log: "debug", ts: ..., info: ... }
+
+If the number of different logs is not high, you may decide to have
+separate log collections, such as ``logs.dev`` and ``logs.debug``. The
+``logs.dev`` collection would contain only the log documents related to
+the dev environment.
+
+Generally, having a large number of collections carries no significant
+performance penalty and can result in very good performance. Independent
+collections are very important for high-throughput batch processing.
+
+When creating large numbers of collections, consider the following
+behaviors:
+
+- Each collection has a certain minimum overhead of a few kilobytes.
+
+- Each index requires at least 8KB of data space.
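+
+To observe this overhead, you can inspect collection and database
+statistics from the shell, as in the following sketch; the ``logs.dev``
+collection is the example from above:
+
+.. code-block:: javascript
+
+   // Per-collection storage details, including allocated but unused space.
+   db.getCollection("logs.dev").stats()
+
+   // Database-wide totals, including the number of collections and indexes.
+   db.stats()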
+
+Namespaces are stored per database in a ``.ns`` file. All
+indexes and collections have their own entry in the namespace file, and
+each namespace entry is 628 bytes.
+
+Because of :limit:`limits on namespaces `, you
+may wish to know the current number of namespaces in order to determine
+how many additional namespaces the database can support, as in the
+following example:
+
+.. code-block:: javascript
+
+   db.system.namespaces.count()
+
+The ``.ns`` file defaults to 16 MB. To change the size of the ``.ns``
+file, pass a new size to the :option:`--nssize ` option at server
+startup.
+
+The :option:`--nssize ` option sets the size for *new*
+``.ns`` files. For existing databases, after starting up the
+server with :option:`--nssize `, run the
+:method:`db.repairDatabase()` method from the :program:`mongo`
+shell.
+
+Indexes
+~~~~~~~
+
+As a general rule, where you want an index in a relational database,
+you want an index in MongoDB. Indexes in MongoDB are needed for
+efficient query processing, so consider your application's queries
+first and then build indexes to support them. Generally,
+you would index the fields that you query by and the fields that you
+sort by. MongoDB automatically creates a unique index on the ``_id``
+field.
+
+As you create indexes, consider the following behaviors of indexes:
+
+- Each index requires at least 8KB of data space.
+
+- Adding an index has some negative performance impact for write
+  operations. For collections with a high write-to-read ratio, indexes
+  are expensive because each insert must also add keys to each index.
+
+- Collections with a high read-to-write ratio benefit from having many
+  indexes. Read operations supported by an index have high
+  performance, and read operations not supported by an index are
+  unaffected by it.
+
+See :doc:`/applications/indexes` for more information on determining
+which indexes to create. Additionally, the MongoDB :wiki:`Database Profiler`
+provides information that can help you determine whether an index is needed.
+
+.. TODO link to new database profiler manual page once migrated
+
+Sharding
+~~~~~~~~
+
+:term:`Sharding ` allows users to :term:`partition` a
+:term:`collection` within a database to distribute the collection's
+documents across a number of :program:`mongod` instances or
+:term:`shards `.
+
+When a collection is sharded, the shard key determines how the
+collection is partitioned among shards. Selecting the proper
+:ref:`shard key ` can have a significant impact on
+performance.
+
+See :doc:`/core/sharding` for more information on sharding and
+the selection of the :ref:`shard key `.
+
+Document Growth
+~~~~~~~~~~~~~~~
+
+Certain updates to documents can increase the document size, such as
+pushing elements to an array and adding new fields. If the document
+size exceeds the allocated space for that document, MongoDB relocates
+the document on disk. This internal relocation can consume both time
+and resources.
+
+Although MongoDB automatically provides padding to minimize the
+occurrence of relocations, you may still need to handle document
+growth manually. Refer to :doc:`/use-cases/pre-aggregated-reports` for
+an example of the *Pre-allocation* approach to handling document growth.
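+
+As a rough sketch of the pre-allocation idea, you can insert a document
+with a throwaway field sized to the document's expected final size, and
+then remove that field so later growth reuses the already-allocated
+record. The collection and field names here are illustrative only:
+
+.. code-block:: javascript
+
+   // Reserve space up front with a filler field of roughly the final size.
+   db.metrics.insert( {
+      _id: "site-2012-10-15",
+      padding: new Array(1024).join("x")   // ~1 KB placeholder string
+   } )
+
+   // Remove the filler; the document keeps its allocated space on disk.
+   db.metrics.update(
+      { _id: "site-2012-10-15" },
+      { $unset: { padding: "" } }
+   )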
+
+.. TODO add link to padding factor page once migrated
+
+Patterns and Examples
+---------------------
+
+.. _data-modeling-example-one-to-one:
+
+One-to-One: Embedding
+~~~~~~~~~~~~~~~~~~~~~
+
+Consider the following example that maps patron and address
+relationships. The example illustrates the advantage of embedding over
+referencing if you need to view one data entity in the context of the
+other. In this one-to-one relationship between ``patron`` and
+``address`` data, the ``address`` belongs to the ``patron``.
+
+In the normalized data model, the ``address`` document contains a
+reference to the ``patron`` document:
+
+.. code-block:: javascript
+
+   {
+      _id: "joe",
+      name: "Joe Bookreader"
+   }
+
+   {
+      patron_id: "joe",
+      street: "123 Fake Street",
+      city: "Faketon",
+      state: "MA",
+      zip: 12345
+   }
+
+If the ``address`` data is frequently retrieved with the ``name``
+information, then with referencing, your application needs to issue
+multiple queries to resolve the reference. The better data model would
+be to embed the ``address`` data in the ``patron`` data, as in the
+following document:
+
+.. code-block:: javascript
+
+   {
+      _id: "joe",
+      name: "Joe Bookreader",
+      address: {
+         street: "123 Fake Street",
+         city: "Faketon",
+         state: "MA",
+         zip: 12345
+      }
+   }
+
+With the embedded data model, your application can retrieve the
+complete patron information with one query.
+
+.. _data-modeling-example-one-to-many:
+
+One-to-Many: Embedding
+~~~~~~~~~~~~~~~~~~~~~~
+
+Consider the following example that maps patron and multiple address
+relationships. The example illustrates the advantage of embedding over
+referencing if you need to view many data entities in the context of
+another. In this one-to-many relationship between ``patron`` and
+``address`` data, the ``patron`` has multiple ``address`` entities.
+
+In the normalized data model, the ``address`` documents contain a
+reference to the ``patron`` document:
+
+.. code-block:: javascript
+
+   {
+      _id: "joe",
+      name: "Joe Bookreader"
+   }
+
+   {
+      patron_id: "joe",
+      street: "123 Fake Street",
+      city: "Faketon",
+      state: "MA",
+      zip: 12345
+   }
+
+   {
+      patron_id: "joe",
+      street: "1 Some Other Street",
+      city: "Boston",
+      state: "MA",
+      zip: 12345
+   }
+
+If your application frequently retrieves the ``address`` data with the
+``name`` information, then your application needs to issue multiple
+queries to resolve the references. A better schema would be to
+embed the ``address`` data entities in the ``patron`` data, as in the
+following document:
+
+.. code-block:: javascript
+
+   {
+      _id: "joe",
+      name: "Joe Bookreader",
+      addresses: [
+         {
+            street: "123 Fake Street",
+            city: "Faketon",
+            state: "MA",
+            zip: 12345
+         },
+         {
+            street: "1 Some Other Street",
+            city: "Boston",
+            state: "MA",
+            zip: 12345
+         }
+      ]
+   }
+
+With the embedded data model, your application can retrieve the
+complete patron information with one query.
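+
+To illustrate the single-query retrieval, consider the following brief
+sketch; the ``patrons`` collection name is assumed for this example:
+
+.. code-block:: javascript
+
+   // One query returns the patron together with all embedded addresses.
+   db.patrons.findOne( { _id: "joe" } )
+
+   // Or retrieve only the embedded addresses for the same patron.
+   db.patrons.findOne( { _id: "joe" }, { addresses: 1 } )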
+
+.. _data-modeling-publisher-and-books:
+
+One-to-Many: Referencing
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Consider the following example that maps publisher and book
+relationships. The example illustrates the advantage of referencing
+over embedding to avoid repetition of the publisher information.
+
+Embedding the publisher document inside the book document would lead to
+**repetition** of the publisher data, as the following documents show:
+
+.. code-block:: javascript
+   :emphasize-lines: 7-11,20-24
+
+   {
+      title: "MongoDB: The Definitive Guide",
+      author: [ "Kristina Chodorow", "Mike Dirolf" ],
+      published_date: ISODate("2010-09-24"),
+      pages: 216,
+      language: "English",
+      publisher: {
+         name: "O'Reilly Media",
+         founded: 1980,
+         location: "CA"
+      }
+   }
+
+   {
+      title: "50 Tips and Tricks for MongoDB Developer",
+      author: "Kristina Chodorow",
+      published_date: ISODate("2011-05-06"),
+      pages: 68,
+      language: "English",
+      publisher: {
+         name: "O'Reilly Media",
+         founded: 1980,
+         location: "CA"
+      }
+   }
+
+To avoid repetition of the publisher data, use *references* and keep
+the publisher information in a separate collection from the book
+collection.
+
+When using references, the growth of the relationships determines where
+to store the reference. If the number of books per publisher is small
+with limited growth, storing the book references inside the publisher
+document may sometimes be useful. Otherwise, if the number of books per
+publisher is unbounded, this data model would lead to mutable, growing
+arrays, as in the following example:
+
+.. code-block:: javascript
+   :emphasize-lines: 5
+
+   {
+      name: "O'Reilly Media",
+      founded: 1980,
+      location: "CA",
+      books: [123456789, 234567890, ...]
+   }
+
+   {
+      _id: 123456789,
+      title: "MongoDB: The Definitive Guide",
+      author: [ "Kristina Chodorow", "Mike Dirolf" ],
+      published_date: ISODate("2010-09-24"),
+      pages: 216,
+      language: "English"
+   }
+
+   {
+      _id: 234567890,
+      title: "50 Tips and Tricks for MongoDB Developer",
+      author: "Kristina Chodorow",
+      published_date: ISODate("2011-05-06"),
+      pages: 68,
+      language: "English"
+   }
+
+To avoid mutable, growing arrays, store the publisher reference inside
+the book document:
+
+.. code-block:: javascript
+   :emphasize-lines: 15, 25
+
+   {
+      _id: "oreilly",
+      name: "O'Reilly Media",
+      founded: 1980,
+      location: "CA"
+   }
+
+   {
+      _id: 123456789,
+      title: "MongoDB: The Definitive Guide",
+      author: [ "Kristina Chodorow", "Mike Dirolf" ],
+      published_date: ISODate("2010-09-24"),
+      pages: 216,
+      language: "English",
+      publisher_id: "oreilly"
+   }
+
+   {
+      _id: 234567890,
+      title: "50 Tips and Tricks for MongoDB Developer",
+      author: "Kristina Chodorow",
+      published_date: ISODate("2011-05-06"),
+      pages: 68,
+      language: "English",
+      publisher_id: "oreilly"
+   }
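+
+With this model, resolving a reference takes a follow-up query, as the
+following minimal sketch shows; it assumes the documents above are
+stored in ``books`` and ``publishers`` collections:
+
+.. code-block:: javascript
+
+   // First query: fetch the book, which carries only the publisher's _id.
+   var book = db.books.findOne( { _id: 123456789 } )
+
+   // Follow-up query: resolve the reference to the full publisher document.
+   var publisher = db.publishers.findOne( { _id: book.publisher_id } )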
+
+.. Reworked the Queue slide from the presentation to Atomic Operation
+.. TODO later, include a separate queue example for maybe checkout requests,
+   and possibly bucket example that is separate from the pre-allocation
+   example link above in the Document Growth section
+
+.. _data-modeling-atomic-operation:
+
+Atomic Operation
+~~~~~~~~~~~~~~~~
+
+Consider the following example that keeps a library book and its
+checkout information. The example illustrates how embedding fields
+related to an atomic update within the same document ensures that the
+fields are in sync.
+
+Consider the following ``book`` document that stores the number of
+available copies for checkout and the current checkout information:
+
+.. code-block:: javascript
+   :emphasize-lines: 9
+
+   book = {
+      _id: 123456789,
+      title: "MongoDB: The Definitive Guide",
+      author: [ "Kristina Chodorow", "Mike Dirolf" ],
+      published_date: ISODate("2010-09-24"),
+      pages: 216,
+      language: "English",
+      publisher_id: "oreilly",
+      available: 3,
+      checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]
+   }
+
+You can use the :method:`db.collection.findAndModify()` method to
+atomically determine if a book is available for checkout and update it
+with the new checkout information. Embedding the ``available`` field
+and the ``checkout`` field within the same document ensures that the
+updates to these fields are in sync:
+
+.. code-block:: javascript
+
+   db.books.findAndModify( {
+      query: {
+         _id: 123456789,
+         available: { $gt: 0 }
+      },
+      update: {
+         $inc: { available: -1 },
+         $push: { checkout: { by: "abc", date: new Date() } }
+      }
+   } )
+
+.. _data-modeling-trees:
+
+Trees
+~~~~~
+
+To model hierarchical or nested data relationships, you can use
+references to implement tree-like structures. The following *Tree*
+pattern examples model book categories that have hierarchical
+relationships.
+
+Parent References
+`````````````````
+
+The *Parent References* pattern stores each tree node in a document; in
+addition to the tree node, the document stores the id of the node's
+parent.
+
+Consider the following example that models a tree of categories using
+*Parent References*:
+
+.. code-block:: javascript
+
+   db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
+   db.categories.insert( { _id: "Postgres", parent: "Databases" } )
+   db.categories.insert( { _id: "Databases", parent: "Programming" } )
+   db.categories.insert( { _id: "Languages", parent: "Programming" } )
+   db.categories.insert( { _id: "Programming", parent: "Books" } )
+   db.categories.insert( { _id: "Books", parent: null } )
+
+- The query to retrieve the parent of a node is fast and
+  straightforward:
+
+  .. code-block:: javascript
+
+     db.categories.findOne( { _id: "MongoDB" } ).parent
+
+- You can create an index on the field ``parent`` to enable fast search
+  by the parent node:
+
+  .. code-block:: javascript
+
+     db.categories.ensureIndex( { parent: 1 } )
+
+- You can query by the ``parent`` field to find its immediate children
+  nodes:
+
+  .. code-block:: javascript
+
+     db.categories.find( { parent: "Databases" } )
+
+The *Parent References* pattern provides a simple solution to tree
+storage, but requires successive queries to the database to retrieve
+subtrees, as the sketch below shows.
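+
+For instance, the following sketch reconstructs the full path of a node
+by repeatedly following ``parent`` references; each step is a separate
+query, which is the trade-off noted above:
+
+.. code-block:: javascript
+
+   // Walk upward from "MongoDB", issuing one query per level of the tree.
+   var path = [];
+   var current = db.categories.findOne( { _id: "MongoDB" } );
+   while ( current.parent !== null ) {
+      path.unshift( current.parent );
+      current = db.categories.findOne( { _id: current.parent } );
+   }
+   // path is now [ "Books", "Programming", "Databases" ]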
+
+Child References
+````````````````
+
+The *Child References* pattern stores each tree node in a document; in
+addition to the tree node, the document stores in an array the id(s) of
+the node's children.
+
+Consider the following example that models a tree of categories using
+*Child References*:
+
+.. code-block:: javascript
+
+   db.categories.insert( { _id: "MongoDB", children: [] } )
+   db.categories.insert( { _id: "Postgres", children: [] } )
+   db.categories.insert( { _id: "Databases", children: [ "MongoDB", "Postgres" ] } )
+   db.categories.insert( { _id: "Languages", children: [] } )
+   db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } )
+   db.categories.insert( { _id: "Books", children: [ "Programming" ] } )
+
+- The query to retrieve the immediate children of a node is fast and
+  straightforward:
+
+  .. code-block:: javascript
+
+     db.categories.findOne( { _id: "Databases" } ).children
+
+- You can create an index on the field ``children`` to enable fast
+  search by the child nodes:
+
+  .. code-block:: javascript
+
+     db.categories.ensureIndex( { children: 1 } )
+
+- You can query for a node in the ``children`` field to find its parent
+  node as well as its siblings:
+
+  .. code-block:: javascript
+
+     db.categories.find( { children: "MongoDB" } )
+
+The *Child References* pattern provides a suitable solution to tree storage
+as long as no operations on subtrees are necessary. This pattern may
+also provide a suitable solution for storing graphs where a node may
+have multiple parents.
+
+Array of Ancestors
+``````````````````
+
+The *Array of Ancestors* pattern stores each tree node in a document;
+in addition to the tree node, the document stores in an array the id(s)
+of the node's ancestors or path.
+
+Consider the following example that models a tree of categories using
+*Array of Ancestors*:
+
+.. code-block:: javascript
+
+   db.categories.insert( { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
+   db.categories.insert( { _id: "Postgres", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" } )
+   db.categories.insert( { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
+   db.categories.insert( { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" } )
+   db.categories.insert( { _id: "Programming", ancestors: [ "Books" ], parent: "Books" } )
+   db.categories.insert( { _id: "Books", ancestors: [ ], parent: null } )
+
+- The query to retrieve the ancestors or path of a node is fast and
+  straightforward:
+
+  .. code-block:: javascript
+
+     db.categories.findOne( { _id: "MongoDB" } ).ancestors
+
+- You can create an index on the field ``ancestors`` to enable fast
+  search by the ancestor nodes:
+
+  .. code-block:: javascript
+
+     db.categories.ensureIndex( { ancestors: 1 } )
+
+- You can query by the ``ancestors`` field to find all descendants of a
+  node:
+
+  .. code-block:: javascript
+
+     db.categories.find( { ancestors: "Programming" } )
+
+The *Array of Ancestors* pattern provides a fast and efficient solution
+for finding both the descendants and the ancestors of a node by creating
+an index on the elements of the ``ancestors`` field. This makes *Array of
+Ancestors* a good choice for working with subtrees.
+
+The *Array of Ancestors* pattern is slightly slower than the
+*Materialized Paths* pattern but is more straightforward to use.
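+
+One maintenance point worth an illustrative sketch: relocating a node
+under this pattern means rewriting its ``ancestors`` array, and the
+arrays of any descendants it has. For example, to move ``Languages``
+(which has no descendants in the sample data above) under ``Databases``:
+
+.. code-block:: javascript
+
+   // Rewrite the parent reference and the full ancestor path of the moved node.
+   db.categories.update(
+      { _id: "Languages" },
+      { $set: { parent: "Databases", ancestors: [ "Books", "Programming", "Databases" ] } }
+   )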
+
+Materialized Paths
+``````````````````
+
+The *Materialized Paths* pattern stores each tree node in a document;
+in addition to the tree node, the document stores as a string the id(s)
+of the node's ancestors or path. Although the *Materialized Paths* pattern
+requires additional steps of working with strings and regular
+expressions, the pattern also provides more flexibility in working with
+the path, such as finding nodes by partial paths.
+
+Consider the following example that models a tree of categories using
+*Materialized Paths*; the path string uses the comma ``,`` as a
+delimiter:
+
+.. code-block:: javascript
+
+   db.categories.insert( { _id: "Books", path: null } )
+   db.categories.insert( { _id: "Programming", path: "Books," } )
+   db.categories.insert( { _id: "Databases", path: "Books,Programming," } )
+   db.categories.insert( { _id: "Languages", path: "Books,Programming," } )
+   db.categories.insert( { _id: "MongoDB", path: "Books,Programming,Databases," } )
+   db.categories.insert( { _id: "Postgres", path: "Books,Programming,Databases," } )
+
+- You can query to retrieve the whole tree, sorting by the ``path``:
+
+  .. code-block:: javascript
+
+     db.categories.find().sort( { path: 1 } )
+
+- You can create an index on the field ``path`` to enable fast search
+  by the path:
+
+  .. code-block:: javascript
+
+     db.categories.ensureIndex( { path: 1 } )
+
+- You can use regular expressions on the ``path`` field to find the
+  descendants of ``Programming``:
+
+  .. code-block:: javascript
+
+     db.categories.find( { path: /,Programming,/ } )
+
+Nested Sets
+```````````
+
+The *Nested Sets* pattern identifies each node in the tree as stops in
+a round-trip traversal of the tree. Each node is visited twice: first
+during the initial trip, and second during the return trip. The *Nested
+Sets* pattern stores each tree node in a document; in addition to the
+tree node, the document stores the id of the node's parent, the node's
+initial stop in the ``left`` field, and its return stop in the ``right``
+field.
+
+Consider the following example that models a tree of categories using
+*Nested Sets*:
+
+.. code-block:: javascript
+
+   db.categories.insert( { _id: "Books", parent: 0, left: 1, right: 12 } )
+   db.categories.insert( { _id: "Programming", parent: "Books", left: 2, right: 11 } )
+   db.categories.insert( { _id: "Languages", parent: "Programming", left: 3, right: 4 } )
+   db.categories.insert( { _id: "Databases", parent: "Programming", left: 5, right: 10 } )
+   db.categories.insert( { _id: "MongoDB", parent: "Databases", left: 6, right: 7 } )
+   db.categories.insert( { _id: "Postgres", parent: "Databases", left: 8, right: 9 } )
+
+You can query to retrieve the descendants of a node:
+
+.. code-block:: javascript
+
+   var databaseCategory = db.categories.findOne( { _id: "Databases" } );
+   db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } );
+
+The *Nested Sets* pattern provides a fast and efficient solution for
+finding subtrees but is inefficient for modifying the tree structure.
+As such, this pattern is best for static trees that do not change.
+
+Additional Resources
+--------------------
+
+For more information, consider the following external resources:
+
+- `Schema Design by Example `_
+
+- `Walkthrough MongoDB Data Modeling `_
+
+- `Document Design for MongoDB `_
+
+- `Dynamic Schema Blog Post `_
+
+- :wiki:`MongoDB Data Modeling and Rails`
+
+- `Ruby Example of Materialized Paths
+  `_
+
+- `Sean Cribbs Blog Post
+  `_,
+  which was the source for much of the :ref:`data-modeling-trees` content.
diff --git a/source/faq/developers.txt b/source/faq/developers.txt
index b784d9736e8..24585fc82b3 100644
--- a/source/faq/developers.txt
+++ b/source/faq/developers.txt
@@ -615,3 +615,48 @@ explicitly force the query to use that index.
 .. [#id-is-immutable] MongoDB does not permit changes to the value of
    the ``_id`` field; it is not possible for a cursor that transverses
    this index to pass the same document more than once.
+
+.. _faq-developers-embed-documents:
+
+When should I embed documents within other documents?
+------------------------------------------------------
+
+When :doc:`modeling data in MongoDB </core/data-modeling>`, embedding
+is frequently the choice for:
+
+- "contains" relationships between entities.
+
+- one-to-many relationships when the "many" objects *always* appear
+  with or are viewed in the context of their parents.
+
+You should also consider embedding for performance reasons if you have
+a collection with a large number of small documents. Nevertheless, if
+small, separate documents represent the natural model for the data,
+then you should maintain that model.
+
+If, however, you can group these small documents by some logical
+relationship *and* you frequently retrieve the documents by this
+grouping, you might consider "rolling up" the small documents into
+larger documents that contain an array of subdocuments. Keep in mind
+that if you often need to retrieve only a subset of the documents
+within the group, then "rolling up" the documents may not provide
+better performance.
+
+"Rolling up" these small documents into logical groupings means that queries to
+retrieve a group of documents involve sequential reads and fewer random disk
+accesses; see the sketch at the end of this answer for an example of a
+"rolled up" document.
+
+.. Will probably need to break up the following sentence:
+
+Additionally, "rolling up" documents and moving common fields to the
+larger document benefits the indexes on these fields: there are fewer
+copies of the common fields *and* fewer associated key entries in the
+corresponding index. See :doc:`/core/indexes` for more information on
+indexes.
+
+.. Commenting out.. If the data is too large to fit entirely in RAM,
+   embedding provides better RAM cache utilization.
+
+.. Commenting out.. If your small documents are approximately the page
+   cache unit size, there is no benefit for ram cache efficiency, although
+   embedding will provide some benefit regarding random disk i/o.
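+
+To make the "rolling up" idea concrete, here is a brief sketch of the
+two shapes; the sensor-reading collection and its fields are
+illustrative only:
+
+.. code-block:: javascript
+
+   // Many small per-reading documents, grouped only by a shared sensor id:
+   { sensor_id: "s1", ts: ISODate("2012-10-15T10:00:00Z"), value: 22 }
+   { sensor_id: "s1", ts: ISODate("2012-10-15T10:01:00Z"), value: 23 }
+
+   // "Rolled up" into one larger document per sensor and hour, so the whole
+   // group comes back with a single sequential read:
+   {
+      _id: "s1:2012-10-15T10",
+      sensor_id: "s1",
+      readings: [
+         { ts: ISODate("2012-10-15T10:00:00Z"), value: 22 },
+         { ts: ISODate("2012-10-15T10:01:00Z"), value: 23 }
+      ]
+   }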
diff --git a/source/reference/configuration-options.txt b/source/reference/configuration-options.txt
index 5cc5dde58ac..69900c254b2 100644
--- a/source/reference/configuration-options.txt
+++ b/source/reference/configuration-options.txt
@@ -440,14 +440,13 @@ Settings
 
    *Default:* 16
 
-   Specify this value in megabytes.
+   Specify this value in megabytes. The maximum size is 2047 megabytes.
 
    Use this setting to control the default size for all newly created
    namespace files (i.e ``.ns``). This option has no impact on the
    size of existing namespace files.
 
-   The default value is 16 megabytes, this provides for effectively
-   12,000 possible namespace. The maximum size is 2 gigabytes.
+   See :limit:`Limits on namespaces `.
 
 .. setting:: profile
diff --git a/source/reference/limits.txt b/source/reference/limits.txt
index f825fc9fe75..6a7cac98281 100644
--- a/source/reference/limits.txt
+++ b/source/reference/limits.txt
@@ -46,12 +46,13 @@ Namespaces
    The limitation on the number of namespaces is the size of the
    namespace file divided by 628.
 
-   A 16 megabyte namespace file can support approximately 24,000 namespaces.
+   A 16 megabyte namespace file can support approximately 24,000
+   namespaces. Each index also counts as a namespace.
 
 .. _limit-size-of-namespace-file:
 .. limit:: Size of Namespace File
 
-   Namespace files can be no larger than 2 gigabytes.
+   Namespace files can be no larger than 2047 megabytes.
 
    By default namespace files are 16 megabytes. You can configure the
    size using the :setting:`nssize`.
diff --git a/source/reference/mongod.txt b/source/reference/mongod.txt
index e267694bc74..5d3c87892e1 100644
--- a/source/reference/mongod.txt
+++ b/source/reference/mongod.txt
@@ -310,11 +310,13 @@ Options
 
 .. option:: --nssize
 
-   Specifies the default value for namespace files (i.e ``.ns``). This
-   option has no impact on the size of existing namespace files.
+   Specifies the default size for namespace files (i.e., ``.ns``). This
+   option has no impact on the size of existing namespace files. The
+   maximum size is 2047 megabytes.
 
-   The default value is 16 megabytes, this provides for effectively
-   12,000 possible namespaces. The maximum size is 2 gigabytes.
+   The default value is 16 megabytes; this provides for approximately
+   24,000 namespaces. Each collection, as well as each index, counts as
+   a namespace.
 
 .. option:: --profile