From 5cdac48c260f9320e2035f37fa616805f534c053 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 19 Dec 2012 14:52:36 -0500 Subject: [PATCH 1/4] DOCS-491 memory faqs --- source/faq/fundamentals.txt | 99 +++++++++++++++++++++++++++++++++++-- source/faq/storage.txt | 2 + 2 files changed, 97 insertions(+), 4 deletions(-) diff --git a/source/faq/fundamentals.txt b/source/faq/fundamentals.txt index ae951e2f343..db783bdb618 100644 --- a/source/faq/fundamentals.txt +++ b/source/faq/fundamentals.txt @@ -14,7 +14,7 @@ the :doc:`complete list of FAQs ` or post your question to the :backlinks: none :local: -What kind of Database is MongoDB? +What kind of database is MongoDB? --------------------------------- MongoDB is :term:`document`-oriented DBMS. Think of MySQL but with @@ -32,7 +32,7 @@ partitioning. MongoDB uses :term:`BSON`, a binary object format similar to, but more expressive than, :term:`JSON`. -Do MongoDB Databases Have Tables? +Do MongoDB databases have tables? --------------------------------- Instead of tables, a MongoDB database stores its data in @@ -55,7 +55,7 @@ Collections are have some important differences from RDMS tables: .. _faq-schema-free: -Do MongoDB Databases have Schemas? +Do MongoDB databases have schemas? ---------------------------------- MongoDB uses dynamic schemas. You can create collections without @@ -147,6 +147,82 @@ as it can, swapping to disk as needed. Deployments with enough memory to fit the application's working data set in RAM will achieve the best performance. +Do I need a swap space? +----------------------- + +You should always have a swap space in case you run into extreme memory +constraints, memory leaks, or another program stealing a lot of memory. +Think of the swap space as something like a steam release valve which +allows excess pressure to release without blowing the system up. + +But you *do not* need swap for routine use. 
Database files are +:ref:`memory-mapped ` and should +constitute most of your MongoDB memory use. Therefore, it is unlikely +that :program:`mongod` will ever use any swap space. The memory mapped +files can simply be released from memory without going to swap or can be +written back to the database files without needing to be swapped out to +disk, as they are already backed by files. + +Must my working set size fit RAM? +--------------------------------- + +Your working set should stay in memory to achieve good performance. +Otherwise lots of random disk IO's will occur, and unless you are using +SSD, this can be quite slow. + +One area to watch specifically in managing the size of your working set +is index access patterns. If you are inserting into indexes at random +locations (for example, with id's, which are effectively randomly +generated by hashes), you will continually be updating the whole index. +If instead you are able to create your id's in approximately ascending +order (for example, day concatenated with a random id), all the updates +will occur at the right side of the b-tree and the working set size for +index pages will be much smaller. + +It is fine if databases and thus virtual size are much larger than RAM. + +How can I measure working set size? +----------------------------------- + +Measuring working set size can be difficult; even if it is much smaller +than total RAM. If the database is much larger than RAM in total, all +memory will be indicated as in use for the cache. Thus you need a +different way to estimate our working set size. + +One technique is to use the +`eatmem.cpp `_. +utility, which reserves a certain amount of system memory for itself. +You can run the utility with a certain amount specified and see if the +server continues to perform well. If not, the working set is larger than +the total RAM minus the eaten RAM. 
The test will eject some data from +the file system cache, which might take time to page back in after the +utility is terminated. + +Running eatmem.cpp continuously with a small percentage of total RAM, +such as 20%, is a good technique to get an early warning if memory is +too low. If disk I/O activity increases significantly, terminate +eatmem.cpp to mitigate the problem for the moment until further steps +can be taken. + +In :term:`replica sets `, if one server is underpowered the +eatmem.cpp utility could help as an early warning mechanism for server +capacity. Of course, the server must be receiving representative traffic +to get an indication. + +How do I read memory statistics in the UNIX ``top`` command +----------------------------------------------------------- + +Because :program:`mongod` uses :ref:`memory-mapped files +`, the memory statistics in ``top`` +require interpretation in a special way. On a large database, ``VSIZE`` +(virtual bytes) tends to be the size of the entire database. If the +:program:`mongod` doesn't have other processes running, ``RSIZE`` +(resident bytes) is the total memory of the machine, as this counts +file system cache contents. + +The ``vmstat`` command is also useful for determining +memory use. On Macintosh computers, the command is ``vm_stat``. + How do I configure the cache size? ---------------------------------- @@ -186,6 +262,21 @@ data to application objects require joins, this process increases the overhead related to using the database which increases the importance of the caching layer. +How do I calculate how much RAM I need for my application? +---------------------------------------------------------- + +The amount of RAM you need depends on several factors: + +- The relationship between database storage and working set + +- The operating system's cache strategy for LRU (Least Recently Used) + +- The impact of journaling + +- Using page faults and other MMS gauges to detect when you need more RAM + +.. 
todo This topic is a work in progress + Does MongoDB handle caching? ---------------------------- @@ -208,7 +299,7 @@ drivers use C extensions for better performance. What are the limitations of 32-bit versions of MongoDB? ------------------------------------------------------- -MongoDB uses memory-mapped files. When running a 32-bit build of +MongoDB uses :ref:`memory-mapped files `. When running a 32-bit build of MongoDB, the total storage size for the server, including data and indexes, is 2 gigabytes. For this reason, do not deploy MongoDB to production on 32-bit machines. diff --git a/source/faq/storage.txt b/source/faq/storage.txt index 4975b568fb7..efcb237114b 100644 --- a/source/faq/storage.txt +++ b/source/faq/storage.txt @@ -15,6 +15,8 @@ the :doc:`complete list of FAQs ` or post your question to the :backlinks: none :local: +.. _faq-storage-memory-mapped-files: + What are memory mapped files? ----------------------------- From e54bc217a070102aaf517e046ed96c4095926e94 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 19 Dec 2012 15:24:39 -0500 Subject: [PATCH 2/4] DOCS-491 first draft --- source/faq/fundamentals.txt | 85 ++++++++++++++++++++++++------------- 1 file changed, 56 insertions(+), 29 deletions(-) diff --git a/source/faq/fundamentals.txt b/source/faq/fundamentals.txt index db783bdb618..b41877f1a90 100644 --- a/source/faq/fundamentals.txt +++ b/source/faq/fundamentals.txt @@ -163,6 +163,8 @@ files can simply be released from memory without going to swap or can be written back to the database files without needing to be swapped out to disk, as they are already backed by files. +.. _faq-fundamentals-working-set: + Must my working set size fit RAM? --------------------------------- @@ -181,6 +183,8 @@ index pages will be much smaller. It is fine if databases and thus virtual size are much larger than RAM. +.. _faq-fundamentals-working-set-size: + How can I measure working set size? 
----------------------------------- @@ -204,11 +208,49 @@ too low. If disk I/O activity increases significantly, terminate eatmem.cpp to mitigate the problem for the moment until further steps can be taken. -In :term:`replica sets `, if one server is underpowered the +In :term:`replica sets `, if one server is underpowered the eatmem.cpp utility could help as an early warning mechanism for server capacity. Of course, the server must be receiving representative traffic to get an indication. +How do I calculate how much RAM I need for my application? +---------------------------------------------------------- + +.. todo Improve this FAQ + +The amount of RAM you need depends on several factors, including but not +limited to: + +- The relationship between :ref:`database storage ` and working set. + +- The operating system's cache strategy for LRU (Least Recently Used) + +- The impact of :doc:`journaling ` + +- Using page faults and other MMS gauges to detect when you need more RAM + +MongoDB makes no choices regarding what data is loaded into memory from +disk. It simply :ref:`memory maps ` all +its data files and relies on the operating system to cache data. The OS +typically evicts the least-recently-used data from RAM when it runs low +on memory. For example if indexes are accessed more frequently then +documents then indexes will more likely stay in RAM, but it depends on +your particular usage. + +To calculate how much RAM you need, you must calculate your working set +size, i.e., the portion of your data that is frequently accessed. This +depends on your access patterns, what indexes you have, and the size of +your documents. To calculate working set size, see: + +- :ref:`faq-fundamentals-working-set` + +- :ref:`faq-fundamentals-working-set-size` + +If page faults are low (for example, below than 100 / sec), then your +working set fits in RAM. If fault rates rise higher than that, you risk +performance degradation. 
This is less critical with SSD drives than +spinning disks. + How do I read memory statistics in the UNIX ``top`` command ----------------------------------------------------------- @@ -230,19 +272,6 @@ MongoDB has no configurable cache. MongoDB uses all *free* memory on the system automatically by way of memory-mapped files. Operating systems use the same approach with their file system caches. -Are writes written to disk immediately, or lazily? --------------------------------------------------- - -Writes are physically written to the journal within 100 -milliseconds. At that point, the write is "durable" in the sense that -after a pull-plug-from-wall event, the data will still be recoverable after -a hard restart. - -While the journal commit is nearly instant, MongoDB writes to the data -files lazily. MongoDB may wait to write data to the data files for as -much as one minute. This does not affect durability, as the journal -has enough information to ensure crash recovery. - .. _faq-database-and-caching: Does MongoDB require a separate caching layer for application-level caching? @@ -262,21 +291,6 @@ data to application objects require joins, this process increases the overhead related to using the database which increases the importance of the caching layer. -How do I calculate how much RAM I need for my application? ----------------------------------------------------------- - -The amount of RAM you need depends on several factors: - -- The relationship between database storage and working set - -- The operating system's cache strategy for LRU (Least Recently Used) - -- The impact of journaling - -- Using page faults and other MMS gauges to detect when you need more RAM - -.. todo This topic is a work in progress - Does MongoDB handle caching? ---------------------------- @@ -287,6 +301,19 @@ in RAM, MongoDB serves all queries from memory. MongoDB does not implement a query cache: MongoDB serves all queries directly from the indexes and/or data files. 
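The memory-mapped behavior these answers describe can be sketched with Python's ``mmap`` module on a throwaway file (the file contents and sizes here are invented for illustration, not MongoDB's actual on-disk format): reads go through the mapping and are served by the OS page cache, which is why free RAM shows up as file system cache rather than as the server's own allocation.

```python
import mmap
import os
import tempfile

# Write a small "data file", then map it into memory the way a
# memory-mapped storage engine would. Reads go through the mapping and
# the OS page cache, not through explicit read() calls.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"{'_id': 1, 'status': 'ok'}")          # 26 bytes of fake "document"
    f.write(b"\x00" * (mmap.PAGESIZE - 26))          # pad the file to one page

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    # Touching the mapped bytes faults the page in if it is not already
    # resident; repeated access is then served straight from RAM.
    first_doc = bytes(mm[:26])
    mapped_length = len(mm)

os.unlink(path)
```

Because the pages are backed by the file itself, the OS can drop them from RAM at any time without writing anything to swap, which is the point made above about :program:`mongod` rarely needing swap space.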
+Are writes written to disk immediately, or lazily? +-------------------------------------------------- + +Writes are physically written to the journal within 100 +milliseconds. At that point, the write is "durable" in the sense that +after a pull-plug-from-wall event, the data will still be recoverable after +a hard restart. + +While the journal commit is nearly instant, MongoDB writes to the data +files lazily. MongoDB may wait to write data to the data files for as +much as one minute. This does not affect durability, as the journal +has enough information to ensure crash recovery. + What language is MongoDB written in? ------------------------------------ From 529f32c9a2f0c9a8f066873e2e4fb521f9020689 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 20 Dec 2012 18:03:53 -0500 Subject: [PATCH 3/4] DOCS-491 review edits --- source/faq/fundamentals.txt | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/source/faq/fundamentals.txt b/source/faq/fundamentals.txt index b41877f1a90..858ba124f76 100644 --- a/source/faq/fundamentals.txt +++ b/source/faq/fundamentals.txt @@ -169,7 +169,7 @@ Must my working set size fit RAM? --------------------------------- Your working set should stay in memory to achieve good performance. -Otherwise lots of random disk IO's will occur, and unless you are using +Otherwise many random disk IO's will occur, and unless you are using SSD, this can be quite slow. One area to watch specifically in managing the size of your working set @@ -191,14 +191,14 @@ How can I measure working set size? Measuring working set size can be difficult; even if it is much smaller than total RAM. If the database is much larger than RAM in total, all memory will be indicated as in use for the cache. Thus you need a -different way to estimate our working set size. +different way to estimate the working set size. One technique is to use the `eatmem.cpp `_. utility, which reserves a certain amount of system memory for itself. 
You can run the utility with a certain amount specified and see if the server continues to perform well. If not, the working set is larger than -the total RAM minus the eaten RAM. The test will eject some data from +the total RAM minus the consumed RAM. The test will eject some data from the file system cache, which might take time to page back in after the utility is terminated. @@ -227,13 +227,14 @@ limited to: - The impact of :doc:`journaling ` -- Using page faults and other MMS gauges to detect when you need more RAM +- The number or rate of page faults and other MMS gauges to detect when + you need more RAM MongoDB makes no choices regarding what data is loaded into memory from disk. It simply :ref:`memory maps ` all its data files and relies on the operating system to cache data. The OS typically evicts the least-recently-used data from RAM when it runs low -on memory. For example if indexes are accessed more frequently then +on memory. For example if indexes are accessed more frequently than documents then indexes will more likely stay in RAM, but it depends on your particular usage. @@ -246,10 +247,10 @@ your documents. To calculate working set size, see: - :ref:`faq-fundamentals-working-set-size` -If page faults are low (for example, below than 100 / sec), then your +If page faults are low (for example, below than 100 / sec), your working set fits in RAM. If fault rates rise higher than that, you risk performance degradation. This is less critical with SSD drives than -spinning disks. +with spinning disks. 
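The least-recently-used eviction described above can be illustrated with a toy cache (a generic LRU sketch, not the kernel's actual page-replacement algorithm; the page names are invented): pages touched often, such as hot index pages, survive eviction while cold document pages are dropped first.

```python
from collections import OrderedDict

# Toy LRU cache standing in for the OS page cache: pages are evicted
# least-recently-used first, so frequently touched pages stay resident.
class PageCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # mark as recently used
        elif len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)       # evict the coldest page
            self.pages[page_id] = True
        else:
            self.pages[page_id] = True

cache = PageCache(capacity=3)
for page in ["idx-root", "doc-1", "idx-root", "doc-2", "idx-root", "doc-3"]:
    cache.touch(page)

resident = list(cache.pages)  # the index page survives; the oldest doc page is gone
```

This is why, as noted above, indexes that are accessed more frequently than documents tend to stay in RAM.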
 How do I read memory statistics in the UNIX ``top`` command
 -----------------------------------------------------------

From 54b54d64d325bd827d93b6a98c5b6687ef798eda Mon Sep 17 00:00:00 2001
From: Bob Grabar
Date: Fri, 21 Dec 2012 11:28:38 -0500
Subject: [PATCH 4/4] DOCS-491 review edits

---
 source/faq/fundamentals.txt | 81 ++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 42 deletions(-)

diff --git a/source/faq/fundamentals.txt b/source/faq/fundamentals.txt
index 858ba124f76..9bdaf5a6e7e 100644
--- a/source/faq/fundamentals.txt
+++ b/source/faq/fundamentals.txt
@@ -43,13 +43,11 @@ database table, and each document has one or more fields,
 which corresponds to a column in a relational database table.

-Collections are have some important differences from RDMS tables:
-
-- Documents in a single collection may have unique combination and set
-  of fields. Documents need not have identical fields.
-
-- You can add a field to some documents in a collection without adding
-  that field to all documents in the collection.
+Collections have important differences from RDBMS tables. Documents in a
+single collection may have a unique combination and set of fields.
+Documents need not have identical fields. You can add a field to some
+documents in a collection without adding that field to all documents in
+the collection.

 .. see:: :doc:`/reference/sql-comparison`

@@ -174,7 +172,7 @@ SSD, this can be quite slow.

 One area to watch specifically in managing the size of your working set
 is index access patterns. If you are inserting into indexes at random
-locations (for example, with id's, which are effectively randomly
+locations (as would happen with id's that are randomly
 generated by hashes), you will continually be updating the whole index.
 If instead you are able to create your id's in approximately ascending
 order (for example, day concatenated with a random id), all the updates
 will occur at the right side of the b-tree and the working set size for
@@ -183,35 +181,37 @@ index pages will be much smaller.
It is fine if databases and thus virtual size are much larger than RAM. -.. _faq-fundamentals-working-set-size: +.. todo Commenting out for now: + + .. _faq-fundamentals-working-set-size: -How can I measure working set size? ------------------------------------ + How can I measure working set size? + ----------------------------------- -Measuring working set size can be difficult; even if it is much smaller -than total RAM. If the database is much larger than RAM in total, all -memory will be indicated as in use for the cache. Thus you need a -different way to estimate the working set size. + Measuring working set size can be difficult; even if it is much + smaller than total RAM. If the database is much larger than RAM in + total, all memory will be indicated as in use for the cache. Thus you + need a different way to estimate the working set size. -One technique is to use the -`eatmem.cpp `_. -utility, which reserves a certain amount of system memory for itself. -You can run the utility with a certain amount specified and see if the -server continues to perform well. If not, the working set is larger than -the total RAM minus the consumed RAM. The test will eject some data from -the file system cache, which might take time to page back in after the -utility is terminated. + One technique is to use the `eatmem.cpp + `_. + utility, which reserves a certain amount of system memory for itself. + You can run the utility with a certain amount specified and see if + the server continues to perform well. If not, the working set is + larger than the total RAM minus the consumed RAM. The test will eject + some data from the file system cache, which might take time to page + back in after the utility is terminated. -Running eatmem.cpp continuously with a small percentage of total RAM, -such as 20%, is a good technique to get an early warning if memory is -too low. 
If disk I/O activity increases significantly, terminate -eatmem.cpp to mitigate the problem for the moment until further steps -can be taken. + Running eatmem.cpp continuously with a small percentage of total RAM, + such as 20%, is a good technique to get an early warning if memory is + too low. If disk I/O activity increases significantly, terminate + eatmem.cpp to mitigate the problem for the moment until further steps + can be taken. -In :term:`replica sets `, if one server is underpowered the -eatmem.cpp utility could help as an early warning mechanism for server -capacity. Of course, the server must be receiving representative traffic -to get an indication. + In :term:`replica sets `, if one server is underpowered + the eatmem.cpp utility could help as an early warning mechanism for + server capacity. Of course, the server must be receiving + representative traffic to get an indication. How do I calculate how much RAM I need for my application? ---------------------------------------------------------- @@ -221,7 +221,7 @@ How do I calculate how much RAM I need for my application? The amount of RAM you need depends on several factors, including but not limited to: -- The relationship between :ref:`database storage ` and working set. +- The relationship between :doc:`database storage ` and working set. - The operating system's cache strategy for LRU (Least Recently Used) @@ -241,13 +241,9 @@ your particular usage. To calculate how much RAM you need, you must calculate your working set size, i.e., the portion of your data that is frequently accessed. This depends on your access patterns, what indexes you have, and the size of -your documents. To calculate working set size, see: - -- :ref:`faq-fundamentals-working-set` - -- :ref:`faq-fundamentals-working-set-size` +your documents. To calculate working set size, see :ref:`faq-fundamentals-working-set`. 
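As a rough sketch of the page-fault gauge mentioned in the factors above, the cumulative counter that ``serverStatus`` exposes (``extra_info.page_faults`` on Linux) can be turned into a per-second rate from two samples; the sample numbers and the threshold below are illustrative, not official guidance.

```python
# Two hypothetical readings of the cumulative page-fault counter,
# taken 60 seconds apart: (unix_time_seconds, page_faults).
samples = [(0, 120_000), (60, 120_900)]

# The counter only grows, so the fault *rate* is the delta over time.
(t0, faults0), (t1, faults1) = samples
faults_per_sec = (faults1 - faults0) / (t1 - t0)

# Illustrative threshold only: a low sustained rate suggests the working
# set fits in RAM; a climbing rate is an early sign that it no longer does.
working_set_fits = faults_per_sec < 10
```

In practice you would sample the counter periodically (as MMS does) and watch the trend rather than a single reading.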
-If page faults are low (for example, below than 100 / sec), your
+If page faults are infrequent, your
 working set fits in RAM. If fault rates rise, you risk
 performance degradation. This is less critical with SSD drives than
 with spinning disks.

@@ -305,15 +301,16 @@ directly from the indexes and/or data files.

 Are writes written to disk immediately, or lazily?
 --------------------------------------------------

-Writes are physically written to the journal within 100
+Writes are physically written to the :doc:`journal ` within 100
 milliseconds. At that point, the write is "durable" in the sense that
 after a pull-plug-from-wall event, the data will still be recoverable after
 a hard restart.

 While the journal commit is nearly instant, MongoDB writes to the data
 files lazily. MongoDB may wait to write data to the data files for as
-much as one minute. This does not affect durability, as the journal
-has enough information to ensure crash recovery.
+much as one minute by default. This does not affect durability, as the journal
+has enough information to ensure crash recovery. To change the interval
+for writing to the data files, see :setting:`syncdelay`.

 What language is MongoDB written in?
 ------------------------------------