diff --git a/source/administration/journaling.txt b/source/administration/journaling.txt
new file mode 100644
index 00000000000..5ff628ee8ef
--- /dev/null
+++ b/source/administration/journaling.txt
@@ -0,0 +1,292 @@
+==========
+Journaling
+==========
+
+.. default-domain:: mongodb
+
+:term:`Journaling ` ensures durability of data by storing
+:doc:`write operations ` in an on-disk
+journal prior to applying them to the data files. The journal
+ensures write operations can be re-applied in the event of a crash.
+Journaling is also referred to as "write ahead logging."
+
+Journaling ensures that :program:`mongod` is crash resilient. *Without*
+a journal, if :program:`mongod` exits unexpectedly, you must assume
+your data is in an inconsistent state and must either run
+:doc:`repair `
+or :ref:`resync ` from a clean member
+of the replica set.
+
+With journaling, if :program:`mongod` stops unexpectedly, the program
+can recover everything written to the journal, and the data is in a
+consistent state. By default, the writes you can lose, i.e., those not
+yet written to the journal, are at most those made in the last 100
+milliseconds.
+
+With journaling, if you want a data set to reside entirely in RAM, you
+need enough RAM to hold the data set plus the "write working set." The
+"write working set" is the amount of unique data you expect to see
+written between re-mappings of the private view. For information on
+views, see :ref:`journaling-storage-views`.
+
+.. versionchanged:: 2.0
+   Journaling is enabled by default for 64-bit platforms.
+   For other platforms, see :setting:`journal`.
+
+Configuration and Setup
+-----------------------
+
+Enable Journaling
+~~~~~~~~~~~~~~~~~
+
+.. versionchanged:: 2.0
+   Journaling is enabled by default for 64-bit platforms.
+
+To enable journaling, start :program:`mongod` with the
+:option:`--journal` command line option.
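+
+You can also set the same option in a configuration file and pass that
+file to :program:`mongod` with ``--config``. The following is a minimal
+sketch. It assumes the 2.x-era ``key = value`` configuration file
+format, and both the path ``/etc/mongodb.conf`` and its other contents
+are hypothetical:
+
+.. code-block:: ini
+
+   # /etc/mongodb.conf (hypothetical path)
+   # Data directory used in the examples on this page.
+   dbpath = /data/db
+   # Equivalent of passing --journal on the command line.
+   journal = true
+
+With this file in place, ``mongod --config /etc/mongodb.conf`` starts
+the instance with journaling enabled.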
+
+If :program:`mongod` preallocates the journal files, it delays listening
+on port 27017 until preallocation completes, which can take a few
+minutes. Your applications and the :program:`mongo` shell cannot
+connect to the database until preallocation completes.
+
+Disable Journaling
+~~~~~~~~~~~~~~~~~~
+
+.. warning::
+
+   Do not disable journaling on production systems. If your MongoDB
+   system stops unexpectedly from a power failure or other condition,
+   and if you are not running with journaling, then you must recover
+   from an unaffected :term:`replica set` member or backup, as described
+   in :doc:`repair `.
+
+To disable journaling, shut down :program:`mongod` cleanly and restart
+it with the :option:`--nojournal ` command line option.
+
+Get Commit Acknowledgement
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can get commit acknowledgement with the
+:dbcommand:`getLastError` command and the ``j`` option. For details, see
+:ref:`write-concern-operation`.
+
+.. _journaling-avoid-preallocation-lag:
+
+Avoid Preallocation Lag
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To avoid :ref:`preallocation lag `, you can
+preallocate files in the journal directory by copying them from another
+instance of :program:`mongod`.
+
+Preallocated files do not contain data, so it is safe to remove them
+later. However, if you restart :program:`mongod` with journaling,
+:program:`mongod` creates them again.
+
+.. example:: The following sequence preallocates journal files for an
+   instance of :program:`mongod` running on port ``27017`` with a database
+   path of ``/data/db``.
+
+   For demonstration purposes, the sequence starts by creating a set of
+   journal files in the usual way.
+
+   1. Create a temporary directory into which to create a set of journal
+      files:
+
+      .. code-block:: sh
+
+         mkdir ~/tmpDbpath
+
+   #. Create a set of journal files by starting a :program:`mongod`
+      instance that uses the temporary directory:
+
+      .. code-block:: sh
+
+         mongod --port 10000 --dbpath ~/tmpDbpath --journal
+
+   #. When you see the following log output, indicating that
+      :program:`mongod` has created the files, press CONTROL+C to stop
+      the :program:`mongod` instance:
+
+      .. code-block:: none
+
+         web admin interface listening on port 11000
+
+   #. Preallocate journal files for the new instance of
+      :program:`mongod` by moving the journal files from the data directory
+      of the existing instance to the data directory of the new instance:
+
+      .. code-block:: sh
+
+         mv ~/tmpDbpath/journal /data/db/
+
+   #. Start the new :program:`mongod` instance:
+
+      .. code-block:: sh
+
+         mongod --port 27017 --dbpath /data/db --journal
+
+Monitor Journal Status
+~~~~~~~~~~~~~~~~~~~~~~
+
+Use the following commands to monitor journal status:
+
+- :dbcommand:`serverStatus`
+
+  The :dbcommand:`serverStatus` command returns database status
+  information that is useful for assessing performance.
+
+- :dbcommand:`journalLatencyTest`
+
+  Use :dbcommand:`journalLatencyTest` to measure how long it takes on
+  your volume to write to the disk in an append-only fashion. You can
+  run this command on an idle system to get a baseline sync time for
+  journaling. You can also run it on a busy system to measure the sync
+  time under load, which may be higher if the journal directory is on
+  the same volume as the data files.
+
+  The :dbcommand:`journalLatencyTest` command also provides a way to
+  check if your disk drive is buffering writes in its local cache. If
+  the reported time is very low (i.e., less than 2 milliseconds) and the
+  drive is not an SSD, the drive is probably buffering writes. In that
+  case, enable cache write-through for the device in your operating
+  system, unless you have a disk controller card with battery-backed RAM.
+
+.. _journaling-journal-commit-interval:
+
+Change the Group Commit Interval
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionchanged:: 2.0
+
+You can set the group commit interval using the
+:option:`--journalCommitInterval `
+command line option. The allowed range is ``2`` to ``300`` milliseconds.
+
+Lower values increase the durability of the journal at the expense of
+disk performance.
+
+Recover Data After Unexpected Shutdown
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On a restart after a crash, MongoDB replays the journal files in the
+journal directory before the server becomes available. The log output
+records this replay. You do not need to run a repair.
+
+Journaling Internals
+--------------------
+
+When running with journaling, MongoDB stores and applies :doc:`write
+operations ` in memory and in the journal before
+the changes are in the data files.
+
+.. _journaling-journal-files:
+
+Journal Files
+~~~~~~~~~~~~~
+
+With journaling enabled, MongoDB creates a journal directory within
+your database directory. The journal directory holds journal files,
+which contain write-ahead redo logs. The directory also holds a
+last-sequence-number file. A clean shutdown removes all the files in the
+journal directory.
+
+Journal files are append-only files with names that begin with
+``j._``. When a journal file reaches 1 gigabyte, MongoDB creates a new
+file. MongoDB automatically deletes files that it no longer needs.
+Unless you write many bytes of data per second, the journal directory
+should contain only two or three journal files.
+
+To limit the size of journal files to 128 megabytes per file, use the
+``--smallfiles`` command line option when starting :program:`mongod`.
+
+To speed the frequent sequential writes that occur to the current
+journal file, you can place the journal directory on a different
+filesystem from the data files. However, doing so prevents you from
+using a single filesystem snapshot to take backups.
+
+Depending on your filesystem, you might experience a preallocation lag
+the first time you start a :program:`mongod` instance with journaling
+enabled. MongoDB preallocates journal files if it is faster on your
+filesystem to predefine the file size. Preallocation lag might last
+several minutes, during which you cannot connect to the database.
+This is a one-time preallocation and does not occur with future
+invocations.
+
+To avoid preallocation lag, see
+:ref:`journaling-avoid-preallocation-lag`.
+
+.. _journaling-storage-views:
+
+Storage Views used in Journaling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Journaling adds three storage views to MongoDB.
+
+The ``shared view`` stores modified data for upload to the MongoDB
+data files. The ``shared view`` is the only view with direct access
+to the MongoDB data files. When running with journaling, :program:`mongod`
+asks the operating system to map your existing on-disk data files to the
+``shared view`` memory view. The operating system maps the files but
+does not load them. MongoDB later loads data files into the ``shared
+view`` as needed.
+
+The ``private view`` stores data for use in :doc:`read operations
+`. The ``private view`` is mapped to the ``shared view``
+and is the first place MongoDB applies new :doc:`write operations
+`.
+
+The journal is an on-disk view that stores new write operations
+after MongoDB applies them to the ``private view`` but before it
+applies them to the data files. The journal provides durability.
+If the :program:`mongod` instance were to crash without having applied
+the writes to the data files, MongoDB could replay the writes from the
+journal to the ``shared view`` for eventual upload to the data files.
+
+.. _journaling-record-write-operation:
+
+How Journaling Records Write Operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+MongoDB copies the write operations to the journal in batches called
+group commits. By default, MongoDB performs a group commit every 100
+milliseconds, which means that all operations within a 100 millisecond
+window are committed to the journal as a single batch. Batching
+improves performance.
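+
+As a rough worked example of what the group commit interval implies for
+durability, assume a steady rate of :math:`r` write operations per
+second and a commit interval of :math:`t` seconds. Writes that arrive
+after one group commit reach the journal only at the next one, so a
+crash just before a commit loses at most the writes from one interval:
+
+.. math::
+
+   W_{\text{lost}} \le r \cdot t
+
+For instance, at an assumed 2,000 writes per second and the default
+interval of :math:`t = 0.1` seconds, at most :math:`2000 \times 0.1 = 200`
+writes are lost. This sketch ignores the time the commit itself takes
+and assumes a constant write rate.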
+
+Journaling stores raw operations that allow MongoDB to reconstruct the
+following:
+
+- document insertions and updates
+- index modifications
+- changes to the namespace files
+
+As :doc:`write operations ` occur, MongoDB
+writes the data to the ``private view`` in RAM and then copies the write
+operations in batches to the journal. The journal stores the operations
+on disk to ensure durability. MongoDB adds the operations as entries on
+the journal's forward pointer. Each entry describes which bytes the
+write operation changed in the data files.
+
+MongoDB next applies the journal's write operations to the ``shared
+view``. At this point, the ``shared view`` becomes inconsistent with the
+data files.
+
+At default intervals of 60 seconds, MongoDB asks the operating system to
+flush the ``shared view`` to disk. This brings the data files up to date
+with the latest write operations.
+
+When write operations are flushed to the data files, MongoDB removes the
+write operations from the journal's behind pointer. The behind pointer
+is always far behind the forward pointer.
+
+As part of journaling, MongoDB routinely asks the operating system to
+remap the ``shared view`` to the ``private view``, for consistency.
+
+.. note:: The interaction between the ``shared view`` and the on-disk
+   data files is similar to how MongoDB works *without*
+   journaling, which is that MongoDB asks the operating system to flush
+   in-memory changes back to the data files every 60 seconds.
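+
+The two intervals this section describes correspond to :program:`mongod`
+settings: ``journalCommitInterval`` for the group commit interval, and,
+as far as we know, ``syncdelay`` (default 60 seconds) for the flush of
+the ``shared view`` to the data files. The following is a sketch only.
+It assumes the 2.x-era ``key = value`` configuration file format and
+restates the defaults described on this page rather than recommending
+new values:
+
+.. code-block:: ini
+
+   # Hypothetical configuration file; values restate documented defaults.
+   journal = true
+   # Group commit interval in milliseconds (allowed range 2 to 300).
+   journalCommitInterval = 100
+   # Seconds between requests to flush the shared view to the data files.
+   syncdelay = 60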