| 1 | +========== |
| 2 | +Journaling |
| 3 | +========== |
| 4 | + |
| 5 | +.. default-domain:: mongodb |
| 6 | + |
| 7 | +MongoDB uses *write ahead logging* or :term:`journaling <journal>` to |
| 8 | +guarantee :doc:`write operation </core/write-operations>` durability |
| 9 | +by way of an on-disk journal. Before applying a change to the data |
| 10 | +files, MongoDB writes this operation to the journal. Then, if MongoDB |
| 11 | +terminates or encounters an error unexpectedly before it can write the |
| 12 | +data to disk, MongoDB can re-apply the write operation and maintain a |
| 13 | +consistent state. |
| 14 | + |
| 15 | +Journaling ensures that :program:`mongod` is crash resilient. *Without* |
| 16 | +a journal, if :program:`mongod` exits unexpectedly, you must assume |
| 17 | +your data is in an inconsistent state and must either run |
| 18 | +:doc:`repair </tutorial/recover-data-following-unexpected-shutdown>` |
| 19 | +or, preferably, :ref:`resync <replica-set-resync-stale-member>` from a |
| 20 | +clean member of the replica set. |
| 21 | + |
| 22 | +When journaling is enabled, if :program:`mongod` stops unexpectedly, |
| 23 | +the program can recover everything written to the journal, and the |
| 24 | +data is in a consistent state. By default, the writes you can lose, |
| 25 | +i.e., those not yet made to the journal, are at most those made in the |
| 26 | +last 100 milliseconds. |
| 27 | + |
| 28 | +With journaling, if you want a data set to reside entirely in RAM, you |
| 29 | +need enough RAM to hold the data set plus the "write working set." The |
| 30 | +"write working set" is the amount of unique data you expect to see |
| 31 | +written between re-mappings of the private view. For information on |
| 32 | +views, see :ref:`journaling-storage-views`. |
| 33 | + |
| 34 | +.. important:: |
| 35 | + |
| 36 | + .. versionchanged:: 2.0 |
| 37 | + For 64-bit builds of :program:`mongod`, journaling is enabled by default. |
| 38 | + For other platforms, see :setting:`journal`. |
| 39 | + |
| 40 | +Procedures |
| 41 | +---------- |
| 42 | + |
| 43 | +Enable Journaling |
| 44 | +~~~~~~~~~~~~~~~~~ |
| 45 | + |
| 46 | +.. versionchanged:: 2.0 |
| 47 | + For 64-bit builds of :program:`mongod`, journaling is enabled by default. |
| 48 | + |
| 49 | +To enable journaling, start :program:`mongod` with the |
| 50 | +:option:`--journal` command line option. |
| 51 | + |
| 52 | +If no journal files exist when :program:`mongod` starts, it must |
| 53 | +preallocate new journal files. During this operation, |
| 54 | +:program:`mongod` does not listen for connections until preallocation |
| 55 | +completes: on some systems this may take several minutes. During |
| 56 | +this period your applications and the :program:`mongo` shell cannot |
| 57 | +connect. |
| 58 | + |
| 59 | +Disable Journaling |
| 60 | +~~~~~~~~~~~~~~~~~~ |
| 61 | + |
| 62 | +.. warning:: |
| 63 | + |
| 64 | + Do not disable journaling on production systems. If your |
| 65 | + :program:`mongod` instance stops without shutting down cleanly |
| 66 | + for any reason (e.g. power failure) and you are |
| 67 | + not running with journaling, then you must recover from an |
| 68 | + unaffected :term:`replica set` member or backup, as described in |
| 69 | + :doc:`repair </tutorial/recover-data-following-unexpected-shutdown>`. |
| 70 | + |
| 71 | +To disable journaling, start :program:`mongod` with the |
| 72 | +:option:`--nojournal <mongod --nojournal>` command line option. |
| 73 | + |
| 74 | +If the instance is already running, shut down :program:`mongod` |
| 75 | +cleanly and restart it with :option:`--nojournal <mongod --nojournal>`. |
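| | + |
| | +As a minimal sketch of that restart, assuming an instance that listens |
| | +on ``localhost:27017`` with a database path of ``/data/db`` (both values |
| | +are illustrative), you could shut down from the :program:`mongo` shell |
| | +and then start :program:`mongod` again without journaling: |
| | + |
| | +.. code-block:: sh |
| | + |
| | +   # Cleanly shut down the running instance (one of several ways to do so). |
| | +   mongo localhost:27017/admin --eval 'db.shutdownServer()' |
| | + |
| | +   # Restart on the same data directory with journaling disabled. |
| | +   mongod --port 27017 --dbpath /data/db --nojournal |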
| 76 | + |
| 77 | +Get Commit Acknowledgment |
| 78 | +~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 79 | + |
| 80 | +You can get commit acknowledgment with the |
| 81 | +:dbcommand:`getLastError` command and the ``j`` option. For details, see |
| 82 | +:ref:`write-concern-operation`. |
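| | + |
| | +The following sketch shows one way to issue that command from the |
| | +command line. It assumes an instance on ``localhost:27017`` with |
| | +journaling enabled; the ``example`` collection and the inserted document |
| | +are hypothetical and only give :dbcommand:`getLastError` a preceding |
| | +write on the same connection to report on: |
| | + |
| | +.. code-block:: sh |
| | + |
| | +   # Insert into a hypothetical collection, then ask for acknowledgment |
| | +   # once the write has been committed to the journal (the j option). |
| | +   mongo localhost:27017/test --eval ' |
| | +     db.example.insert({ x: 1 }); |
| | +     printjson(db.runCommand({ getLastError: 1, j: true })); |
| | +   ' |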
| 83 | + |
| 84 | +.. _journaling-avoid-preallocation-lag: |
| 85 | + |
| 86 | +Avoid Preallocation Lag |
| 87 | +~~~~~~~~~~~~~~~~~~~~~~~ |
| 88 | + |
| 89 | +To avoid :ref:`preallocation lag <journaling-journal-files>`, you can |
| 90 | +preallocate files in the journal directory by copying them from another |
| 91 | +instance of :program:`mongod`. |
| 92 | + |
| 93 | +Preallocated files do not contain data, so it is safe to remove them |
| 94 | +later. However, if you restart :program:`mongod` with journaling, |
| 95 | +:program:`mongod` will create them again. |
| 96 | + |
| 97 | +.. example:: The following sequence preallocates journal files for an |
| 98 | + instance of :program:`mongod` running on port ``27017`` with a database |
| 99 | + path of ``/data/db``. |
| 100 | + |
| 101 | + For demonstration purposes, the sequence starts by creating a set of |
| 102 | + journal files in the usual way. |
| 103 | + |
| 104 | + 1. Create a temporary directory into which to create a set of journal |
| 105 | + files: |
| 106 | + |
| 107 | + .. code-block:: sh |
| 108 | + |
| 109 | + mkdir ~/tmpDbpath |
| 110 | + |
| 111 | + #. Create a set of journal files by starting a :program:`mongod` |
| 112 | + instance that uses the temporary directory: |
| 113 | + |
| 114 | + .. code-block:: sh |
| 115 | + |
| 116 | + mongod --port 10000 --dbpath ~/tmpDbpath --journal |
| 117 | + |
| 118 | + #. When you see the following log output, indicating :program:`mongod` has created the files, |
| 119 | + press CONTROL+C to stop the :program:`mongod` instance: |
| 120 | + |
| 121 | + .. code-block:: sh |
| 122 | + |
| 123 | + web admin interface listening on port 11000 |
| 124 | + |
| 125 | + #. Preallocate journal files for the new instance of |
| 126 | + :program:`mongod` by moving the journal files from the data directory |
| 127 | + of the temporary instance to the data directory of the new instance: |
| 128 | + |
| 129 | + .. code-block:: sh |
| 130 | + |
| 131 | + mv ~/tmpDbpath/journal /data/db/ |
| 132 | + |
| 133 | + #. Start the new :program:`mongod` instance: |
| 134 | + |
| 135 | + .. code-block:: sh |
| 136 | + |
| 137 | + mongod --port 27017 --dbpath /data/db --journal |
| 138 | + |
| 139 | +Monitor Journal Status |
| 140 | +~~~~~~~~~~~~~~~~~~~~~~ |
| 141 | + |
| 142 | +Use the following commands and methods to monitor journal status: |
| 143 | + |
| 144 | +- :dbcommand:`serverStatus` |
| 145 | + |
| 146 | + The :dbcommand:`serverStatus` command returns database status |
| 147 | + information that is useful for assessing performance. |
| 148 | + |
| 149 | +- :dbcommand:`journalLatencyTest` |
| 150 | + |
| 151 | + Use :dbcommand:`journalLatencyTest` to measure how long it takes on |
| 152 | + your volume to write to the disk in an append-only fashion. You can |
| 153 | + run this command on an idle system to get a baseline sync time for |
| 154 | + journaling. You can also run this command on a busy system to see the |
| 155 | + sync time under load, which may be higher if the journal |
| 156 | + directory is on the same volume as the data files. |
| 157 | + |
| 158 | + The :dbcommand:`journalLatencyTest` command also provides a way to |
| 159 | + check if your disk drive is buffering writes in its local cache. If |
| 160 | + the number is very low (i.e., less than 2 milliseconds) and the drive |
| 161 | + is non-SSD, the drive is probably buffering writes. In that case, |
| 162 | + enable cache write-through for the device in your operating system, |
| 163 | + unless you have a disk controller card with battery backed RAM. |
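| | + |
| | +As a rough sketch, both commands can be run from the command line with |
| | +the :program:`mongo` shell. The address ``localhost:27017`` is an |
| | +assumption, and the ``dur`` field shown here is the section of the |
| | +:dbcommand:`serverStatus` output that reports journaling statistics when |
| | +journaling is enabled: |
| | + |
| | +.. code-block:: sh |
| | + |
| | +   # Print the journaling ("dur") section of the serverStatus output. |
| | +   mongo localhost:27017/admin --eval 'printjson(db.serverStatus().dur)' |
| | + |
| | +   # Run the journal latency test against the admin database. |
| | +   mongo localhost:27017/admin --eval 'printjson(db.runCommand({ journalLatencyTest: 1 }))' |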
| 164 | + |
| 165 | +.. _journaling-journal-commit-interval: |
| 166 | + |
| 167 | +Change the Group Commit Interval |
| 168 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 169 | + |
| 170 | +.. versionchanged:: 2.0 |
| 171 | + |
| 172 | +You can set the group commit interval using the |
| 173 | +:option:`--journalCommitInterval <mongod --journalCommitInterval>` |
| 174 | +command line option. The allowed range is ``2`` to ``300`` milliseconds. |
| 175 | + |
| 176 | +Lower values increase the durability of the journal at the expense of |
| 177 | +disk performance. |
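| | + |
| | +For instance, the following invocation is a sketch that halves the |
| | +default interval. The port, database path, and the value ``50`` are |
| | +illustrative choices, not recommendations: |
| | + |
| | +.. code-block:: sh |
| | + |
| | +   # Commit to the journal every 50 ms instead of the default 100 ms. |
| | +   mongod --port 27017 --dbpath /data/db --journal --journalCommitInterval 50 |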
| 178 | + |
| 179 | +Recover Data After Unexpected Shutdown |
| 180 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 181 | + |
| 182 | +On a restart after a crash, MongoDB replays all journal files in the |
| 183 | +journal directory before the server becomes available. If MongoDB must |
| 184 | +replay journal files, :program:`mongod` notes these events in the log |
| 185 | +output. |
| 186 | + |
| 187 | +There is no reason to run :dbcommand:`repairDatabase` in these situations. |
| 188 | + |
| 189 | +.. _journaling-internals: |
| 190 | + |
| 191 | +Journaling Internals |
| 192 | +-------------------- |
| 193 | + |
| 194 | +When running with journaling, MongoDB stores and applies :doc:`write |
| 195 | +operations </core/write-operations>` in memory and in the journal before |
| 196 | +the changes are in the data files. |
| 197 | + |
| 198 | +.. _journaling-journal-files: |
| 199 | + |
| 200 | +Journal Files |
| 201 | +~~~~~~~~~~~~~ |
| 202 | + |
| 203 | +With journaling enabled, MongoDB creates a journal directory within |
| 204 | +the directory defined by :setting:`dbpath`, which is :file:`/data/db` |
| 205 | +by default. The journal directory holds journal files, which contain |
| 206 | +write-ahead redo logs. The directory also holds a last-sequence-number |
| 207 | +file. A clean shutdown removes all the files in the journal directory. |
| 208 | + |
| 209 | +Journal files are append-only files and have file names prefixed with |
| 210 | +``j._``. When a journal file holds 1 gigabyte of data, MongoDB creates |
| 211 | +a new journal file. Once MongoDB applies all the write operations in |
| 212 | +the journal files, it deletes these files. Unless you |
| 213 | +write *many* bytes of data per-second, the journal directory should |
| 214 | +contain only two or three journal files. |
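| | + |
| | +To check how many journal files currently exist, a small shell helper |
| | +along these lines may be enough. The function name |
| | +``count_journal_files`` is hypothetical, and the sketch relies only on |
| | +the ``j._`` file name prefix described above: |
| | + |
| | +.. code-block:: sh |
| | + |
| | +   # Count the files in a journal directory whose names begin with "j._". |
| | +   # $1 is the path to the journal directory. |
| | +   count_journal_files() { |
| | +       find "$1" -maxdepth 1 -type f -name 'j._*' | wc -l | tr -d ' ' |
| | +   } |
| | + |
| | +With the default database path, you would call it as |
| | +``count_journal_files /data/db/journal``. |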
| 215 | + |
| 216 | +To limit the size of each journal file to 128 megabytes, use the |
| 217 | +:setting:`smallfiles` run time option when starting :program:`mongod`. |
| 218 | + |
| 219 | +To speed the frequent sequential writes that occur to the current |
| 220 | +journal file, you can ensure that the journal directory is on a |
| 221 | +different filesystem from the data files. |
| 222 | + |
| 223 | +.. important:: |
| 224 | + |
| 225 | + If you place the journal on a different filesystem from your data |
| 226 | + files you *cannot* use a filesystem snapshot to capture consistent |
| 227 | + backups of a :setting:`dbpath` directory. |
| 228 | + |
| 229 | +.. note:: |
| 230 | + |
| 231 | + Depending on your file system, you might experience a preallocation |
| 232 | + lag the first time you start a :program:`mongod` instance with |
| 233 | + journaling enabled. MongoDB preallocates journal files if it is |
| 234 | + faster on your file system to create files of a |
| 235 | + pre-defined size. The preallocation |
| 236 | + lag might last several minutes, during which you will not be able |
| 237 | + to connect to the database. This is a one-time preallocation and |
| 238 | + does not occur with future invocations. |
| 239 | + |
| 240 | +To avoid preallocation lag, see :ref:`journaling-avoid-preallocation-lag`. |
| 241 | + |
| 242 | +.. _journaling-storage-views: |
| 243 | + |
| 244 | +Storage Views used in Journaling |
| 245 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 246 | + |
| 247 | +Journaling adds three storage views to MongoDB. |
| 248 | + |
| 249 | +The ``shared view`` stores modified data for upload to the MongoDB |
| 250 | +data files. The ``shared view`` is the only view with direct access |
| 251 | +to the MongoDB data files. When running with journaling, :program:`mongod` |
| 252 | +asks the operating system to map your existing on-disk data files to the |
| 253 | +``shared view`` memory view. The operating system maps the files but |
| 254 | +does not load them. MongoDB later loads data files to ``shared view`` as |
| 255 | +needed. |
| 256 | + |
| 257 | +The ``private view`` stores data for use in :doc:`read operations |
| 258 | +</core/read-operations>`. MongoDB maps the ``private view`` to the ``shared view``, |
| 259 | +and the ``private view`` is the first place MongoDB applies new :doc:`write operations |
| 260 | +</core/write-operations>`. |
| 261 | + |
| 262 | +The journal is an on-disk view that stores new write operations |
| 263 | +after MongoDB applies the operations to the ``private view`` but |
| 264 | +before applying them to the data files. The journal provides durability. |
| 265 | +If the :program:`mongod` instance were to crash without having applied |
| 266 | +the writes to the data files, MongoDB could replay the writes from the journal to |
| 267 | +the ``shared view`` for eventual upload to the data files. |
| 268 | + |
| 269 | +.. _journaling-record-write-operation: |
| 270 | + |
| 271 | +How Journaling Records Write Operations |
| 272 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 273 | + |
| 274 | +MongoDB copies the write operations to the journal in batches called |
| 275 | +group commits. By default, MongoDB performs a group commit every 100 |
| 276 | +milliseconds: as a result MongoDB commits all operations within a 100 |
| 277 | +millisecond window in a single batch. These "group commits" help |
| 278 | +minimize the performance impact of journaling. |
| 279 | + |
| 280 | +Journaling stores raw operations that allow MongoDB to reconstruct the |
| 281 | +following: |
| 282 | + |
| 283 | +- document insertion/updates |
| 284 | +- index modifications |
| 285 | +- changes to the namespace files |
| 286 | + |
| 287 | +As :doc:`write operations </core/write-operations>` occur, MongoDB |
| 288 | +writes the data to the ``private view`` in RAM and then copies the write |
| 289 | +operations in batches to the journal. The journal stores the operations |
| 290 | +on disk to ensure durability. MongoDB adds the operations as entries on |
| 291 | +the journal's forward pointer. Each entry describes which bytes the |
| 292 | +write operation changed in the data files. |
| 293 | + |
| 294 | +MongoDB next applies the journal's write operations to the ``shared |
| 295 | +view``. At this point, the ``shared view`` becomes inconsistent with the |
| 296 | +data files. |
| 297 | + |
| 298 | +At default intervals of 60 seconds, MongoDB asks the operating system to |
| 299 | +flush the ``shared view`` to disk. This brings the data files up-to-date |
| 300 | +with the latest write operations. |
| 301 | + |
| 302 | +When MongoDB flushes write operations to the data files, MongoDB removes the |
| 303 | +write operations from the journal's behind pointer. The behind pointer |
| 304 | +is always far behind the forward pointer. |
| 305 | + |
| 306 | +As part of journaling, MongoDB routinely asks the operating system to |
| 307 | +remap the ``shared view`` to the ``private view``, for consistency. |
| 308 | + |
| 309 | +.. note:: |
| 310 | + |
| 311 | + The interaction between the ``shared view`` and the on-disk |
| 312 | + data files is similar to how MongoDB works *without* |
| 313 | + journaling, which is that MongoDB asks the operating system to flush |
| 314 | + in-memory changes back to the data files every 60 seconds. |