diff --git a/source/includes/steps-recover-data-files.yaml b/source/includes/steps-recover-data-files.yaml new file mode 100644 index 00000000000..a4b2772a845 --- /dev/null +++ b/source/includes/steps-recover-data-files.yaml @@ -0,0 +1,101 @@ +ref: run-mongodump +stepnum: 1 +title: Run :program:`mongodump` for each database to recover. +action: + - pre: | + If the database used the :option:`--directoryperdb` option, run the + following command from the system shell prompt: + language: sh + code: | + mongodump --journal --dbpath /data/db --directoryperdb --repair -d users -o /data/recovery > /data/recovery/users.log + - pre: | + Otherwise, omit the :option:`--directoryperdb` option: + language: sh + code: | + mongodump --journal --dbpath /data/db --repair -d users -o /data/recovery > /data/recovery/users.log +--- +ref: verify +stepnum: 2 +title: Verify the new files contain recovered documents. +pre: | + Examine ``/data/recovery/users.log`` to determine how many documents + :program:`mongodump` recovered. +--- +ref: create-new-mongodb-node +stepnum: 3 +title: Create a new MongoDB node. +pre: | + Use :program:`mongorestore` to create a new data directory. For + example, the following command restores data to the new data directory + ``/data/db2``. +action: + language: sh + code: | + mongorestore --dbpath /data/db2 /data/recovery +--- +ref: test +stepnum: 4 +title: Test the data files on a standalone ``mongod``. +action: + - pre: | + Start the :program:`mongod` with a ``dbpath`` pointing to the + recovered data. For example: + language: sh + code: | + mongod --dbpath /data/recovery/ + - pre: | + If the repair has removed data, the number of documents in the + collection will be lower than it was previously. From the + :program:`mongo` shell, verify the number of documents in each collection: + language: javascript + code: | + use users + db.collection.count() +post: | + Perform other application-specific tests in a staging environment as + needed. 
If the data files are correct, delete or archive the + ``/data/recovery`` directory and *do not proceed with any further + recovery efforts*. +--- +ref: repair +stepnum: 5 +title: Repair the data. +pre: | + If :program:`mongodump` failed to recover the data files, use + :program:`mongod` with the :option:`--repair ` and + :option:`--repairpath ` options to create a new + data directory with a repaired set of data files. Specify a new + directory to receive the repaired data files: +action: + language: sh + code: | + mongod --dbpath /data/db --repair --repairpath /data/recovery +post: | + When the :option:`--repair ` operation completes + successfully, the newly-repaired data files are in the new directory. + + .. warning:: + + :option:`--repair ` removes the invalid parts of + data files. You can lose data as part of the recovery process. + Under some circumstances, :option:`--repair ` + may remove the majority of data in the data file. Without the + :option:`--repairpath ` option, the new + data files permanently overwrite the old. +--- +ref: test-data-files +stepnum: 6 +title: Test the data files. +pre: | + Test the data files using the procedure outlined above. +--- +ref: use-files-normally +stepnum: 7 +title: Use the recovered files normally. +pre: | + Start :program:`mongod` with :setting:`dbpath` pointing to the new directory: +action: + language: sh + code: | + mongod --dbpath +... diff --git a/source/includes/toc-administration-backup-and-recovery.yaml b/source/includes/toc-administration-backup-and-recovery.yaml index 3fe4f78a95d..b5d40b790a1 100644 --- a/source/includes/toc-administration-backup-and-recovery.yaml +++ b/source/includes/toc-administration-backup-and-recovery.yaml @@ -20,7 +20,16 @@ description: | Detailed procedures and considerations for backing up sharded clusters and single shards. 
--- -file: /tutorial/recover-data-following-unexpected-shutdown +file: /tutorial/maintain-valid-data-files +description: | + Ensure valid data through journaling and replica sets. +--- +file: /tutorial/detect-invalid-data-files +description: | + Detect whether MongoDB data files were not properly closed + or have an invalid state. +--- +file: /tutorial/recover-data description: | Recover data from MongoDB data files that were not properly closed or have an invalid state. diff --git a/source/includes/toc-spec-administration-tutorials-landing.yaml b/source/includes/toc-spec-administration-tutorials-landing.yaml index 5e7fb535b81..a5363186cda 100644 --- a/source/includes/toc-spec-administration-tutorials-landing.yaml +++ b/source/includes/toc-spec-administration-tutorials-landing.yaml @@ -18,7 +18,7 @@ files: level: 2 - file: /administration/backup-sharded-clusters level: 2 - - file: /tutorial/recover-data-following-unexpected-shutdown + - file: /tutorial/detect-invalid-data-files level: 2 - text: "Continue reading from :doc:`/administration/backup` for additional tutorials of MongoDB backup and recovery procedures." 
level: 2 diff --git a/source/tutorial.txt b/source/tutorial.txt index 8596751c80d..bc76f246629 100644 --- a/source/tutorial.txt +++ b/source/tutorial.txt @@ -57,7 +57,7 @@ Replica Sets - :doc:`/tutorial/configure-replica-set-tag-sets` - :doc:`/tutorial/manage-chained-replication` - :doc:`/tutorial/reconfigure-replica-set-with-unavailable-members` -- :doc:`/tutorial/recover-data-following-unexpected-shutdown` +- :doc:`/tutorial/detect-invalid-data-files` - :doc:`/tutorial/troubleshoot-replica-sets` Sharding @@ -85,7 +85,8 @@ Basic Operations ~~~~~~~~~~~~~~~~ - :doc:`/tutorial/use-database-commands` -- :doc:`/tutorial/recover-data-following-unexpected-shutdown` +- :doc:`/tutorial/detect-invalid-data-files` +- :doc:`/tutorial/copy-databases-between-instances` - :doc:`/tutorial/expire-data` - :doc:`/tutorial/manage-the-database-profiler` - :doc:`/tutorial/rotate-log-files` @@ -103,7 +104,6 @@ Security - :doc:`/tutorial/add-user-administrator` - :doc:`/tutorial/add-user-to-database` - :doc:`/tutorial/define-roles` -- :doc:`/tutorial/change-user-privileges` - :doc:`/tutorial/view-roles` - :doc:`/tutorial/generate-key-file` - :doc:`/tutorial/control-access-to-mongodb-with-kerberos-authentication` diff --git a/source/tutorial/detect-invalid-data-files.txt b/source/tutorial/detect-invalid-data-files.txt new file mode 100644 index 00000000000..a4dcd45d603 --- /dev/null +++ b/source/tutorial/detect-invalid-data-files.txt @@ -0,0 +1,100 @@ +========================== +Detect Invalid Data Files +========================== + +.. default-domain:: mongodb + +.. contents:: + :backlinks: none + :local: + +Overview +-------- + +If you have an interruption, such as a power failure, you must assume your +data files are in an invalid state. This section describes how to check +whether a :program:`mongod` instance shut down cleanly and how to test +data integrity. + +To recover data that has been corrupted, see +:doc:`/tutorial/recover-data`. + +.. 
note:: The best way to avoid data loss and ensure the most robust + deployments is to follow the recommendations in + :doc:`maintain-valid-data-files`. + +Procedures +---------- + +Detect an Unclean Shutdown +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To detect whether a :program:`mongod` instance shut down cleanly, look for +the following indicators: + +- a non-zero-length ``mongod.lock`` file in the data directory. + +- the following line in the :program:`mongod` log output: + + .. code-block:: none + + Unclean shutdown detected. + +These indicate an unclean shutdown, in which case you must assume data +files are invalid. + +Test Data Integrity +~~~~~~~~~~~~~~~~~~~ + +This procedure applies only if the :program:`mongod` instance runs with +:term:`journaling ` enabled. To test data integrity, use either +the :method:`db.collection.validate()` method or the :dbcommand:`validate` +command. + +For example, to test the integrity of the ``test`` collection, use the +following command from the :program:`mongo` shell: + +.. code-block:: javascript + + db.test.validate(true) + +A portion of the output shows that the ``test`` collection is valid: + +.. code-block:: javascript + + { + ... + + "valid" : true, + "errors" : [ ], + "ok" : 1 + } + +If the collection is invalid, the output of +:method:`db.collection.validate()` shows that as well: + +.. code-block:: javascript + + { + ... 
+ "valid" : false, + "errors" : [ + "invalid bson object detected (see logs for more info)", + "exception during validate" + ], + "advice" : "ns corrupt, requires repair", + "ok" : 1 + } + +Related Documents +----------------- + +- :doc:`maintain-valid-data-files` + +- :doc:`/tutorial/recover-data` + +- :doc:`/tutorial/manage-journaling` + +- :doc:`/core/backups` + +- :doc:`/administration/backup` diff --git a/source/tutorial/maintain-valid-data-files.txt b/source/tutorial/maintain-valid-data-files.txt new file mode 100644 index 00000000000..afa9c8b92d1 --- /dev/null +++ b/source/tutorial/maintain-valid-data-files.txt @@ -0,0 +1,66 @@ +========================= +Maintain Valid Data Files +========================= + +.. default-domain:: mongodb + +.. contents:: + :backlinks: none + :local: + +Overview +-------- + +MongoDB provides features to protect your data in the event of hardware +failure, power failure, network failure, or other unforeseen events that +affect data. Use the features described here to ensure that data is +routinely copied to multiple servers and that damaged servers recover +quickly. + +Enable Journaling +----------------- + +Always use :ref:`durability journaling `. The journal +stores recent data changes, with the primary aim of recovering from +database invalidity. By default, MongoDB updates its journal ten +times per second. In the worst case, with journaling enabled, +``1/10`` of a second of data may be lost. + +If a :program:`mongod` instance without journaling shuts down +unexpectedly for *any* reason, always assume that your database is +in an invalid state. + +For 64-bit builds of :program:`mongod`, MongoDB enables journaling by +default. For more information see :doc:`/core/journaling` and +:doc:`/tutorial/manage-journaling`. + +Run All Deployments as Replica Sets +----------------------------------- + +Certain recovery options are much simpler if the :program:`mongod` +instance runs as a member of a replica set. 
The primary goal of +replica sets in MongoDB is to ensure availability and prevent data loss. In +the event of database invalidity, recovery may be as simple as syncing +from a fellow replica set member. + +For more information see :doc:`/core/replication`. + +Shut Down Cleanly +----------------- + +A clean shutdown means that all ongoing MongoDB operations are +complete and :program:`mongod` has flushed and closed all data files. +An unclean shutdown, however, can leave the database in an invalid state. +To ensure a clean shutdown, use one of the shutdown procedures +described in :doc:`/tutorial/manage-mongodb-processes`. + +Related Documents +----------------- + +- :doc:`detect-invalid-data-files` + +- :doc:`/tutorial/recover-data` + +- :doc:`/core/backups` + +- :doc:`/administration/backup` diff --git a/source/tutorial/manage-journaling.txt b/source/tutorial/manage-journaling.txt index b900d291b7b..dfc7ce4b000 100644 --- a/source/tutorial/manage-journaling.txt +++ b/source/tutorial/manage-journaling.txt @@ -4,38 +4,36 @@ Manage Journaling .. default-domain:: mongodb +Overview +-------- + MongoDB uses *write ahead logging* to an on-disk :term:`journal` to guarantee :doc:`write operation ` durability and to provide crash resiliency. Before applying a change to the data -files, MongoDB writes the change operation to the journal. If MongoDB -should terminate or encounter an error before it can write the changes -from the journal to the data files, MongoDB can re-apply the write -operation and maintain a consistent state. - -*Without* a journal, if :program:`mongod` exits unexpectedly, you must -assume your data is in an inconsistent state, and you must run either -:doc:`repair ` -or, preferably, :doc:`resync ` -from a clean member of the replica set. - -With journaling enabled, if :program:`mongod` stops unexpectedly, -the program can recover everything written to the journal, and the -data remains in a consistent state. 
By default, the greatest extent of lost -writes, i.e., those not made to the journal, are those made in the last -100 milliseconds. See :setting:`journalCommitInterval` for more -information on the default. +files, MongoDB writes the change operation to the journal. If a +:program:`mongod` should terminate or encounter an error before it can +write the changes from the journal to the data files, MongoDB can re-apply +the write operation and maintain a valid state. + +Without a journal, if a :program:`mongod` exits unexpectedly, you must +assume your data is in an invalid state and follow a detection procedure +in :doc:`/tutorial/detect-invalid-data-files`. + +By default, the greatest extent of lost writes, i.e., those not made +to the journal, are those made in the last 100 milliseconds. See +:setting:`journalCommitInterval` for more information on the default. + +Considerations +-------------- With journaling, if you want a data set to reside entirely in RAM, you -need enough RAM to hold the data set plus the "write working set." The -"write working set" is the amount of unique data you expect to see +must have enough RAM to hold the data set plus the *write working set*, +which is the amount of unique data you expect to see written between re-mappings of the private view. For information on views, see :ref:`journaling-storage-views`. -.. important:: - - .. versionchanged:: 2.0 - For 64-bit builds of :program:`mongod`, journaling is enabled by - default. For other platforms, see :setting:`journal`. +For 64-bit builds of :program:`mongod`, journaling is enabled by default. +For other platforms, see :setting:`journal`. Procedures ---------- @@ -43,9 +41,6 @@ Procedures Enable Journaling ~~~~~~~~~~~~~~~~~ -.. versionchanged:: 2.0 - For 64-bit builds of :program:`mongod`, journaling is enabled by default. - To enable journaling, start :program:`mongod` with the :option:`--journal ` command line option. @@ -61,20 +56,21 @@ Disable Journaling .. 
warning:: - Do not disable journaling on production systems. If your - :program:`mongod` instance stops without shutting down cleanly - unexpectedly for any reason, (e.g. power failure) and you are - not running with journaling, then you must recover from an - unaffected :term:`replica set` member or backup, as described in - :doc:`repair `. + Do not disable journaling on production systems. -To disable journaling, start :program:`mongod` with the -:option:`--nojournal ` command line option. +If you disable journaling and your :program:`mongod` instance stops +without shutting down cleanly, for example from a power failure, then you +must recover from an unaffected :term:`replica set` member or backup, as +described in :doc:`/tutorial/resync-replica-set-member`. + +Disable journaling only on non-production systems. To do so, start +:program:`mongod` with the :option:`--nojournal ` +command line option. Get Commit Acknowledgment ~~~~~~~~~~~~~~~~~~~~~~~~~ -You can get commit acknowledgment with the +To get commit acknowledgment, use the :dbcommand:`getLastError` command and the ``j`` option. For details, see :ref:`write-concern-operation`. @@ -83,18 +79,17 @@ You can get commit acknowledgment with the Avoid Preallocation Lag ~~~~~~~~~~~~~~~~~~~~~~~ -To avoid :ref:`preallocation lag `, you can +To avoid :ref:`preallocation lag `, preallocate files in the journal directory by copying them from another instance of :program:`mongod`. -Preallocated files do not contain data. It is safe to later remove them. -But if you restart :program:`mongod` with journaling, :program:`mongod` -will create them again. +Preallocated files do not contain data, and it is safe to later remove them. +However, if you restart :program:`mongod` with journaling, MongoDB +will again create them. .. example:: The following sequence preallocates journal files for an instance of :program:`mongod` running on port ``27017`` with a database path of ``/data/db``. 
- For demonstration purposes, the sequence starts by creating a set of journal files in the usual way. @@ -105,14 +100,14 @@ will create them again. mkdir ~/tmpDbpath - #. Create a set of journal files by staring a :program:`mongod` + #. Create a set of journal files by starting a :program:`mongod` instance that uses the temporary directory: .. code-block:: sh mongod --port 10000 --dbpath ~/tmpDbpath --journal - #. When you see the following log output, indicating + #. When you see the following log output, which indicates :program:`mongod` has the files, press CONTROL+C to stop the :program:`mongod` instance: @@ -167,7 +162,7 @@ Change the Group Commit Interval .. versionchanged:: 2.0 -You can set the group commit interval using the +To set the group commit interval, use the :option:`--journalCommitInterval ` command line option. The allowed range is ``2`` to ``300`` milliseconds. diff --git a/source/tutorial/recover-data-following-unexpected-shutdown.txt b/source/tutorial/recover-data-following-unexpected-shutdown.txt deleted file mode 100644 index 1ef48007d2b..00000000000 --- a/source/tutorial/recover-data-following-unexpected-shutdown.txt +++ /dev/null @@ -1,196 +0,0 @@ -========================================= -Recover Data after an Unexpected Shutdown -========================================= - -.. default-domain:: mongodb - -If MongoDB does not shutdown cleanly [#clean-shutdown]_ the on-disk -representation of the data files will likely reflect an inconsistent -state which could lead to data corruption. [#validation]_ - -To prevent data inconsistency and corruption, always shut down the -database cleanly and use the :ref:`durability journaling -`. MongoDB writes data to the journal, by default, -every 100 milliseconds, such that MongoDB can always recover to a -consistent state even in the case of an unclean shutdown due to power -loss or other system failure. 
- -If you are *not* running as part of a :term:`replica set` **and** do -*not* have journaling enabled, use the following procedure to recover -data that may be in an inconsistent state. If you are running as part -of a replica set, you should *always* restore from a backup or restart -the :program:`mongod` instance with an empty :setting:`dbpath` and -allow MongoDB to perform an initial sync to restore the data. - -.. seealso:: The :doc:`/administration` documents, including - :ref:`Replica Set Syncing `, and the - documentation on the :setting:`repair`, :setting:`repairpath`, and - :setting:`journal` settings. - -.. [#clean-shutdown] To ensure a clean shut down, use the - :method:`db.shutdownServer()` from the :program:`mongo` shell, your - control script, the :option:`mongod --shutdown` option on Linux - systems, "Control-C" when running :program:`mongod` in interactive - mode, or ``kill $(pidof mongod)`` or ``kill -2 $(pidof mongod)``. - -.. [#validation] You can also use the :method:`db.collection.validate()` - method to test the integrity of a single collection. However, this - process is time consuming, and without journaling you can safely - assume that the data is in an invalid state and you should either - run the repair operation or resync from an intact member of the - replica set. - -Process -------- - -Indications -~~~~~~~~~~~ - -When you are aware of a :program:`mongod` instance running without -journaling that stops unexpectedly **and** you're not running with -replication, you should always run the repair operation before -starting MongoDB again. If you're using replication, then restore from -a backup and allow replication to perform an initial :ref:`sync ` to restore data. 
- -If the ``mongod.lock`` file in the data directory specified by -:setting:`dbpath`, ``/data/db`` by default, is *not* a zero-byte file, -then :program:`mongod` will refuse to start, and you will find a -message that contains the following line in your MongoDB log our -output: - -.. code-block:: none - - Unclean shutdown detected. - -This indicates that you need to run :program:`mongod` with the -:option:`--repair ` option. If you run repair when -the ``mongodb.lock`` file exists in your :setting:`dbpath`, or the -optional :option:`--repairpath `, you will see a -message that contains the following line: - -.. code-block:: none - - old lock file: /data/db/mongod.lock. probably means unclean shutdown - -If you see this message, as a last resort you may remove the lockfile -**and** run the repair operation before starting the database -normally, as in the following procedure: - -Overview -~~~~~~~~ - -.. warning:: Recovering a member of a replica set. - - Do not use this procedure to recover a member of a - :term:`replica set`. Instead you should either restore from - a :doc:`backup ` or perform an initial sync using - data from an intact member of the set, as described in - :doc:`/tutorial/resync-replica-set-member`. - -There are two processes to repair data files that result from an -unexpected shutdown: - -#. Use the :option:`--repair ` option in - conjunction with the :option:`--repairpath ` - option. :program:`mongod` will read the existing data files, and - write the existing data to new data files. This does not modify or - alter the existing data files. - - You do not need to remove the ``mongod.lock`` file before using - this procedure. - -#. Use the :option:`--repair ` option. - :program:`mongod` will read the existing data files, write the - existing data to new files and replace the existing, possibly - corrupt, files with new files. - - You must remove the ``mongod.lock`` file before using this - procedure. - -.. 
note:: - - :option:`--repair ` functionality is also - available in the shell with the :method:`db.repairDatabase()` - helper for the :dbcommand:`repairDatabase` command. - -.. _tutorial-repair-procedures: - -Procedures -~~~~~~~~~~ - -To repair your data files using the :option:`--repairpath ` -option to preserve the original data files unmodified: - -#. Start :program:`mongod` using :option:`--repair ` - to read the existing data files. - - .. code-block:: sh - - mongod --dbpath /data/db --repair --repairpath /data/db0 - - When this completes, the new repaired data files will be in the - ``/data/db0`` directory. - -#. Start :program:`mongod` using the following invocation to point the - :setting:`dbpath` at ``/data/db0``: - - .. code-block:: sh - - mongod --dbpath /data/db0 - - Once you confirm that the data files are operational you may delete - or archive the data files in the ``/data/db`` directory. - -To repair your data files without preserving the original files, do -not use the :option:`--repairpath ` option, as in -the following procedure: - -#. Remove the stale lock file: - - .. code-block:: sh - - rm /data/db/mongod.lock - - Replace ``/data/db`` with your :setting:`dbpath` where your MongoDB - instance's data files reside. - - .. warning:: - - After you remove the ``mongod.lock`` file you *must* run the - :option:`--repair ` process before using your - database. - -#. Start :program:`mongod` using :option:`--repair ` - to read the existing data files. - - .. code-block:: sh - - mongod --dbpath /data/db --repair - - When this completes, the repaired data files will replace the - original data files in the ``/data/db`` directory. - -#. Start :program:`mongod` using the following invocation to point the - :setting:`dbpath` at ``/data/db``: - - .. code-block:: sh - - mongod --dbpath /data/db - -``mongod.lock`` ---------------- - -In normal operation, you should **never** remove the ``mongod.lock`` -file and start :program:`mongod`. 
Instead consider the one of the above methods -to recover the database and remove the lock files. In dire -situations you can remove the lockfile, and start the database using the -possibly corrupt files, and attempt to recover data from the database; -however, it's impossible to predict the state of the database in these -situations. - -If you are not running with journaling, and your database shuts down -unexpectedly for *any* reason, you should always proceed *as if* your database -is in an inconsistent and likely corrupt state. If at all possible restore -from :doc:`backup ` or, if running as a :term:`replica -set`, restore by performing an initial sync using data from an intact -member of the set, as described in :doc:`/tutorial/resync-replica-set-member`. diff --git a/source/tutorial/recover-data.txt b/source/tutorial/recover-data.txt new file mode 100644 index 00000000000..d834e8297f8 --- /dev/null +++ b/source/tutorial/recover-data.txt @@ -0,0 +1,65 @@ +================== +Recover Data Files +================== + +.. default-domain:: mongodb + +Overview +-------- + +If you suspect that some of your data files are invalid and if your +deployment uses :term:`replica sets `, use the procedures in +:doc:`/tutorial/resync-replica-set-member` to resync the invalid data +files from valid files on one of the other members of the replica set. + +If your deployment does *not* use replica sets, use the following +procedure to remove the invalid portions of your data files and to make +the remaining portions usable. + +The best way to avoid data loss and ensure the most robust deployments is +to follow the recommendations in :doc:`maintain-valid-data-files`. + +Considerations +-------------- + +.. warning:: + + You can lose data as part of the recovery process. + +This procedure removes the invalid parts of data files. On a capped +collection, this procedure truncates the collection to two documents. 
Do +not use :program:`mongodump` with the :option:`--repair ` option to +recover capped collections. + +Do not perform this procedure on files that are currently open in a +running :program:`mongod` instance. This procedure will not work on files +that are already open. + +Prerequisite +------------ + +Use a recovery path with enough space to hold the recovered data files. + +Procedure +--------- + +This procedure assumes a ``/data/db`` directory that contains data files +for a ``users`` database and a ``/data/recovery`` directory for recovered +files. Use the following sequence of operations, substituting your own +directory and database names. Never remove the ``mongod.lock`` file from +the data directory. + +.. include:: /includes/steps/recover-data-files.rst + +Related Documents +----------------- + +- :doc:`maintain-valid-data-files` + +- :doc:`detect-invalid-data-files` + +- :doc:`/tutorial/manage-journaling` + +- :doc:`/core/backups` + +- :doc:`/administration/backup`