diff --git a/source/includes/steps-recover-data-files.yaml b/source/includes/steps-recover-data-files.yaml new file mode 100644 index 00000000000..6e97e0d90eb --- /dev/null +++ b/source/includes/steps-recover-data-files.yaml @@ -0,0 +1,98 @@ +ref: recover-data-files-run-mongodump-once-per-db +stepnum: 1 +title: Run ``mongodump`` once for each database to recover +action: + - pre: | + If the database used the :option:`--directoryperdb` option, run the + following command from the system shell prompt: + language: sh + code: | + mongodump --journal --dbpath /data/db --directoryperdb --repair -d users -o /data/recovery > /data/recovery/users.log + - pre: | + Otherwise, omit the :option:`--directoryperdb` option: + language: sh + code: | + mongodump --journal --dbpath /data/db --repair -d users -o /data/recovery > /data/recovery/users.log +--- +ref: recover-data-files-verify-recovered-files-exist +stepnum: 2 +title: Verify that the new files contain recovered documents +pre: | + Examine ``/data/recovery/users.log`` to determine how many documents + :program:`mongodump` recovered. +--- +ref: recover-data-files-create-new-mongodb-node +stepnum: 3 +title: Create a MongoDB node with ``mongorestore`` +pre: | + In this example, the new data directory is ``/data/db2``. +action: + language: sh + code: | + mongorestore --dbpath /data/db2 /data/recovery +--- +ref: recover-data-files-test-data-files +stepnum: 4 +title: Test the data files on a standalone ``mongod`` +action: + - pre: + language: sh + code: | + mongod --dbpath /data/db2 + - pre: | + If the repair removed data, the collection will contain fewer + documents than it did before. From the :program:`mongo` shell, + verify the number of documents in each collection: + language: javascript + code: | + use users + db.collection.count() +post: | + Perform other application-specific tests in a staging environment as + needed.
If the data files are correct, delete or archive the + ``/data/recovery`` directory, and *do not proceed with any further + recovery efforts*. +--- +ref: recover-data-files-use-repair-option-and-repairpath +stepnum: 5 +title: Use ``--repair`` and ``--repairpath`` +pre: | + If :program:`mongodump` failed to recover the data files, use + :program:`mongod` with the :option:`--repair ` and + :option:`--repairpath ` options to create a new + data directory with a repaired set of data files. Specify a new + directory to receive the repaired data files: +action: + language: sh + code: | + mongod --dbpath /data/db --repair --repairpath /data/recovery +post: | + When the :option:`--repair ` operation completes + successfully, the repaired data files are in the new directory. + + .. warning:: + + :option:`--repair ` removes the invalid parts of + data files. *You can lose data as part of the recovery process.* + Under some circumstances, :option:`--repair ` + may remove the majority of the data in the data files. Without the + :option:`--repairpath ` option, the new + data files permanently overwrite the old. +--- +ref: recover-data-files-test-repaired-data-files +stepnum: 6 +title: Test the data files +pre: | + Test the repaired data files in ``/data/recovery`` using the procedure in step 4. +--- +ref: recover-data-files-use-files-normally +stepnum: 7 +title: Use the recovered files normally +pre: | + Start :program:`mongod` with :setting:`dbpath` pointing to the new directory: +action: + language: sh + code: | + mongod --dbpath +... + diff --git a/source/includes/toc-administration-backup-and-recovery.yaml b/source/includes/toc-administration-backup-and-recovery.yaml index f9f8703c450..be0e869cf10 100644 --- a/source/includes/toc-administration-backup-and-recovery.yaml +++ b/source/includes/toc-administration-backup-and-recovery.yaml @@ -25,8 +25,8 @@ description: | Copy databases between :program:`mongod` instances or within a single :program:`mongod` instance or deployment.
--- -file: /tutorial/recover-data-following-unexpected-shutdown +file: /tutorial/detect-invalid-data-files description: | Recover data from MongoDB data files that were not properly closed - or are in an inconsistent state. + or have an invalid state. ... diff --git a/source/includes/toc-spec-administration-tutorials-landing.yaml b/source/includes/toc-spec-administration-tutorials-landing.yaml index aad0763a08c..2ffc6a37574 100644 --- a/source/includes/toc-spec-administration-tutorials-landing.yaml +++ b/source/includes/toc-spec-administration-tutorials-landing.yaml @@ -16,7 +16,7 @@ files: level: 2 - file: /administration/backup-sharded-clusters level: 2 - - file: /tutorial/recover-data-following-unexpected-shutdown + - file: /tutorial/detect-invalid-data-files level: 2 - file: /administration/scripting level: 1 diff --git a/source/tutorial.txt b/source/tutorial.txt index 8e7c86b28c1..3c71bc7d5d5 100644 --- a/source/tutorial.txt +++ b/source/tutorial.txt @@ -57,7 +57,7 @@ Replica Sets - :doc:`/tutorial/configure-replica-set-tag-sets` - :doc:`/tutorial/manage-chained-replication` - :doc:`/tutorial/reconfigure-replica-set-with-unavailable-members` -- :doc:`/tutorial/recover-data-following-unexpected-shutdown` +- :doc:`/tutorial/detect-invalid-data-files` - :doc:`/tutorial/troubleshoot-replica-sets` Sharding @@ -85,7 +85,7 @@ Basic Operations ~~~~~~~~~~~~~~~~ - :doc:`/tutorial/use-database-commands` -- :doc:`/tutorial/recover-data-following-unexpected-shutdown` +- :doc:`/tutorial/detect-invalid-data-files` - :doc:`/tutorial/copy-databases-between-instances` - :doc:`/tutorial/expire-data` - :doc:`/tutorial/manage-the-database-profiler` @@ -104,7 +104,6 @@ Security - :doc:`/tutorial/add-user-administrator` - :doc:`/tutorial/add-user-to-database` - :doc:`/tutorial/define-roles` -- :doc:`/tutorial/change-user-privileges` - :doc:`/tutorial/view-roles` - :doc:`/tutorial/generate-key-file` - :doc:`/tutorial/control-access-to-mongodb-with-kerberos-authentication` diff --git 
a/source/tutorial/detect-invalid-data-files.txt b/source/tutorial/detect-invalid-data-files.txt new file mode 100644 index 00000000000..958ce3ba784 --- /dev/null +++ b/source/tutorial/detect-invalid-data-files.txt @@ -0,0 +1,113 @@ +========================== +Detect Invalid Data Files +========================== + +.. default-domain:: mongodb + +.. contents:: + :backlinks: none + :local: + +Overview +-------- + +Any deployment may suffer hardware failure, power failure, networking +failure, or some other interruption that may damage data files. MongoDB +provides a range of features, including :term:`replica sets ` and :term:`journaling `, to make recovery from those events +quick and complete. + +If you are *not* running a replica set, it may not be possible to +recover all the data stored in damaged data files. Even so, you can +remove the damaged portions of your data files so that the remaining +data can again support application queries. + +If you are *not* running a replica set and suspect that some of your +data files might be invalid, use the procedures described here and +in :doc:`/tutorial/recover-data` to help recover some of your data. + +The best way to avoid data loss and ensure the most +robust deployments is to follow the recommendations in +:doc:`maintain-valid-data-files`. + +See also +:doc:`/core/backups` and +:doc:`/administration/backup` for more information on preventing data loss. Also see +:doc:`/core/replication`, +:doc:`/core/journaling`, and +:doc:`/tutorial/manage-journaling`. + +Considerations +-------------- + +Data recovery on a single unjournaled :program:`mongod` instance +is more difficult than data recovery on a journaled replica set, +and may recover less data.
+ +Procedure +--------- + +Select the procedure that matches the :program:`mongod` configuration +that used the data files you want to recover: + +With no Journal Enabled +~~~~~~~~~~~~~~~~~~~~~~~ + +When a :program:`mongod` instance does not run with journaling enabled +and shuts down uncleanly, you must assume the data files are in an +invalid state. + +To confirm that a :program:`mongod` instance shut down uncleanly, look for the +following indicators: + +- a non-zero-length ``mongod.lock`` file in the data directory. + +- the following line in the :program:`mongod` log output: + + .. code-block:: none + + Unclean shutdown detected. + +With a Journal Enabled +~~~~~~~~~~~~~~~~~~~~~~ + +If a :program:`mongod` instance runs with journaling enabled +and shuts down uncleanly, or if you otherwise suspect invalid data +files, test the integrity of individual collections with the +:method:`db.collection.validate()` method. + +To test the integrity of the ``test`` collection, run the following +command from the :program:`mongo` shell: + +.. code-block:: javascript + + db.test.validate(true) + +A portion of the output shows that the ``test`` collection is valid: + +.. code-block:: javascript + + { + ... + + "valid" : true, + "errors" : [ ], + "ok" : 1 + } + +If the collection is invalid, the output of +:method:`db.collection.validate()` shows that as well: + +.. code-block:: javascript + + { + ... + "valid" : false, + "errors" : [ + "invalid bson object detected (see logs for more info)", + "exception during validate" + ], + "advice" : "ns corrupt, requires repair", + "ok" : 1 + } + diff --git a/source/tutorial/maintain-valid-data-files.txt b/source/tutorial/maintain-valid-data-files.txt new file mode 100644 index 00000000000..35cbd0340ef --- /dev/null +++ b/source/tutorial/maintain-valid-data-files.txt @@ -0,0 +1,68 @@ +========================= +Maintain Valid Data Files +========================= + +.. default-domain:: mongodb + +..
contents:: + :backlinks: none + :local: + +Overview +-------- + +Any deployment may suffer hardware failure, power failure, networking +failure, or some other interruption that may damage data files. MongoDB +provides a range of features, including :term:`replica sets ` and :term:`journaling `, to make recovery from those events +quick and complete. + +Use the following recommendations to ensure that data is routinely +copied to multiple servers, and that damaged servers can recover +quickly. Protecting your data in these ways enables recovery from +any unforeseen event. + +See also +:doc:`/core/backups` and +:doc:`/administration/backup` for more information on preventing data loss. Also see +:doc:`/core/replication`, +:doc:`/core/journaling`, and +:doc:`/tutorial/manage-journaling` for information on how to set up +a robust deployment. + +Recommendations +--------------- + +Use Journaling +~~~~~~~~~~~~~~ + +Always use :ref:`durability journaling `. The journal +records recent data changes so that MongoDB can return the data files +to a valid state after an unclean shutdown. By default, MongoDB updates its journal ten +times per second. In the worst case, with journaling enabled, only +``1/10`` of a second of data may be lost. + +If a :program:`mongod` instance without journaling shuts down +unexpectedly for *any* reason, always assume that your database is +in an invalid state. + +Run all Deployments as Replica Sets +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Certain recovery options are much simpler if the :program:`mongod` +instance runs as a member of a replica set. The primary goal of +replica sets in MongoDB is to prevent data loss and ensure availability. If +a member's data files become invalid, recovery may be as simple as resyncing +from another member of the set. + +Shut down Cleanly +~~~~~~~~~~~~~~~~~ + +A clean shutdown means that all ongoing MongoDB operations are +complete, and :program:`mongod` has flushed and closed all data files.
+ +An unclean shutdown can leave the database in an invalid state. + +To ensure a clean shutdown, use one of the shutdown procedures +described in :doc:`/tutorial/manage-mongodb-processes`. + diff --git a/source/tutorial/manage-journaling.txt b/source/tutorial/manage-journaling.txt index 179acfe4bae..19bae2a25b4 100644 --- a/source/tutorial/manage-journaling.txt +++ b/source/tutorial/manage-journaling.txt @@ -10,20 +10,15 @@ and to provide crash resiliency. Before applying a change to the data files, MongoDB writes the change operation to the journal. If MongoDB should terminate or encounter an error before it can write the changes from the journal to the data files, MongoDB can re-apply the write -operation and maintain a consistent state. +operation and maintain a valid state. -*Without* a journal, if :program:`mongod` exits unexpectedly, you must -assume your data is in an inconsistent state, and you must run either -:doc:`repair ` -or, preferably, :doc:`resync ` -from a clean member of the replica set. +Without a journal, if :program:`mongod` exits unexpectedly, you must +assume your data is in an invalid state, and follow the recommendations +in :doc:`/tutorial/detect-invalid-data-files`. -With journaling enabled, if :program:`mongod` stops unexpectedly, -the program can recover everything written to the journal, and the -data remains in a consistent state. By default, the greatest extent of lost -writes, i.e., those not made to the journal, are those made in the last -100 milliseconds. See :setting:`journalCommitInterval` for more -information on the default. +By default, the greatest extent of lost writes, i.e., those not made +to the journal, are those made in the last 100 milliseconds. See +:setting:`journalCommitInterval` for more information on the default. With journaling, if you want a data set to reside entirely in RAM, you need enough RAM to hold the data set plus the "write working set." 
The @@ -63,10 +58,10 @@ Disable Journaling Do not disable journaling on production systems. If your :program:`mongod` instance stops without shutting down cleanly - unexpectedly for any reason, (e.g. power failure) and you are + for any reason (e.g. power failure) and you are not running with journaling, then you must recover from an unaffected :term:`replica set` member or backup, as described in - :doc:`repair `. + :doc:`/tutorial/resync-replica-set-member`. To disable journaling, start :program:`mongod` with the :option:`--nojournal ` command line option. diff --git a/source/tutorial/recover-data-following-unexpected-shutdown.txt b/source/tutorial/recover-data-following-unexpected-shutdown.txt deleted file mode 100644 index 1ef48007d2b..00000000000 --- a/source/tutorial/recover-data-following-unexpected-shutdown.txt +++ /dev/null @@ -1,196 +0,0 @@ -========================================= -Recover Data after an Unexpected Shutdown -========================================= - -.. default-domain:: mongodb - -If MongoDB does not shutdown cleanly [#clean-shutdown]_ the on-disk -representation of the data files will likely reflect an inconsistent -state which could lead to data corruption. [#validation]_ - -To prevent data inconsistency and corruption, always shut down the -database cleanly and use the :ref:`durability journaling -`. MongoDB writes data to the journal, by default, -every 100 milliseconds, such that MongoDB can always recover to a -consistent state even in the case of an unclean shutdown due to power -loss or other system failure. - -If you are *not* running as part of a :term:`replica set` **and** do -*not* have journaling enabled, use the following procedure to recover -data that may be in an inconsistent state. If you are running as part -of a replica set, you should *always* restore from a backup or restart -the :program:`mongod` instance with an empty :setting:`dbpath` and -allow MongoDB to perform an initial sync to restore the data. -..
seealso:: The :doc:`/administration` documents, including - :ref:`Replica Set Syncing `, and the - documentation on the :setting:`repair`, :setting:`repairpath`, and - :setting:`journal` settings. - -.. [#clean-shutdown] To ensure a clean shut down, use the - :method:`db.shutdownServer()` from the :program:`mongo` shell, your - control script, the :option:`mongod --shutdown` option on Linux - systems, "Control-C" when running :program:`mongod` in interactive - mode, or ``kill $(pidof mongod)`` or ``kill -2 $(pidof mongod)``. - -.. [#validation] You can also use the :method:`db.collection.validate()` - method to test the integrity of a single collection. However, this - process is time consuming, and without journaling you can safely - assume that the data is in an invalid state and you should either - run the repair operation or resync from an intact member of the - replica set. - -Process -------- - -Indications -~~~~~~~~~~~ - -When you are aware of a :program:`mongod` instance running without -journaling that stops unexpectedly **and** you're not running with -replication, you should always run the repair operation before -starting MongoDB again. If you're using replication, then restore from -a backup and allow replication to perform an initial :ref:`sync ` to restore data. - -If the ``mongod.lock`` file in the data directory specified by -:setting:`dbpath`, ``/data/db`` by default, is *not* a zero-byte file, -then :program:`mongod` will refuse to start, and you will find a -message that contains the following line in your MongoDB log our -output: - -.. code-block:: none - - Unclean shutdown detected. - -This indicates that you need to run :program:`mongod` with the -:option:`--repair ` option. If you run repair when -the ``mongodb.lock`` file exists in your :setting:`dbpath`, or the -optional :option:`--repairpath `, you will see a -message that contains the following line: - -.. code-block:: none - - old lock file: /data/db/mongod.lock. 
probably means unclean shutdown - -If you see this message, as a last resort you may remove the lockfile -**and** run the repair operation before starting the database -normally, as in the following procedure: - -Overview -~~~~~~~~ - -.. warning:: Recovering a member of a replica set. - - Do not use this procedure to recover a member of a - :term:`replica set`. Instead you should either restore from - a :doc:`backup ` or perform an initial sync using - data from an intact member of the set, as described in - :doc:`/tutorial/resync-replica-set-member`. - -There are two processes to repair data files that result from an -unexpected shutdown: - -#. Use the :option:`--repair ` option in - conjunction with the :option:`--repairpath ` - option. :program:`mongod` will read the existing data files, and - write the existing data to new data files. This does not modify or - alter the existing data files. - - You do not need to remove the ``mongod.lock`` file before using - this procedure. - -#. Use the :option:`--repair ` option. - :program:`mongod` will read the existing data files, write the - existing data to new files and replace the existing, possibly - corrupt, files with new files. - - You must remove the ``mongod.lock`` file before using this - procedure. - -.. note:: - - :option:`--repair ` functionality is also - available in the shell with the :method:`db.repairDatabase()` - helper for the :dbcommand:`repairDatabase` command. - -.. _tutorial-repair-procedures: - -Procedures -~~~~~~~~~~ - -To repair your data files using the :option:`--repairpath ` -option to preserve the original data files unmodified: - -#. Start :program:`mongod` using :option:`--repair ` - to read the existing data files. - - .. code-block:: sh - - mongod --dbpath /data/db --repair --repairpath /data/db0 - - When this completes, the new repaired data files will be in the - ``/data/db0`` directory. - -#. 
Start :program:`mongod` using the following invocation to point the - :setting:`dbpath` at ``/data/db0``: - - .. code-block:: sh - - mongod --dbpath /data/db0 - - Once you confirm that the data files are operational you may delete - or archive the data files in the ``/data/db`` directory. - -To repair your data files without preserving the original files, do -not use the :option:`--repairpath ` option, as in -the following procedure: - -#. Remove the stale lock file: - - .. code-block:: sh - - rm /data/db/mongod.lock - - Replace ``/data/db`` with your :setting:`dbpath` where your MongoDB - instance's data files reside. - - .. warning:: - - After you remove the ``mongod.lock`` file you *must* run the - :option:`--repair ` process before using your - database. - -#. Start :program:`mongod` using :option:`--repair ` - to read the existing data files. - - .. code-block:: sh - - mongod --dbpath /data/db --repair - - When this completes, the repaired data files will replace the - original data files in the ``/data/db`` directory. - -#. Start :program:`mongod` using the following invocation to point the - :setting:`dbpath` at ``/data/db``: - - .. code-block:: sh - - mongod --dbpath /data/db - -``mongod.lock`` ---------------- - -In normal operation, you should **never** remove the ``mongod.lock`` -file and start :program:`mongod`. Instead consider the one of the above methods -to recover the database and remove the lock files. In dire -situations you can remove the lockfile, and start the database using the -possibly corrupt files, and attempt to recover data from the database; -however, it's impossible to predict the state of the database in these -situations. - -If you are not running with journaling, and your database shuts down -unexpectedly for *any* reason, you should always proceed *as if* your database -is in an inconsistent and likely corrupt state. 
If at all possible restore -from :doc:`backup ` or, if running as a :term:`replica -set`, restore by performing an initial sync using data from an intact -member of the set, as described in :doc:`/tutorial/resync-replica-set-member`. diff --git a/source/tutorial/recover-data.txt b/source/tutorial/recover-data.txt new file mode 100644 index 00000000000..ec4e5716440 --- /dev/null +++ b/source/tutorial/recover-data.txt @@ -0,0 +1,68 @@ +================== +Recover Data Files +================== + +.. default-domain:: mongodb + +Overview +-------- + +Any deployment may suffer hardware failure, power failure, networking +failure, or some other interruption that may damage data files. MongoDB +provides a range of features, including :term:`replica sets ` and :term:`journaling `, to make recovery from those events +quick and complete. + +If you suspect that some of your data files might be invalid, it may +be possible to recover some of the data. + +If your deployment *does* use :term:`replica sets `, use +the procedures described in :doc:`/tutorial/resync-replica-set-member` +to resync the invalid data files from valid files on one of the other +members of the replica set. + +If your deployment does *not* use replica sets, use the following +procedures to remove the invalid portions of your data files, and to +make the remaining portions usable. + +The best way to avoid data loss and ensure the most +robust deployments is to follow the recommendations in +:doc:`maintain-valid-data-files`. + +See also +:doc:`/core/backups` and +:doc:`/administration/backup` for more information on preventing data loss. Also see +:doc:`/core/replication`, +:doc:`/core/journaling`, and +:doc:`/tutorial/manage-journaling`. + +Considerations +-------------- + +.. warning:: + + This procedure removes the invalid parts of data files. *You can + lose data as part of the recovery process.* + + Using this procedure on a capped collection truncates the + collection to two documents. 
Do not use :program:`mongodump` with + the :option:`--repair ` option to recover capped collections. + +- Use a recovery path with enough space to hold the recovered data files. + +- Do not remove the ``mongod.lock`` file from the data directory. + +- Do not perform this procedure on files that a running + :program:`mongod` instance currently has open. The procedure does + not work on open files. + +Procedure +--------- + +This procedure assumes a ``/data/db`` directory that contains data +files for a ``users`` database, and a ``/data/recovery`` directory +for recovered files. Substitute the directory and database names for +your own deployment in the following operations. + +.. include:: /includes/steps/recover-data-files.rst +
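Step 1 of the included procedure runs :program:`mongodump` once for each database to recover. The following is a minimal sketch, assuming a POSIX shell, of a helper that prints one such command per database name; the function name ``print_recovery_dumps`` and its argument order are hypothetical and are not part of any MongoDB tool. Printing the commands instead of running them lets you review them before anything touches the data files.

```sh
# Hypothetical helper (not a MongoDB tool): print one `mongodump --repair`
# command per database, in the same form as step 1 of the procedure.
#
# Usage: print_recovery_dumps DBPATH OUTDIR [--directoryperdb] DB [DB ...]
#
# Assumes paths and database names contain no spaces or shell
# metacharacters. Variables are global because POSIX sh has no `local`.
print_recovery_dumps() {
  dbpath=$1
  outdir=$2
  shift 2

  # Include --directoryperdb only if the original mongod used that option.
  perdb=""
  if [ "$1" = "--directoryperdb" ]; then
    perdb=" --directoryperdb"
    shift
  fi

  # Emit one command per database; each redirects its log to OUTDIR/DB.log.
  for db in "$@"; do
    printf 'mongodump --journal --dbpath %s%s --repair -d %s -o %s > %s/%s.log\n' \
      "$dbpath" "$perdb" "$db" "$outdir" "$outdir" "$db"
  done
}
```

For example, ``print_recovery_dumps /data/db /data/recovery users`` prints the same command that step 1 shows for the ``users`` database. After reviewing the output, you could pipe it to ``sh`` to run it.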