From b1f48fd3c2018f2a256bb579ef2a91df941a5c00 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Mon, 22 Oct 2012 12:46:00 -0400 Subject: [PATCH 1/2] DOCS-449: resync a stale member --- source/administration/replica-sets.txt | 88 ++++++++++++++++++- source/core/replication-internals.txt | 26 +++--- source/core/replication.txt | 10 ++- source/replication.txt | 1 + ...e-replica-set-with-unavailable-members.txt | 9 +- ...ver-data-following-unexpected-shutdown.txt | 17 ++-- 6 files changed, 120 insertions(+), 31 deletions(-) diff --git a/source/administration/replica-sets.txt b/source/administration/replica-sets.txt index 5a4ca794750..f713259c33a 100644 --- a/source/administration/replica-sets.txt +++ b/source/administration/replica-sets.txt @@ -33,6 +33,7 @@ suggestions for administers of replica sets. - :doc:`/tutorial/change-hostnames-in-a-replica-set` - :doc:`/tutorial/convert-secondary-into-arbiter` - :doc:`/tutorial/reconfigure-replica-set-with-unavailable-members` + - :doc:`/tutorial/recover-data-following-unexpected-shutdown` .. _replica-set-node-configurations: .. _replica-set-member-configurations: @@ -365,7 +366,8 @@ the following to prepare the new member's :term:`data directory <dbpath>`: difference in the amount of time between the most recent operation and the most recent operation to the database exceeds the length of the :term:`oplog` on the existing members, then the new instance will have - to completely re-synchronize. + to completely resynchronize, as described in + :ref:`replica-set-resync-stale-member`. Use :method:`db.printReplicationInfo()` to check the current state of replica set members with regards to the oplog. @@ -558,6 +560,90 @@ the oplog. For a detailed procedure, see .. include:: /includes/procedure-change-oplog-size.rst +.. 
_replica-set-resync-stale-member: + +Resyncing a Member of a Replica Set +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a member's data falls too far behind the :term:`oplog` to catch up, +the member and its data are considered "stale". A member's data is too +far behind when the oplog on the :term:`primary` has overwritten its +entries before the member has copied them. When that occurs, you must +resync the member by removing its data and replacing it with up-to-date +data. + +To do so, use one of the following approaches: + +- Restart the machine with an empty data directory and let MongoDB's + automatic syncing feature restore the data. This approach requires + fewer steps but can take longer to replace the data. + + See :ref:`replica-set-auto-resync-stale-member`. + +- Restart the machine with a copy of a recent data directory from + another member in the :term:`replica set`. This procedure can replace + the data more quickly but requires more manual steps. + + See :ref:`replica-set-resync-by-copying`. + +.. index:: replica set; resync +.. _replica-set-auto-resync-stale-member: + +Automatically Resync a Stale Member +``````````````````````````````````` + +This procedure relies on MongoDB's automatic syncing feature to restore +the data on the stale member. For an overview of how MongoDB syncs +replica sets, see :ref:`replica-set-syncing`. + +To resync the stale member: + +1. Stop the member's :program:`mongod` instance using the + :option:`mongod --shutdown` option. Make sure to set + :option:`--dbpath <mongod --dbpath>` to the member's data directory. + + .. code-block:: sh + + mongod --dbpath /data/db/ --shutdown + +#. Delete all data and subdirectories from the member's data directory + such that the directory is empty. + +#. Restart the :program:`mongod` instance on the member. Consider the + following example: + + .. code-block:: sh + + mongod --dbpath /data/db/ --replSet rsProduction + + MongoDB resyncs the member.
Resyncing may take a long time, depending on + the size of the database and speed of the network. + +.. index:: replica set; resync +.. _replica-set-resync-by-copying: + +Resync by Copying Data from Another Member +`````````````````````````````````````````` + +This approach uses the data directory of an existing member to "seed" +the stale member. The data must be recent enough to allow the new member +to catch up with the :term:`primary` member's :term:`oplog`. + +To resync by copying data from another member, use one of the following +approaches: + +- Create a snapshot of another member's data and then restore that + snapshot to the stale member. Use the snapshot procedures in + :doc:`/administration/backups`. + +- Lock another member's database with the :method:`db.fsyncLock()` + command, copy that data, and then restore the data to the stale + member. Use the procedures for backup storage in + :doc:`/administration/backups`. + +- Use the :dbcommand:`copydb` and :dbcommand:`clone` commands, as + described in :doc:`/tutorial/copy-databases-between-instances`. + .. _replica-set-security: Replica Set Security diff --git a/source/core/replication-internals.txt b/source/core/replication-internals.txt index 7a624c2af0f..af174131f5d 100644 --- a/source/core/replication-internals.txt +++ b/source/core/replication-internals.txt @@ -4,9 +4,6 @@ Replication Internals .. default-domain:: mongodb -Synopsis --------- - This document provides a more in-depth explanation of the internals and operation of :term:`replica set` features. This material is not necessary for normal operation or application development but may be useful for @@ -77,11 +74,10 @@ the following collections: .. _replica-set-oplog: .. _replica-set-internals-oplog: -Oplog ------ +Oplog Internals +--------------- -For an explanation of the oplog, see the :ref:`replica-set-oplog-sizing` -topic in the :doc:`/core/replication` document. +For an explanation of the oplog, see :ref:`replica-set-oplog-sizing`. 
Under various exceptional situations, updates to a :term:`secondary's <secondary>` oplog might @@ -113,8 +109,8 @@ Data Integrity .. index:: replica set; read preferences -Read Preferences -~~~~~~~~~~~~~~~~ +Read Preference Internals +~~~~~~~~~~~~~~~~~~~~~~~~~ MongoDB uses :term:`single-master replication` to ensure that the database remains consistent. However, clients may modify the @@ -172,8 +168,8 @@ for your data set is crucial. .. index:: replica set; security -Security --------- +Security Internals +------------------ Administrators of replica sets also have unique :ref:`monitoring ` and :ref:`security ` @@ -188,8 +184,8 @@ modify the configuration of an existing replica set. .. index:: replica set; failover .. _replica-set-election-internals: -Elections ---------- +Election Internals +------------------ Elections are the process :term:`replica set` members use to select which member should become :term:`primary`. A primary is the only member in the replica @@ -297,6 +293,8 @@ and a majority of servers in one data center and one server in another. .. index:: replica set; sync +.. _replica-set-syncing: + Syncing ------- @@ -327,3 +325,5 @@ For example: alternate facility, and if you add another secondary to the alternate facility, the new secondary will likely sync from the existing secondary because it is closer than the primary. + +.. seealso:: :ref:`replica-set-resync-stale-member` diff --git a/source/core/replication.txt b/source/core/replication.txt index 32300a4a739..878ecb4748d 100644 --- a/source/core/replication.txt +++ b/source/core/replication.txt @@ -353,9 +353,13 @@ activity of your MongoDB-based application are reads and you are writing a small amount of data, you may find that you need a much smaller oplog. -For a further understanding of oplog behavior, see the -:ref:`replica-set-oplog` topic in the :doc:`/core/replication-internals` -document.
+To view oplog status, including the size and the time range of +operations, issue the :method:`db.printReplicationInfo()` method. For +more information on oplog status, see +:ref:`replica-set-troubleshooting-check-oplog-size`. + +For an advanced understanding of oplog behavior, see +:ref:`replica-set-oplog` and :ref:`replica-set-syncing`. Replica Set Deployment ~~~~~~~~~~~~~~~~~~~~~~ diff --git a/source/replication.txt b/source/replication.txt index 8a9962f7cea..e2565b9f01f 100644 --- a/source/replication.txt +++ b/source/replication.txt @@ -56,6 +56,7 @@ operations in detail: tutorial/change-hostnames-in-a-replica-set tutorial/convert-secondary-into-arbiter tutorial/reconfigure-replica-set-with-unavailable-members + tutorial/recover-data-following-unexpected-shutdown .. _replication-reference: diff --git a/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt b/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt index 9aa44974d86..2801552f2fe 100644 --- a/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt +++ b/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt @@ -1,6 +1,6 @@ -================================================== -Reconfigure a Replica Set with Unavailable Members -================================================== +=============================================== +Reconfigure a Replica Set when Members are Down +=============================================== .. default-domain:: mongodb @@ -23,9 +23,6 @@ members can reach a majority. See :ref:`replica-set-elections-and-network-partitions` for more information on this situation. -This document provides the following options for reconfiguring a replica -set when a **majority** of members are accessible: - .. index:: replica set; reconfiguration .. 
_replica-set-force-reconfiguration: diff --git a/source/tutorial/recover-data-following-unexpected-shutdown.txt b/source/tutorial/recover-data-following-unexpected-shutdown.txt index c40db97a63e..4ac24cf21e4 100644 --- a/source/tutorial/recover-data-following-unexpected-shutdown.txt +++ b/source/tutorial/recover-data-following-unexpected-shutdown.txt @@ -9,21 +9,22 @@ representation of the data files will likely reflect an inconsistent state which could lead to data corruption. To prevent data inconsistency and corruption, always shut down the -database cleanly, and use the :ref:`durability journaling +database cleanly and use the :ref:`durability journaling `. The journal writes data to disk every 100 -milliseconds by default, and ensures that MongoDB will be able to +milliseconds by default and ensures that MongoDB can recover to a consistent state even in the case of an unclean shutdown due to power loss or other system failure. If you are *not* running as part of a :term:`replica set` **and** do -*not* have journaling enabled use the following procedure to recover +*not* have journaling enabled, use the following procedure to recover data that may be in an inconsistent state. If you are running as part of a replica set, you should *always* restore from a backup or restart the :program:`mongod` instance with an empty :setting:`dbpath` and allow MongoDB to resync the data. -.. seealso:: The ":doc:`/administration`" documents and the - documentation of the :setting:`repair`, :setting:`repairpath`, and +.. seealso:: The :doc:`/administration` documents, including + :ref:`replica-set-syncing`, and the + documentation on the :setting:`repair`, :setting:`repairpath`, and :setting:`journal` settings. .. 
[#clean-shutdown] To ensure a clean shut down, use the @@ -41,7 +42,7 @@ When you are aware of a :program:`mongod` instance running without journaling that stops unexpectedly **and** you're not running with replication, you should always run the repair operation before starting MongoDB again. If you're using replication, then restore from -a backup and allow replication to synchronize your data. +a backup and allow replication to :ref:`synchronize <replica-set-syncing>` your data. If the ``mongod.lock`` file in the data directory specified by :setting:`dbpath`, ``/data/db`` by default, is *not* a zero-byte file, @@ -72,7 +73,7 @@ Overview Do not use this procedure to recover a member of a :term:`replica set`. Instead you should either restore from a :doc:`backup </administration/backups>` - or re-sync from an intact member of the set. + or resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`. There are two processes to repair data files that result from an unexpected shutdown: @@ -171,4 +172,4 @@ If you are not running with journaling, and your database shuts down unexpectedly for *any* reason, you should always proceed *as if* your database is in an inconsistent and likely corrupt state. If at all possible restore from :doc:`backup </administration/backups>` or if running as a :term:`replica -set` re-sync from an intact member of the set. +set` resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`. From 44a6425d6f1d06b80b3e451589d1865db6ddbc45 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Mon, 22 Oct 2012 14:31:15 -0400 Subject: [PATCH 2/2] DOCS-449 review edits --- source/administration/replica-sets.txt | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/source/administration/replica-sets.txt b/source/administration/replica-sets.txt index f713259c33a..4b892cb749b 100644 --- a/source/administration/replica-sets.txt +++ b/source/administration/replica-sets.txt @@ -574,7 +574,7 @@ data.
To do so, use one of the following approaches: -- Restart the machine with an empty data directory and let MongoDB's +- Restart the :program:`mongod` with an empty data directory and let MongoDB's automatic syncing feature restore the data. This approach requires fewer steps but can take longer to replace the data. @@ -594,7 +594,7 @@ Automatically Resync a Stale Member This procedure relies on MongoDB's automatic syncing feature to restore the data on the stale member. For an overview of how MongoDB syncs -replica sets, see :ref:`replica-set-syncing`. +:term:`replica sets <replica set>`, see :ref:`replica-set-syncing`. To resync the stale member: @@ -617,7 +617,9 @@ To resync the stale member: mongod --dbpath /data/db/ --replSet rsProduction MongoDB resyncs the member. Resyncing may take a long time, depending on - the size of the database and speed of the network. + the size of the database and speed of the network. Resyncing also + places load on the member being synced from, which may then be + unable to keep its working set in memory. .. index:: replica set; resync .. _replica-set-resync-by-copying: @@ -627,7 +629,7 @@ Resync by Copying Data from Another Member This approach uses the data directory of an existing member to "seed" the stale member. The data must be recent enough to allow the new member -to catch up with the :term:`primary` member's :term:`oplog`. +to catch up with the :term:`oplog`. To resync by copying data from another member, use one of the following approaches: @@ -636,14 +638,11 @@ approaches: snapshot to the stale member. Use the snapshot procedures in :doc:`/administration/backups`. -- Lock another member's database with the :method:`db.fsyncLock()` - command, copy that data, and then restore the data to the stale +- Lock another member's data with the :method:`db.fsyncLock()` + command, copy all of the data in the data directory, and then restore the data to the stale member. Use the procedures for backup storage in :doc:`/administration/backups`.
-- Use the :dbcommand:`copydb` and :dbcommand:`clone` commands, as - described in :doc:`/tutorial/copy-databases-between-instances`. - .. _replica-set-security: Replica Set Security
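The three "Automatically Resync a Stale Member" steps in this patch can be scripted. Those steps are: stop the member, empty its data directory, and restart it with its replica set name. The following is a minimal sketch under stated assumptions. It is not part of the patch or of MongoDB. `resync_stale_member` and the `MONGOD` override are hypothetical names introduced here for illustration. `/data/db/` and `rsProduction` are only the example values from the procedure. The delete step assumes a `find` that supports `-mindepth` and `-delete`, such as GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MongoDB) that follows the three
# "Automatically Resync a Stale Member" steps for one member.
# Usage: resync_stale_member DBPATH REPLSET
# Set MONGOD to override the mongod binary (for example, a full path).
resync_stale_member() {
    dbpath=$1
    replset=$2
    mongod_bin=${MONGOD:-mongod}

    # Refuse to continue unless both arguments are given and DBPATH is an
    # existing directory, so the delete step cannot hit an unintended path.
    if [ -z "$dbpath" ] || [ -z "$replset" ] || [ ! -d "$dbpath" ]; then
        echo "usage: resync_stale_member DBPATH REPLSET" >&2
        return 1
    fi

    # Step 1: stop the member cleanly (mongod --shutdown is Linux-only).
    "$mongod_bin" --dbpath "$dbpath" --shutdown || return 1

    # Step 2: empty the data directory, including subdirectories and
    # hidden files, but keep the directory itself.
    # Assumes a find(1) that supports -mindepth and -delete (GNU or BSD).
    find "$dbpath" -mindepth 1 -delete || return 1

    # Step 3: restart the member with its replica set name so that it
    # performs an initial sync. Like the documented example command,
    # this runs mongod in the foreground.
    "$mongod_bin" --dbpath "$dbpath" --replSet "$replset"
}
```

For example, `resync_stale_member /data/db/ rsProduction` issues the same two `mongod` commands as the documented procedure. The sketch empties the directory with `find` rather than `rm -rf "$dbpath"/*` because a shell glob skips hidden files.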