From b1f48fd3c2018f2a256bb579ef2a91df941a5c00 Mon Sep 17 00:00:00 2001
From: Bob Grabar <bob.grabar@10gen.com>
Date: Mon, 22 Oct 2012 12:46:00 -0400
Subject: [PATCH 1/2] DOCS-449: resync a stale member

---
 source/administration/replica-sets.txt        | 88 ++++++++++++++++++-
 source/core/replication-internals.txt         | 26 +++---
 source/core/replication.txt                   | 10 ++-
 source/replication.txt                        |  1 +
 ...e-replica-set-with-unavailable-members.txt |  9 +-
 ...ver-data-following-unexpected-shutdown.txt | 17 ++--
 6 files changed, 120 insertions(+), 31 deletions(-)
diff --git a/source/administration/replica-sets.txt b/source/administration/replica-sets.txt
index 5a4ca794750..f713259c33a 100644
--- a/source/administration/replica-sets.txt
+++ b/source/administration/replica-sets.txt
@@ -33,6 +33,7 @@ suggestions for administers of replica sets.
    - :doc:`/tutorial/change-hostnames-in-a-replica-set`
    - :doc:`/tutorial/convert-secondary-into-arbiter`
    - :doc:`/tutorial/reconfigure-replica-set-with-unavailable-members`
+   - :doc:`/tutorial/recover-data-following-unexpected-shutdown`
 
 .. _replica-set-node-configurations:
 .. _replica-set-member-configurations:
@@ -365,7 +366,8 @@ the following to prepare the new member's :term:`data directory <dbpath>`:
   difference in the amount of time between the most recent operation and
   the most recent operation to the database exceeds the length of the
   :term:`oplog` on the existing members, then the new instance will have
-  to completely re-synchronize.
+  to completely resynchronize, as described in
+  :ref:`replica-set-resync-stale-member`.
 
    Use :method:`db.printReplicationInfo()` to check the current state of
    replica set members with regards to the oplog.
@@ -558,6 +560,90 @@ the oplog. For a detailed procedure, see
 
 .. include:: /includes/procedure-change-oplog-size.rst
 
+.. _replica-set-resync-stale-member:
+
+Resyncing a Member of a Replica Set
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a member's data falls too far behind the :term:`oplog` to catch up,
+the member and its data are considered "stale". A member's data is too
+far behind when the oplog on the :term:`primary` has overwritten its
+entries before the member has copied them. When that occurs, you must
+resync the member by removing its data and replacing it with up-to-date
+data.
+
+To do so, use one of the following approaches:
+
+- Restart the machine with an empty data directory and let MongoDB's
+  automatic syncing feature restore the data. This approach requires
+  fewer steps but can take longer to replace the data.
+
+  See :ref:`replica-set-auto-resync-stale-member`.
+
+- Restart the machine with a copy of a recent data directory from
+  another member in the :term:`replica set`. This procedure can replace
+  the data more quickly but requires more manual steps.
+
+  See :ref:`replica-set-resync-by-copying`.
+
+.. index:: replica set; resync
+.. _replica-set-auto-resync-stale-member:
+
+Automatically Resync a Stale Member
+```````````````````````````````````
+
+This procedure relies on MongoDB's automatic syncing feature to restore
+the data on the stale member. For an overview of how MongoDB syncs
+replica sets, see :ref:`replica-set-syncing`.
+
+To resync the stale member:
+
+1. Stop the member's :program:`mongod` instance using the
+   :option:`mongod --shutdown` option. Make sure to set
+   :option:`--dbpath <mongod --dbpath>` to the member's data directory.
+
+   .. code-block:: sh
+
+      mongod --dbpath /data/db/ --shutdown
+
+#. Delete all data and subdirectories from the member's data directory
+   so that the directory is empty.
+
+#. Restart the :program:`mongod` instance on the member. Consider the
+   following example:
+
+   .. code-block:: sh
+
+      mongod --dbpath /data/db/ --replSet rsProduction
+
+   MongoDB resyncs the member. Resyncing may take a long time, depending on
+   the size of the database and speed of the network.
+
+.. index:: replica set; resync
+.. _replica-set-resync-by-copying:
+
+Resync by Copying Data from Another Member
+``````````````````````````````````````````
+
+This approach uses the data directory of an existing member to "seed"
+the stale member. The data must be recent enough to allow the new member
+to catch up with the :term:`primary` member's :term:`oplog`.
+
+To resync by copying data from another member, use one of the following
+approaches:
+
+- Create a snapshot of another member's data and then restore that
+  snapshot to the stale member. Use the snapshot procedures in
+  :doc:`/administration/backups`.
+
+- Lock another member's database with the :method:`db.fsyncLock()`
+  command, copy that data, and then restore the data to the stale
+  member. Use the procedures for backup storage in
+  :doc:`/administration/backups`.
+
+- Use the :dbcommand:`copydb` and :dbcommand:`clone` commands, as
+  described in :doc:`/tutorial/copy-databases-between-instances`.
+
 .. _replica-set-security:
 
 Replica Set Security
diff --git a/source/core/replication-internals.txt b/source/core/replication-internals.txt
index 7a624c2af0f..af174131f5d 100644
--- a/source/core/replication-internals.txt
+++ b/source/core/replication-internals.txt
@@ -4,9 +4,6 @@ Replication Internals
 
 .. default-domain:: mongodb
 
-Synopsis
---------
-
 This document provides a more in-depth explanation of the internals and
 operation of :term:`replica set` features. This material is not necessary for
 normal operation or application development but may be useful for
@@ -77,11 +74,10 @@ the following collections:
 .. _replica-set-oplog:
 .. _replica-set-internals-oplog:
 
-Oplog
------
+Oplog Internals
+---------------
 
-For an explanation of the oplog, see the :ref:`replica-set-oplog-sizing`
-topic in the :doc:`/core/replication` document.
+For an explanation of the oplog, see :ref:`replica-set-oplog-sizing`.
 
 Under various exceptional
 situations, updates to a :term:`secondary's <secondary>` oplog might
@@ -113,8 +109,8 @@ Data Integrity
 
 .. index:: replica set; read preferences
 
-Read Preferences
-~~~~~~~~~~~~~~~~
+Read Preference Internals
+~~~~~~~~~~~~~~~~~~~~~~~~~
 
 MongoDB uses :term:`single-master replication` to ensure that the
 database remains consistent. However, clients may modify the
@@ -172,8 +168,8 @@ for your data set is crucial.
 
 .. index:: replica set; security
 
-Security
---------
+Security Internals
+------------------
 
 Administrators of replica sets also have unique :ref:`monitoring
 <replica-set-monitoring>` and :ref:`security <replica-set-security>`
@@ -188,8 +184,8 @@ modify the configuration of an existing replica set.
 .. index:: replica set; failover
 .. _replica-set-election-internals:
 
-Elections
----------
+Election Internals
+------------------
 
 Elections are the process :term:`replica set` members use to select which member should
 become :term:`primary`. A primary is the only member in the replica
@@ -297,6 +293,8 @@ and a majority of servers in one data center and one server in another.
 
 .. index:: replica set; sync
 
+.. _replica-set-syncing:
+
 Syncing
 -------
 
@@ -327,3 +325,5 @@ For example:
    alternate facility, and if you add another secondary to the alternate
    facility, the new secondary will likely sync from the existing
    secondary because it is closer than the primary.
+
+.. seealso:: :ref:`replica-set-resync-stale-member`
diff --git a/source/core/replication.txt b/source/core/replication.txt
index 32300a4a739..878ecb4748d 100644
--- a/source/core/replication.txt
+++ b/source/core/replication.txt
@@ -353,9 +353,13 @@ activity of your MongoDB-based application are reads and you are
 writing a small amount of data, you may find that you need a much
 smaller oplog.
 
-For a further understanding of oplog behavior, see the
-:ref:`replica-set-oplog` topic in the :doc:`/core/replication-internals`
-document.
+To view oplog status, including the size and the time range of
+operations, use the :method:`db.printReplicationInfo()` method. For
+more information on oplog status, see
+:ref:`replica-set-troubleshooting-check-oplog-size`.
+
+For an advanced understanding of oplog behavior, see
+:ref:`replica-set-oplog` and :ref:`replica-set-syncing`.
 
 Replica Set Deployment
 ~~~~~~~~~~~~~~~~~~~~~~
diff --git a/source/replication.txt b/source/replication.txt
index 8a9962f7cea..e2565b9f01f 100644
--- a/source/replication.txt
+++ b/source/replication.txt
@@ -56,6 +56,7 @@ operations in detail:
    tutorial/change-hostnames-in-a-replica-set
    tutorial/convert-secondary-into-arbiter
    tutorial/reconfigure-replica-set-with-unavailable-members
+   tutorial/recover-data-following-unexpected-shutdown
 
 .. _replication-reference:
 
diff --git a/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt b/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt
index 9aa44974d86..2801552f2fe 100644
--- a/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt
+++ b/source/tutorial/reconfigure-replica-set-with-unavailable-members.txt
@@ -1,6 +1,6 @@
-==================================================
-Reconfigure a Replica Set with Unavailable Members
-==================================================
+===============================================
+Reconfigure a Replica Set when Members are Down
+===============================================
 
 .. default-domain:: mongodb
 
@@ -23,9 +23,6 @@ members can reach a majority. See
 :ref:`replica-set-elections-and-network-partitions` for more
 information on this situation.
 
-This document provides the following options for reconfiguring a replica
-set when a **majority** of members are accessible:
-
 .. index:: replica set; reconfiguration
 .. _replica-set-force-reconfiguration:
 
diff --git a/source/tutorial/recover-data-following-unexpected-shutdown.txt b/source/tutorial/recover-data-following-unexpected-shutdown.txt
index c40db97a63e..4ac24cf21e4 100644
--- a/source/tutorial/recover-data-following-unexpected-shutdown.txt
+++ b/source/tutorial/recover-data-following-unexpected-shutdown.txt
@@ -9,21 +9,22 @@ representation of the data files will likely reflect an inconsistent
 state which could lead to data corruption.
 
 To prevent data inconsistency and corruption, always shut down the
-database cleanly, and use the :ref:`durability journaling
+database cleanly and use the :ref:`durability journaling
 <setting-journal>`. The journal writes data to disk every 100
-milliseconds by default, and ensures that MongoDB will be able to
+milliseconds by default and ensures that MongoDB can
 recover to a consistent state even in the case of an unclean shutdown due to
 power loss or other system failure.
 
 If you are *not* running as part of a :term:`replica set` **and** do
-*not* have journaling enabled use the following procedure to recover
+*not* have journaling enabled, use the following procedure to recover
 data that may be in an inconsistent state. If you are running as part
 of a replica set, you should *always* restore from a backup or restart
 the :program:`mongod` instance with an empty :setting:`dbpath` and
 allow MongoDB to resync the data.
 
-.. seealso:: The ":doc:`/administration`" documents and the
-   documentation of the :setting:`repair`, :setting:`repairpath`, and
+.. seealso:: The :doc:`/administration` documents, including
+   :ref:`replica-set-syncing`, and the
+   documentation on the :setting:`repair`, :setting:`repairpath`, and
    :setting:`journal` settings.
 
 .. [#clean-shutdown] To ensure a clean shut down, use the
@@ -41,7 +42,7 @@ When you are aware of a :program:`mongod` instance running without
 journaling that stops unexpectedly **and** you're not running with
 replication, you should always run the repair operation before
 starting MongoDB again. If you're using replication, then restore from
-a backup and allow replication to synchronize your data.
+a backup and allow replication to :ref:`synchronize <replica-set-syncing>` your data.
 
 If the ``mongod.lock`` file in the data directory specified by
 :setting:`dbpath`, ``/data/db`` by default, is *not* a zero-byte file,
@@ -72,7 +73,7 @@ Overview
 
    Do not use this procedure to recover a member of a :term:`replica set`.
    Instead you should either restore from a :doc:`backup </administration/backups>` 
-   or re-sync from an intact member of the set.
+   or resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`.
 
 There are two processes to repair data files that result from an
 unexpected shutdown:
@@ -171,4 +172,4 @@ If you are not running with journaling, and your database shuts down
 unexpectedly for *any* reason, you should always proceed *as if* your database
 is in an inconsistent and likely corrupt state. If at all possible restore
 from :doc:`backup </administration/backups>` or if running as a :term:`replica
-set` re-sync from an intact member of the set.
+set` resync from an intact member of the set, as described in :ref:`replica-set-resync-stale-member`.

From 44a6425d6f1d06b80b3e451589d1865db6ddbc45 Mon Sep 17 00:00:00 2001
From: Bob Grabar <bob.grabar@10gen.com>
Date: Mon, 22 Oct 2012 14:31:15 -0400
Subject: [PATCH 2/2] DOCS-449 review edits

---
 source/administration/replica-sets.txt | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/source/administration/replica-sets.txt b/source/administration/replica-sets.txt
index f713259c33a..4b892cb749b 100644
--- a/source/administration/replica-sets.txt
+++ b/source/administration/replica-sets.txt
@@ -574,7 +574,7 @@ data.
 
 To do so, use one of the following approaches:
 
-- Restart the machine with an empty data directory and let MongoDB's
+- Restart the :program:`mongod` with an empty data directory and let MongoDB's
   automatic syncing feature restore the data. This approach requires
   fewer steps but can take longer to replace the data.
 
@@ -594,7 +594,7 @@ Automatically Resync a Stale Member
 
 This procedure relies on MongoDB's automatic syncing feature to restore
 the data on the stale member. For an overview of how MongoDB syncs
-replica sets, see :ref:`replica-set-syncing`.
+:term:`replica sets <replica set>`, see :ref:`replica-set-syncing`.
 
 To resync the stale member:
 
@@ -617,7 +617,9 @@ To resync the stale member:
       mongod --dbpath /data/db/ --replSet rsProduction
 
    MongoDB resyncs the member. Resyncing may take a long time, depending on
-   the size of the database and speed of the network.
+   the size of the database and speed of the network. Resyncing also
+   puts load on the member being synced from, which might then be
+   unable to keep its working set in memory.
 
 .. index:: replica set; resync
 .. _replica-set-resync-by-copying:
@@ -627,7 +629,7 @@ Resync by Copying Data from Another Member
 
 This approach uses the data directory of an existing member to "seed"
 the stale member. The data must be recent enough to allow the new member
-to catch up with the :term:`primary` member's :term:`oplog`.
+to catch up with the :term:`oplog`.
 
 To resync by copying data from another member, use one of the following
 approaches:
@@ -636,14 +638,11 @@ approaches:
   snapshot to the stale member. Use the snapshot procedures in
   :doc:`/administration/backups`.
 
-- Lock another member's database with the :method:`db.fsyncLock()`
-  command, copy that data, and then restore the data to the stale
+- Lock another member's data with the :method:`db.fsyncLock()`
+  method, copy all of the data in the data directory, and then restore the data to the stale
   member. Use the procedures for backup storage in
   :doc:`/administration/backups`.
 
-- Use the :dbcommand:`copydb` and :dbcommand:`clone` commands, as
-  described in :doc:`/tutorial/copy-databases-between-instances`.
-
 .. _replica-set-security:
 
 Replica Set Security