Commit 4f62796

Author: Bob Grabar (committed)
DOCS-403 ongoing edits, not yet complete
1 parent db2b388 commit 4f62796

4 files changed: +122 -84 lines changed


source/administration/monitoring.txt

Lines changed: 13 additions & 11 deletions
@@ -339,11 +339,11 @@ This returns all operations that lasted longer than 100 milliseconds.
 Ensure that the value specified here (i.e. ``100``) is above the
 :setting:`slowms` threshold.

-.. seealso:: The ":wiki:`Optimization`" wiki page addresses strategies
+.. seealso:: The :wiki:`Optimization` wiki page addresses strategies
    that may improve the performance of your database queries and
    operations.

-.. STUB ":doc:`/applications/optimization`"
+.. STUB :doc:`/applications/optimization`
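
As a minimal sketch of the profiler query this passage refers to (assuming
the profiler is enabled and writing to the default ``system.profile``
collection), the check might look like this in the shell:

.. code-block:: javascript

   // Profile operations slower than 100 milliseconds (this also sets slowms).
   db.setProfilingLevel(1, 100)

   // Return profiled operations that took longer than 100 milliseconds,
   // most recent first.
   db.system.profile.find( { millis: { $gt: 100 } } ).sort( { ts: -1 } )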

 .. _replica-set-monitoring:

@@ -355,30 +355,32 @@ replica sets, beyond the requirements for any MongoDB instance is
 "replication lag." This refers to the amount of time that it takes a
 write operation on the :term:`primary` to replicate to a
 :term:`secondary`. Some very small delay period may be acceptable;
-however, as replication lag grows two significant problems emerge:
+however, as replication lag grows, two significant problems emerge:

 - First, operations that have occurred in the period of lag are not
   replicated to one or more secondaries. If you're using replication
   to ensure data persistence, exceptionally long delays may impact the
   integrity of your data set.

 - Second, if the replication lag exceeds the length of the operation
-  log (":term:`oplog`") then the secondary will have to resync all data
+  log (:term:`oplog`) then the secondary will have to resync all data
   from the :term:`primary` and rebuild all indexes. In normal
   circumstances this is uncommon given the typical size of the oplog,
-  but presents a major problem.
+  but it's an issue to be aware of.
+
+For causes of replication lag, see :ref:`Replication Lag <replica-set-replication-lag>`.

 Replication issues are most often the result of network connectivity
-issues between members or a :term:`primary` instance that does not
+issues between members or the result of a :term:`primary` that does not
 have the resources to support application and replication traffic. To
-check the status of a replica use the :dbcommand:`replSetGetStatus` or
+check the status of a replica, use the :dbcommand:`replSetGetStatus` or
 the following helper in the shell:

 .. code-block:: javascript

    rs.status()

-See the ":doc:`/reference/replica-status`" document for a more in
+See the :doc:`/reference/replica-status` document for a more in
 depth overview of this output. In general, watch the value of
 :status:`optimeDate`. Pay particular attention to the difference in
 time between the :term:`primary` and the :term:`secondary` members.
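
One rough way to compare these values, assuming the standard ``name``,
``stateStr``, and ``optimeDate`` fields in the :dbcommand:`replSetGetStatus`
output, is a short shell loop:

.. code-block:: javascript

   // Report each member's replication lag relative to the primary,
   // based on the optimeDate values reported by rs.status().
   var status = rs.status();
   var primary = status.members.filter(function (m) {
      return m.stateStr === "PRIMARY";
   })[0];
   status.members.forEach(function (m) {
      print(m.name + " (" + m.stateStr + "): " +
            (primary.optimeDate - m.optimeDate) / 1000 + " seconds behind");
   });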
@@ -393,7 +395,7 @@ option, :program:`mongod` will create a default sized oplog.
 By default the oplog is 5% of total available disk space on 64-bit
 systems.

-.. seealso:: ":doc:`/tutorial/change-oplog-size`"
+.. seealso:: :doc:`/tutorial/change-oplog-size`
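
To see the oplog size a member actually has, along with the window of
operations it currently holds, one option (a sketch; the exact output
varies by version) is:

.. code-block:: javascript

   // Prints the configured oplog size and the time range of operations
   // it currently contains.
   db.printReplicationInfo()

   // The same information returned as a document, if you prefer to
   // inspect individual fields such as timeDiffHours.
   db.getReplicationInfo()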

 Sharding and Monitoring
 -----------------------
@@ -404,10 +406,10 @@ instances. Additionally, shard clusters require monitoring to ensure
 that data is effectively distributed among nodes and that sharding
 operations are functioning appropriately.

-.. seealso:: See the ":wiki:`Sharding`" wiki page for more
+.. seealso:: See the :wiki:`Sharding` wiki page for more
    information.

-.. STUB ":doc:`/core/sharding`"
+.. STUB :doc:`/core/sharding`
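
One quick way to check how data is distributed among the shards, assuming
the shell is connected to a :program:`mongos`, is:

.. code-block:: javascript

   // Prints the sharding configuration, including which chunks live on
   // which shards.
   sh.status()

   // Older shells expose the same report through this helper.
   db.printShardingStatus()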

 Config Servers
 ~~~~~~~~~~~~~~

source/administration/replica-sets.txt

Lines changed: 52 additions & 18 deletions
@@ -524,10 +524,6 @@ provide good places to start a troubleshooting investigation with

 .. _replica-set-replication-lag:

-
-
-
-
 Replication Lag
 ~~~~~~~~~~~~~~~

@@ -578,39 +574,77 @@ Possible causes of replication lag include:
 Use the :term:`database profiler` to see if there are slow queries
 or long-running operations that correspond to the incidences of lag.

-- **The Oplog Size is Too Small**
+- **Oplog Size is Too Small for the Data Load**
+
+  If you perform a large number of writes for a large amount of data,
+  the oplog may be too small to hold the operations until the
+  secondaries can apply them.

   As commands are sent to the primary, they are recorded in the oplog.
   Secondaries update themselves by reading the oplog and applying the
   commands. The oplog is a circular buffer. When full, it erases the
-  oldest commands to write new ones. The secondaries keep track of the
-  last oplog command that they read. Under times of heavy load, the
-  contents of the secondaries will lag behind the contents of the
-  primary.
-
-  If the replication lag exceeds the amount of time buffered in the
-  oplog, then the replication cannot continue. Put another way, if the
-  primary overwrites that command before the secondary has a chance to
-  apply it, then the replication has failed – there are commands that
+  oldest commands in order to write new ones. Under times of heavy load,
+  the contents of the secondaries will lag behind the contents of the
+  primary. If the replication lag exceeds the amount of time buffered in
+  the oplog, then the replication cannot continue. Put another way, if
+  the primary overwrites a command before the secondary has a chance
+  to apply it, then the replication has failed – there are commands that
   have been applied on the primary that the secondary is not able to
   apply.

+  See the documentation for :doc:`/tutorial/change-oplog-size` for more information.

+- **Read Starvation**

+  The secondaries cannot read the oplog fast enough, and the oplog
+  overwrites old data before the secondaries can read it. This can
+  happen if you load a large amount of data but have not made the
+  oplog large enough, or if the primary is inundated with writes to
+  the point where replication (the secondaries running queries to get
+  the changes from the oplog) cannot keep up. This can lead to a lag
+  on the secondaries that ultimately becomes larger than the oplog on
+  the primary.

-  See http://docs.mongodb.org/manual/tutorial/change-oplog-size/ for more information.
+- **Failure to Use Appropriate Write Concern in a High-Write Environment**

+  If you perform very large data loads on a regular basis but fail to
+  set the appropriate write concern, the large volume of write traffic
+  on the primary will always take precedence over read requests from
+  secondaries. This will significantly slow replication by severely
+  reducing the number of reads that the secondaries can make on the
+  oplog in order to update themselves.

+  The oplog is circular. When it is full, it begins overwriting the
+  oldest data with the newest. If the secondaries have not caught up in
+  their reads, they reach a point where they can no longer access
+  certain updates. The secondaries become stale.

+  To prevent this, use write concern to tell MongoDB to always perform a
+  safe write after a designated number of inserts, such as after every
+  1,000 inserts. This provides a space for the secondaries to catch up
+  with the primary. Setting a write concern slightly slows down the data
+  load, but it keeps your secondaries from going stale. A sketch of this
+  pattern appears after this list.

-- **Read Starvation**
+  See :ref:`replica-set-write-concern` for more information.

-- **Write Starvation**
+  If you do this, and your driver supports it, use a write concern
+  mode of ``majority``.

+  The exact way you use Safe Mode depends on the driver you're using
+  for your data load program. You can read more about Safe Mode here:

+  http://www.mongodb.org/display/DOCS/getLastError+Command
+  http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError

-- **Failure to Use Appropriate Write Concern in a High-Write Environment**

+  Write requests have priority over the read requests that the
+  secondaries use to fetch replication data from the oplog, and heavy
+  write traffic can slow replication to the point that the oplog
+  overwrites commands that the secondaries have not yet read.

+You can monitor how fast replication occurs by watching the oplog time
+in the "replica" graph in MMS.
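
As a sketch of the write concern pattern described in the list above (the
collection name, document contents, and batch size are illustrative only),
a shell data-load loop might look like this:

.. code-block:: javascript

   // Illustrative bulk load: after every 1,000 inserts, block until a
   // majority of the replica set has acknowledged the preceding writes,
   // giving the secondaries room to catch up.
   for (var i = 0; i < 100000; i++) {
      db.mycollection.insert({ _id: i, payload: "example data" });
      if ((i + 1) % 1000 === 0) {
         db.runCommand({ getLastError: 1, w: "majority", wtimeout: 60000 });
      }
   }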

 Failover and Recovery
 ~~~~~~~~~~~~~~~~~~~~~

source/core/replication-internals.txt

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ replicate this log by applying the operations to themselves in an
 asynchronous process. Under normal operation, :term:`secondary` members
 reflect writes within one second of the primary. However, various
 exceptional situations may cause secondaries to lag behind further. See
-:term:`replication lag` for details.
+:ref:`Replication Lag <replica-set-replication-lag>` for details.

 All members send heartbeats (pings) to all other members in the set and can
 import operations to the local oplog from any other member in the set.
