Commit 4f62796

Author: Bob Grabar (committed)
DOCS-403 ongoing edits, not yet complete
1 parent db2b388 commit 4f62796

4 files changed: +122 -84 lines changed


source/administration/monitoring.txt

Lines changed: 13 additions & 11 deletions
@@ -339,11 +339,11 @@ This returns all operations that lasted longer than 100 milliseconds.
 Ensure that the value specified here (i.e. ``100``) is above the
 :setting:`slowms` threshold.

-.. seealso:: The ":wiki:`Optimization`" wiki page addresses strategies
+.. seealso:: The :wiki:`Optimization` wiki page addresses strategies
    that may improve the performance of your database queries and
    operations.

-.. STUB ":doc:`/applications/optimization`"
+.. STUB :doc:`/applications/optimization`
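
As a minimal sketch of the profiler query this passage refers to (assuming
the profiler is enabled and writing to the default ``system.profile``
collection), the check might look like this in the shell:

.. code-block:: javascript

   // Profile operations slower than 100 milliseconds (this also sets slowms).
   db.setProfilingLevel(1, 100)

   // Return profiled operations that took longer than 100 milliseconds,
   // most recent first.
   db.system.profile.find( { millis: { $gt: 100 } } ).sort( { ts: -1 } )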

 .. _replica-set-monitoring:

@@ -355,30 +355,32 @@ replica sets, beyond the requirements for any MongoDB instance is
 "replication lag." This refers to the amount of time that it takes a
 write operation on the :term:`primary` to replicate to a
 :term:`secondary`. Some very small delay period may be acceptable;
-however, as replication lag grows two significant problems emerge:
+however, as replication lag grows, two significant problems emerge:

 - First, operations that have occurred in the period of lag are not
   replicated to one or more secondaries. If you're using replication
   to ensure data persistence, exceptionally long delays may impact the
   integrity of your data set.

 - Second, if the replication lag exceeds the length of the operation
-  log (":term:`oplog`") then the secondary will have to resync all data
+  log (:term:`oplog`) then the secondary will have to resync all data
   from the :term:`primary` and rebuild all indexes. In normal
   circumstances this is uncommon given the typical size of the oplog,
-  but presents a major problem.
+  but it's an issue to be aware of.
+
+For causes of replication lag, see :ref:`Replication Lag <replica-set-replication-lag>`.

 Replication issues are most often the result of network connectivity
-issues between members or a :term:`primary` instance that does not
+issues between members or the result of a :term:`primary` that does not
 have the resources to support application and replication traffic. To
-check the status of a replica use the :dbcommand:`replSetGetStatus` or
+check the status of a replica, use the :dbcommand:`replSetGetStatus` or
 the following helper in the shell:

 .. code-block:: javascript

    rs.status()

-See the ":doc:`/reference/replica-status`" document for a more in
+See the :doc:`/reference/replica-status` document for a more in
 depth overview of this output. In general, watch the value of
 :status:`optimeDate`. Pay particular attention to the difference in
 time between the :term:`primary` and the :term:`secondary` members.
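
One rough way to compare these values, assuming the standard ``name``,
``stateStr``, and ``optimeDate`` fields in the :dbcommand:`replSetGetStatus`
output, is a short shell loop:

.. code-block:: javascript

   // Report each member's replication lag relative to the primary,
   // based on the optimeDate values reported by rs.status().
   var status = rs.status();
   var primary = status.members.filter(function (m) {
      return m.stateStr === "PRIMARY";
   })[0];
   status.members.forEach(function (m) {
      print(m.name + " (" + m.stateStr + "): " +
            (primary.optimeDate - m.optimeDate) / 1000 + " seconds behind");
   });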
@@ -393,7 +395,7 @@ option, :program:`mongod` will create a default sized oplog.
 By default the oplog is 5% of total available disk space on 64-bit
 systems.

-.. seealso:: ":doc:`/tutorial/change-oplog-size`"
+.. seealso:: :doc:`/tutorial/change-oplog-size`
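
To see the oplog size a member actually has, along with the window of
operations it currently holds, one option (a sketch; the exact output
varies by version) is:

.. code-block:: javascript

   // Prints the configured oplog size and the time range of operations
   // it currently contains.
   db.printReplicationInfo()

   // The same information returned as a document, if you prefer to
   // inspect individual fields such as timeDiffHours.
   db.getReplicationInfo()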

 Sharding and Monitoring
 -----------------------
@@ -404,10 +406,10 @@ instances. Additionally, shard clusters require monitoring to ensure
 that data is effectively distributed among nodes and that sharding
 operations are functioning appropriately.

-.. seealso:: See the ":wiki:`Sharding`" wiki page for more
+.. seealso:: See the :wiki:`Sharding` wiki page for more
    information.

-.. STUB ":doc:`/core/sharding`"
+.. STUB :doc:`/core/sharding`
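
One quick way to check how data is distributed among the shards, assuming
the shell is connected to a :program:`mongos`, is:

.. code-block:: javascript

   // Prints the sharding configuration, including which chunks live on
   // which shards.
   sh.status()

   // Older shells expose the same report through this helper.
   db.printShardingStatus()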

 Config Servers
 ~~~~~~~~~~~~~~

source/administration/replica-sets.txt

Lines changed: 52 additions & 18 deletions
@@ -524,10 +524,6 @@ provide good places to start a troubleshooting investigation with

 .. _replica-set-replication-lag:

-
-
-
-
 Replication Lag
 ~~~~~~~~~~~~~~~

@@ -578,39 +574,77 @@ Possible causes of replication lag include:
 Use the :term:`database profiler` to see if there are slow queries
 or long-running operations that correspond to the incidences of lag.

-- **The Oplog Size is Too Small**
+- **Oplog Size is Too Small for the Data Load**
+
+  If you perform a large number of writes for a large amount of data,
+  the oplog may be too small to hold the operations until the
+  secondaries can apply them.

   As commands are sent to the primary, they are recorded in the oplog.
   Secondaries update themselves by reading the oplog and applying the
   commands. The oplog is a circular buffer. When full, it erases the
-  oldest commands to write new ones. The secondaries keep track of the
-  last oplog command that they read. Under times of heavy load, the
-  contents of the secondaries will lag behind the contents of the
-  primary.
-
-  If the replication lag exceeds the amount of time buffered in the
-  oplog, then the replication cannot continue. Put another way, if the
-  primary overwrites that command before the secondary has a chance to
-  apply it, then the replication has failed – there are commands that
+  oldest commands in order to write new ones. Under times of heavy load,
+  the contents of the secondaries will lag behind the contents of the
+  primary. If the replication lag exceeds the amount of time buffered in
+  the oplog, then the replication cannot continue. Put another way, if
+  the primary overwrites a command before the secondary has a chance
+  to apply it, then the replication has failed – there are commands that
   have been applied on the primary that the secondary is not able to
   apply.

+  See the documentation for :doc:`/tutorial/change-oplog-size` for more information.

+- **Read Starvation**

+  The secondaries cannot read the oplog fast enough, and the oplog
+  overwrites old data before the secondaries can read it. This can
+  happen if you load a large amount of data but have not made the
+  oplog large enough, or if the primary is inundated with writes to
+  the point where replication (the secondaries running queries to get
+  the changes from the oplog) cannot keep up. This can lead to a lag
+  on the secondaries that ultimately becomes larger than the oplog on
+  the primary.

-  See http://docs.mongodb.org/manual/tutorial/change-oplog-size/ for more information.
+- **Failure to Use Appropriate Write Concern in a High-Write Environment**

+  If you perform very large data loads on a regular basis but fail to
+  set the appropriate write concern, the large volume of write traffic
+  on the primary will always take precedence over read requests from
+  secondaries. This will significantly slow replication by severely
+  reducing the number of reads that the secondaries can make on the
+  oplog in order to update themselves.

+  The oplog is circular. When it is full, it begins overwriting the
+  oldest data with the newest. If the secondaries have not caught up in
+  their reads, they reach a point where they can no longer access
+  certain updates. The secondaries become stale.

+  To prevent this, use write concern to tell MongoDB to always perform a
+  safe write after a designated number of inserts, such as after every
+  1,000 inserts. This provides a space for the secondaries to catch up
+  with the primary. Setting a write concern slightly slows down the data
+  load, but it keeps your secondaries from going stale. A sketch of this
+  pattern appears after this list.

-- **Read Starvation**
+  See :ref:`replica-set-write-concern` for more information.

-- **Write Starvation**
+  If you do this, and your driver supports it, use a write concern
+  mode of ``majority``.

+  The exact way you use Safe Mode depends on the driver you're using
+  for your data load program. You can read more about Safe Mode here:

+  http://www.mongodb.org/display/DOCS/getLastError+Command
+  http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError

-- **Failure to Use Appropriate Write Concern in a High-Write Environment**

+  Write requests have priority over the read requests that the
+  secondaries use to fetch replication data from the oplog, and heavy
+  write traffic can slow replication to the point that the oplog
+  overwrites commands that the secondaries have not yet read.

+You can monitor how fast replication occurs by watching the oplog time
+in the "replica" graph in MMS.
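
As a sketch of the write concern pattern described in the list above (the
collection name, document contents, and batch size are illustrative only),
a shell data-load loop might look like this:

.. code-block:: javascript

   // Illustrative bulk load: after every 1,000 inserts, block until a
   // majority of the replica set has acknowledged the preceding writes,
   // giving the secondaries room to catch up.
   for (var i = 0; i < 100000; i++) {
      db.mycollection.insert({ _id: i, payload: "example data" });
      if ((i + 1) % 1000 === 0) {
         db.runCommand({ getLastError: 1, w: "majority", wtimeout: 60000 });
      }
   }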

 Failover and Recovery
 ~~~~~~~~~~~~~~~~~~~~~

source/core/replication-internals.txt

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ replicate this log by applying the operations to themselves in an
 asynchronous process. Under normal operation, :term:`secondary` members
 reflect writes within one second of the primary. However, various
 exceptional situations may cause secondaries to lag behind further. See
-:term:`replication lag` for details.
+:ref:`Replication Lag <replica-set-replication-lag>` for details.

 All members send heartbeats (pings) to all other members in the set and can
 import operations to the local oplog from any other member in the set.
