Commit f6b8b88

Author: Bob Grabar

DOCS-403 added causes of replication lag
1 parent 4f62796 commit f6b8b88


source/administration/replica-sets.txt

Lines changed: 33 additions & 67 deletions
@@ -540,6 +540,9 @@ Identify replication lag by checking the value of
 using the :method:`rs.status()` function in the :program:`mongo`
 shell.
 
+Also, you can monitor how fast replication occurs by watching the oplog
+time in the "replica" graph in MMS.
+
 Possible causes of replication lag include:
 
 - **Network Latency**
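
A minimal sketch of the lag check described above, run in the :program:`mongo` shell. It assumes the set has a reachable primary and at least one secondary; the optimeDate fields come from the rs.status() output.

    // Compare the primary's last operation time with a secondary's;
    // the difference approximates replication lag.
    var status = rs.status();
    var primary, secondary;
    status.members.forEach(function (m) {
        if (m.stateStr === "PRIMARY") primary = m;
        if (!secondary && m.stateStr === "SECONDARY") secondary = m;
    });
    // optimeDate values are Dates; subtracting them yields milliseconds.
    print("lag (seconds): " + (primary.optimeDate - secondary.optimeDate) / 1000);
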
@@ -567,85 +570,48 @@ Possible causes of replication lag include:
 - **Concurrency**
 
   In some cases, long-running operations on the primary can block
-  replication on secondaries. You can use
-  :term:`write concern` to prevent write operations from returning
-  if replication cannot keep up with the write load.
+  replication on secondaries. You can use :term:`write concern` to
+  prevent write operations from returning if replication cannot keep up
+  with the write load.
 
   Use the :term:`database profiler` to see if there are slow queries
   or long-running operations that correspond to the incidences of lag.
 
 - **Oplog Size is Too Small for the Data Load**
 
-  If you perform a large number of writes for a large amount of data
-
-  As commands are sent to the primary, they are recorded in the oplog.
-  Secondaries update themselves by reading the oplog and applying the
-  commands. The oplog is a circular buffer. When full, it erases the
-  oldest commands in order to write new ones. Under times of heavy load,
-  the contents of the secondaries will lag behind the contents of the
-  primary. If the replication lag exceeds the amount of time buffered in
-  the oplog, then the replication cannot continue. Put another way, if
-  the primary overwrites that command before the secondary has a chance
-  to apply it, then the replication has failed – there are commands that
-  have been applied on the primary that the secondary is not able to
-  apply.
-
-  See the documentation for :doc:`/tutorial/change-oplog-size` for more information.
-
-- **Read Starvation**
-
-  The secondaries cannot are not able to read the oplog fast enough, and the
-  oplog writes over old data before the secondaries can read it. This
-  can happen if you are reading a large amount of data but have not
-  set the oplog large enough. 10gen recommends an oplog time of
-  primary was inundated with writes to the point where replication
-  (the secondaries running queries to get the changes from the oplog)
-  cannot keep up. This can lead to a lag on the secondaries that
-  ultimately becomes larger than the oplog on the primary.
-
-- **Failure to Use Appropriate Write Concern in a High-Write Environment**
+  If you do not set your oplog large enough, the oplog overwrites old
+  data before the secondaries can read it. The oplog is a circular
+  buffer, and when full it erases the oldest commands in order to write
+  new ones. If your oplog size is too small, the secondaries reach a
+  point where they no longer can access certain updates. The secondaries
+  become stale.
 
-  If you perform very large data loads on a regular basis but fail to
-  set the appropriate write concern, the large volume of write traffic
-  on the primary will always take precedence over read requests from
-  secondaries. This will significantly slow replication by severely
-  reducing the numbers of reads that the secondaries can make on the
-  oplog in order to update themselves.
+  To set oplog size, see :doc:`/tutorial/change-oplog-size`.
 
-  The oplog is circular. When it is full, it begins overwriting the
-  oldest data with the newest. If the secondaries have not caught up in
-  their reads, they reach a point where they no longer can access
-  certain updates. The secondaries become stale.
+- **Failure to Use Appropriate Write Concern in a High-Write Environment**
 
-  To prevent this, use "Write Concern" to tell Mongo to always perform a
-  safe write after a designated number of inserts, such as after every
-  1,000 inserts. This provides a space for the secondaries to catch up with the
-  primary. Setting a write concern slightly slows down the data load, but it keeps your
-  secondaries from going stale.
+  If the primary is making a very high number of writes and if you have
+  not set the appropriate write concern, the secondaries will not be
+  able to read the oplog fast enough to keep up with changes. Write
+  requests take precedence over read requests, and a very large number
+  of writes will significantly reduce the numbers of reads the
+  secondaries can make on the oplog in order to update themselves.
+
+  The replication lag can grow to the point that the oplog overwrites
+  commands that the secondaries have not yet read. The oplog is a
+  circular buffer, and when full it erases the oldest commands in order
+  to write new ones. If the secondaries get too far behind in their
+  reads, they reach a point where they no longer can access certain
+  updates, and so the secondaries become stale.
+
+  To prevent this, use "write concern" to tell MongoDB to always perform
+  a safe write after a designated number of inserts, such as after every
+  1,000 inserts. This provides a space for the secondaries to catch up
+  with the primary. Setting a write concern does slightly slow down the
+  data load, but it keeps your secondaries from going stale.
 
   See :ref:`replica-set-write-concern` for more information.
 
-  If you do this, and your driver supports it, I recommend that
-  you use a mode of 'majority'.
-
-  The exact way you use Safe Mode depends on what driver you're using
-  for your data load program. You can read more about Safe Mode here:
-
-  http://www.mongodb.org/display/DOCS/getLastError+Command
-  http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError
-
-
-  take precedence over requests from the secondaries to read the oplog and update themselves.
-  Write requests have priority over read requests. This will significantly
-
-  the read requests from the secondaries from reading the replication data
-  from the oplog. Secondaries must be able to and significantly slow
-  down replication to the point that the oplog overwrites commands that
-  the secondaries have not yet read.
-
-  You can monitor how fast replication occurs by watching the oplog time
-  in the "replica" graph in MMS.
-
 Failover and Recovery
 ~~~~~~~~~~~~~~~~~~~~~
 
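For the :term:`database profiler` suggestion under **Concurrency**, a minimal mongo shell sketch; the 100 ms slow-operation threshold is an illustrative choice, not a recommendation.

    // Record operations slower than 100 ms in db.system.profile.
    db.setProfilingLevel(1, 100);

    // During an incidence of lag, inspect the slowest recorded operations.
    db.system.profile.find().sort({ millis: -1 }).limit(10).pretty();
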
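To judge whether the oplog is in fact too small for the data load, the shell's built-in replication helpers report the oplog's configured size and the span of time it currently covers:

    // On the primary: prints the oplog size and the time between its
    // first and last entries, the window secondaries must stay within.
    db.printReplicationInfo();

    // Prints how far each secondary is behind the primary.
    db.printSlaveReplicationInfo();
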
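The "safe write after a designated number of inserts" pattern from the final bullet might look like the following sketch; the data collection, document shape, and batch size are illustrative, and getLastError with w: "majority" is the mechanism the removed links described.

    // Bulk load in batches of 1,000; after each batch, block until a
    // majority of members acknowledge the writes so the secondaries
    // can catch up before the oplog wraps.
    for (var i = 1; i <= 100000; i++) {
        db.data.insert({ _id: i, payload: "example" });
        if (i % 1000 === 0) {
            db.runCommand({ getLastError: 1, w: "majority" });
        }
    }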