@@ -524,25 +524,29 @@ provide good places to start a troubleshooting investigation with
.. _replica-set-replication-lag:
+
+
Replication Lag
~~~~~~~~~~~~~~~
Replication lag is a delay between an operation on the :term:`primary`
- and the application of that operation from :term:`oplog` to the
+ and the application of that operation from the :term:`oplog` to the
:term:`secondary`. Such lag can be a significant issue and can
seriously affect MongoDB :term:`replica set` deployments. Excessive
replication lag makes "lagged" members ineligible to quickly become
primary and increases the possibility that distributed
read operations will be inconsistent.
- Identify replication lag by checking the values of
+ Identify replication lag by checking the value of
:data:`members[n].optimeDate` for each member of the replica set
using the :method:`rs.status()` function in the :program:`mongo`
shell.
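
For a quick check from the :program:`mongo` shell, you can compare each
member's ``optimeDate`` to the primary's. The following is only a sketch,
assuming the member fields returned by :method:`rs.status()`:

.. code-block:: javascript

   // Estimate replication lag by comparing each member's last applied
   // operation time (optimeDate) to the primary's.
   var status = rs.status();
   var primary = status.members.filter(function (m) {
       return m.stateStr === "PRIMARY";   // assumes a primary is currently up
   })[0];
   status.members.forEach(function (m) {
       var lagSeconds = (primary.optimeDate - m.optimeDate) / 1000;
       print(m.name + " (" + m.stateStr + "): " + lagSeconds + " seconds behind the primary");
   });
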
Possible causes of replication lag include:
- - **Network Latency. **
+ - **Network Latency**
Check the network routes between the members of your set to ensure
that there is no packet loss or network routing issue.
@@ -551,7 +555,7 @@ Possible causes of replication lag include:
members and ``traceroute`` to expose the routing of packets between
network endpoints.
- - **Disk Throughput. **
+ - **Disk Throughput**
If the file system and disk device on the secondary are
unable to flush data to disk as quickly as the primary, then
@@ -564,7 +568,7 @@ Possible causes of replication lag include:
Use system-level tools to assess disk status, including
``iostat`` or ``vmstat``.
- - **Concurrency. **
+ - **Concurrency**
In some cases, long-running operations on the primary can block
replication on secondaries. You can use
@@ -574,6 +578,40 @@ Possible causes of replication lag include:
Use the :term:`database profiler` to see if there are slow queries
or long-running operations that correspond to periods of lag; a minimal
profiler sketch follows this list.
+ - **The Oplog Size is Too Small**
+
+ As operations are applied on the primary, they are recorded in the oplog.
+ Secondaries update themselves by reading the oplog and replaying those
+ operations. The oplog is a circular buffer: when it is full, it overwrites
+ the oldest entries to make room for new ones. Each secondary keeps track of
+ the last oplog entry it has applied, and under heavy load the data on the
+ secondaries lags behind the data on the primary.
+
+ If the replication lag exceeds the window of time buffered in the oplog,
+ replication cannot continue. Put another way, if the primary overwrites an
+ entry before the secondary has had a chance to apply it, replication has
+ failed: there are operations applied on the primary that the secondary can
+ no longer apply. A sketch of shell helpers for checking the oplog window
+ appears after this list.
+
+ See http://docs.mongodb.org/manual/tutorial/change-oplog-size/ for more
+ information.
+
+ - **Read Starvation**
+
+ - **Write Starvation**
+
+ - **Failure to Use Appropriate Write Concern in a High-Write Environment**
+
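As a starting point for the profiler check mentioned under **Concurrency**
above, a minimal sketch in the :program:`mongo` shell might look like the
following; the 100 millisecond threshold is only an example value:

.. code-block:: javascript

   // Record operations slower than 100 ms on the current database
   // (profiling level 1 captures slow operations only).
   db.setProfilingLevel(1, 100);

   // Later, inspect the slowest recorded operations and check whether any
   // long-running writes or commands coincide with periods of lag.
   db.system.profile.find().sort({ millis: -1 }).limit(5).pretty();
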
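For the oplog sizing issue above, the helpers below report how much time the
primary's oplog currently buffers and how far each secondary has synced. This
is a sketch to run from a :program:`mongo` shell connected to the primary:

.. code-block:: javascript

   // Reports the configured oplog size and the time span between the first
   // and last operations currently held in the oplog. If replication lag
   // approaches this span, secondaries risk falling off the end of the oplog
   // and will require a full resync.
   db.printReplicationInfo();

   // Reports, for each secondary, the time of its last applied operation and
   // approximately how far it is behind the primary.
   db.printSlaveReplicationInfo();
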
Failover and Recovery
~~~~~~~~~~~~~~~~~~~~~