@@ -524,10 +524,6 @@ provide good places to start a troubleshooting investigation with
.. _replica-set-replication-lag:
-
-
-
-
Replication Lag
~~~~~~~~~~~~~~~
@@ -578,39 +574,77 @@ Possible causes of replication lag include:
Use the :term:`database profiler` to see if there are slow queries
or long-running operations that correspond to the occurrences of lag. (A
short example of turning on the profiler appears at the end of this
section.)
- - **The Oplog Size is Too Small**
+ - **Oplog Size is Too Small for the Data Load**
+
+ If you perform a large number of writes for a large amount of data,
+ the oplog may not be large enough to retain every operation until the
+ secondaries have had a chance to apply it.
+
As commands are sent to the primary, they are recorded in the oplog.
Secondaries update themselves by reading the oplog and applying the
commands. The oplog is a circular buffer. When full, it erases the
- oldest commands to write new ones. The secondaries keep track of the
- last oplog command that they read. Under times of heavy load, the
- contents of the secondaries will lag behind the contents of the
- primary.
-
- If the replication lag exceeds the amount of time buffered in the
- oplog, then the replication cannot continue. Put another way, if the
- primary overwrites that command before the secondary has a chance to
- apply it, then the replication has failed – there are commands that
+ oldest commands in order to write new ones. Under heavy load, the
+ contents of the secondaries will lag behind the contents of the
+ primary. If the replication lag exceeds the amount of time buffered in
+ the oplog, then replication cannot continue. Put another way, if the
+ primary overwrites an oplog entry before the secondary has a chance to
+ apply it, then replication has failed – there are commands that
have been applied on the primary that the secondary is not able to
apply.
+ See the documentation for :doc:`/tutorial/change-oplog-size` for more information.
+ - **Read Starvation**
+ The secondaries are not able to read the oplog fast enough, and the
+ oplog writes over old data before the secondaries can read it. This
+ can happen if you are loading a large amount of data but have not
+ made the oplog large enough, or if the primary is inundated with
+ writes to the point where replication (the secondaries running
+ queries to get the changes from the oplog) cannot keep up. This can
+ lead to a lag on the secondaries that ultimately becomes larger than
+ the oplog on the primary.
- See http://docs.mongodb.org/manual/tutorial/change-oplog-size/ for more information.
+ - **Failure to Use Appropriate Write Concern in a High-Write Environment**
+ If you perform very large data loads on a regular basis but fail to
+ set the appropriate write concern, the large volume of write traffic
+ on the primary will always take precedence over read requests from
+ secondaries. This will significantly slow replication by severely
+ reducing the number of reads that the secondaries can make on the
+ oplog in order to update themselves.
+ The oplog is circular. When it is full, it begins overwriting the
+ oldest data with the newest. If the secondaries have not caught up in
+ their reads, they reach a point where they can no longer access
+ certain updates. The secondaries become stale.
+ To prevent this, use a write concern to tell MongoDB to always perform
+ a safe write after a designated number of inserts, such as after every
+ 1,000 inserts. This gives the secondaries a chance to catch up with the
+ primary. Setting a write concern slightly slows down the data load, but
+ it keeps your secondaries from going stale. (A sketch of this pattern
+ appears at the end of this section.)
- - **Read Starvation**
+ See :ref:`replica-set-write-concern` for more information.
- - **Write Starvation**
+ If you do this, and your driver supports it, use a write concern mode
+ of ``majority``.
+
+ The exact way you use Safe Mode depends on what driver you're using
+ for your data load program. You can read more about Safe Mode here:
+
+ http://www.mongodb.org/display/DOCS/getLastError+Command
+ http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError
- - **Failure to Use Appropriate Write Concern in a High-Write Environment**
+ Write requests have priority over read requests, so a heavy write load
+ takes precedence over the requests from the secondaries to read the
+ oplog and update themselves. This can significantly delay the read
+ requests the secondaries use to pull replication data from the oplog,
+ and can slow replication to the point that the oplog overwrites
+ commands that the secondaries have not yet read.
+ You can monitor how fast replication occurs by watching the oplog time
+ in the "replica" graph in MMS.
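+
+ If you are not using MMS, you can check similar numbers from a driver.
+ The following is a minimal sketch, assuming a recent PyMongo driver
+ connected directly to the primary on ``localhost:27017`` with permission
+ to read the ``local`` database and to run the ``replSetGetStatus``
+ command; it reports how much time the oplog covers and how far each
+ secondary currently lags behind the primary.
+
+ .. code-block:: python
+
+    from pymongo import MongoClient
+
+    client = MongoClient("localhost", 27017)
+
+    # The oplog window: the span between the oldest and newest entries is
+    # how long a secondary can fall behind before it goes stale.
+    oplog = client["local"]["oplog.rs"]
+    first = oplog.find_one(sort=[("$natural", 1)])
+    last = oplog.find_one(sort=[("$natural", -1)])
+    window_seconds = last["ts"].time - first["ts"].time
+    print("oplog window: %.1f hours" % (window_seconds / 3600.0))
+
+    # Per-member lag, taken from the optimes in replSetGetStatus.
+    status = client.admin.command("replSetGetStatus")
+    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
+    for member in status["members"]:
+        if member["stateStr"] == "SECONDARY":
+            lag = primary["optimeDate"] - member["optimeDate"]
+            print("%s lags the primary by %s" % (member["name"], lag))
+
+ If the reported lag regularly approaches the size of the oplog window,
+ the secondaries are at risk of going stale.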
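+
+ As a rough sketch of the write concern pattern described above, the
+ following load inserts in batches of 1,000 documents and waits for a
+ majority of the set to acknowledge each batch before sending the next.
+ It assumes PyMongo, whose newer releases express safe mode as a write
+ concern on the collection; the database, collection, and document names
+ are placeholders.
+
+ .. code-block:: python
+
+    from pymongo import MongoClient
+    from pymongo.write_concern import WriteConcern
+
+    client = MongoClient("localhost", 27017)
+    coll = client["test"].get_collection(
+        "bulk_load", write_concern=WriteConcern(w="majority"))
+
+    docs = ({"seq": i} for i in range(100000))  # stand-in for the real data
+
+    batch = []
+    for doc in docs:
+        batch.append(doc)
+        if len(batch) == 1000:
+            # Blocks until a majority of members have acknowledged the
+            # batch, giving the secondaries room to catch up.
+            coll.insert_many(batch)
+            batch = []
+    if batch:
+        coll.insert_many(batch)
+
+ With a driver that only offers the getLastError style linked above, the
+ equivalent is to issue an acknowledged write with ``w`` set to
+ ``majority`` after each batch.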
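+
+ For the slow query investigation mentioned at the start of this section,
+ the following is a minimal sketch of turning on the profiler from
+ PyMongo; the database name and the 100 millisecond threshold are
+ placeholders.
+
+ .. code-block:: python
+
+    from pymongo import MongoClient
+
+    client = MongoClient("localhost", 27017)
+    db = client["test"]
+
+    # Level 1 records only operations slower than slowms milliseconds
+    # into the database's system.profile collection.
+    db.command("profile", 1, slowms=100)
+
+    # ... run the suspect workload, then inspect the slowest operations.
+    for op in db["system.profile"].find().sort("millis", -1).limit(5):
+        print(op["millis"], op.get("op"), op.get("ns"))
+
+    # Turn profiling off again when finished.
+    db.command("profile", 0)
+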
Failover and Recovery
~~~~~~~~~~~~~~~~~~~~~