@@ -18,17 +18,15 @@ troubleshooting and for further understanding MongoDB's behavior and approach.
18
18
Oplog
19
19
-----
20
20
21
- Replication itself works by way of a special :term:`capped collection`
22
- called the :term:`oplog`. This collection keeps a rolling record of
23
- all operations applied to the :term:`primary`. Secondary members then
24
- replicate this log by applying the operations to themselves in an
25
- asynchronous process. Under normal operation, :term:`secondary` members
26
- reflect writes within one second of the primary. However, various
27
- exceptional situations may cause secondaries to lag behind further. See
21
+ Under normal operation, MongoDB updates the :ref:`oplog
22
+ <replica-set-oplog-sizing>` on a :term:`secondary` within one second of
23
+ applying an operation to a :term:`primary`. However, various exceptional
24
+ situations may cause a secondary to lag further behind. See
28
25
:ref:`Replication Lag <replica-set-replication-lag>` for details.
29
26
30
- All members send heartbeats (pings) to all other members in the set and can
31
- import operations to the local oplog from any other member in the set.
27
+ All members of a :term:`replica set` send heartbeats (pings) to all
28
+ other members in the set and can import operations to the local oplog
29
+ from any other member in the set.
32
30
33
31
Replica set oplog operations are :term:`idempotent`. The following
34
32
operations require idempotency:
@@ -37,9 +35,6 @@ operations require idempotency:
37
35
- post-rollback catch-up
38
36
- sharding chunk migrations
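
As an illustration of what idempotency means for a replayed operation, the following is a minimal sketch in plain Python. It does not use MongoDB's actual oplog format: the dict-based document and the helpers ``apply_set_op`` and ``apply_increment_op`` are hypothetical names introduced only for this example. It contrasts an operation that records a resulting value (safe to apply more than once) with one that records a relative change (not safe to apply more than once).

```python
def apply_set_op(doc, op):
    """Apply a hypothetical 'set field to value' operation; returns a new dict."""
    updated = dict(doc)
    updated[op["field"]] = op["value"]
    return updated


def apply_increment_op(doc, op):
    """Apply a hypothetical 'add amount to field' operation; returns a new dict."""
    updated = dict(doc)
    updated[op["field"]] = updated.get(op["field"], 0) + op["amount"]
    return updated


doc = {"_id": 1, "count": 5}

# Recording the resulting value: replaying the operation changes nothing.
set_op = {"field": "count", "value": 6}
set_once = apply_set_op(doc, set_op)
set_twice = apply_set_op(set_once, set_op)

# Recording the relative change: replaying the operation drifts the value.
inc_op = {"field": "count", "amount": 1}
inc_once = apply_increment_op(doc, inc_op)
inc_twice = apply_increment_op(inc_once, inc_op)

print(set_once == set_twice)  # idempotent: both applications agree
print(inc_once == inc_twice)  # not idempotent: the second application differs
```

A member that replays part of a log, as in the situations listed above, can only do so safely when each entry behaves like the first kind of operation.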
39
37
40
- .. seealso:: The :ref:`replica-set-oplog-sizing` topic in
41
- :doc:`/core/replication`.
42
-
43
38
.. TODO Verify that "sharding chunk migrations" (above) requires
44
39
idempotency. The wiki was unclear on the subject.
45
40
@@ -48,9 +43,12 @@ operations require idempotency:
48
43
49
44
.. _replica-set-implementation:
50
45
51
- Implementation
46
+ Data Integrity
52
47
--------------
53
48
49
+ Read Preferences
50
+ ~~~~~~~~~~~~~~~~
51
+
54
52
MongoDB uses :term:`single-master replication` to ensure that the
55
53
database remains consistent. However, clients may modify the
56
54
:ref:`read preferences <replica-set-read-preference>` on a
@@ -71,6 +69,13 @@ section for more about :ref:`read preference
71
69
output to assess the current state of replication and determine if
72
70
there is any unintended replication delay.
73
71
72
+
73
+
74
+
75
+
76
+ Elections for Primary
77
+ ~~~~~~~~~~~~~~~~~~~~~
78
+
74
79
In the default configuration, all members have an equal chance of
75
80
becoming primary; however, it's possible to set :data:`priority <members[n].priority>` values that
76
81
weight the election. In some architectures, there may be operational
@@ -80,6 +85,12 @@ center should *not* become primary. See: :ref:`node
80
85
priority <replica-set-node-priority>` for more background on this
81
86
concept.
82
87
88
+
89
+
90
+
91
+ Configurations that Affect Membership Behavior
92
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
93
+
83
94
Replica sets can also include members with the following four special
84
95
configurations that affect membership behavior:
85
96
@@ -115,26 +126,33 @@ administration. In particular use the :method:`rs.conf()` to return a
115
126
</reference/replica-configuration>` and use :method:`rs.reconfig()` to
116
127
modify the configuration of an existing replica set.
117
128
129
+
130
+
131
+
132
+
133
+
118
134
.. index:: replica set; elections
119
135
.. index:: replica set; failover
120
136
.. _replica-set-election-internals:
121
137
122
138
Elections
123
139
---------
124
140
125
- When you initialize a :term:`replica set` for the first time, or when any
126
- failover occurs, an election takes place to decide which member should
141
+ Elections are the process :term:`replica set` members use to select which member should
127
142
become :term:`primary`. A primary is the only member in the replica
128
143
set that can accept write operations, including :method:`insert()
129
144
<db.collection.insert()>`, :method:`update() <db.collection.update()>`,
130
145
and :method:`remove() <db.collection.remove()>`.
131
146
132
- Elections are the process replica set members use to
133
- select the primary in a set. Two types of events can trigger an election:
134
- a primary steps down or a :term:`secondary` member
135
- loses contact with a primary. All members have one vote
136
- in an election, and any :program:`mongod` can veto an election. A
137
- single veto invalidates the election.
147
+ The following events can trigger an election:
148
+
149
+ - You initialize a replica set for the first time.
150
+
151
+ - A primary steps down.
152
+
153
+ - A :term:`secondary` member loses contact with a primary.
154
+
155
+ - A failover occurs.
138
156
139
157
An existing primary will step down in response to the
140
158
:dbcommand:`replSetStepDown` command or if it sees that one of
@@ -146,11 +164,13 @@ set. When the current primary steps down, it closes all open client
146
164
connections to prevent clients from unknowingly writing data to a
147
165
non-primary member.
148
166
149
- In an election, every member, including :ref:`hidden
150
- <replica-set-hidden-members>` members, :ref:`arbiters
151
- <replica-set-arbiters>`, and even recovering members, get a single
152
- vote. Members will give votes to every eligible member that calls an
153
- election.
167
+ In an election, all members have one vote,
168
+ including :ref:`hidden <replica-set-hidden-members>` members, :ref:`arbiters
169
+ <replica-set-arbiters>`, and even recovering members.
170
+ Any :program:`mongod` can veto an election.
171
+
172
+ Any member of a replica set can veto an election, even if the
173
+ member is a :ref:`non-voting member <replica-set-non-voting-members>`.
154
174
155
175
A member of the set will veto an election under the following
156
176
conditions:
@@ -167,15 +187,10 @@ conditions:
167
187
(i.e. a higher "optime") than the member seeking election, from the
168
188
perspective of the voting member.
169
189
170
- - The current primary will also veto an election if it has the same or
190
+ - The current primary will veto an election if it has the same or
171
191
more recent operations (i.e. a "higher or equal optime") than the
172
192
member seeking election.
173
193
174
- .. note::
175
-
176
- Any member of a replica set *can* veto an election, even if the
177
- member is a :ref:`non-voting member <replica-set-non-voting-members>`.
178
-
179
194
The first member to receive votes from a majority of members in a set
180
195
becomes the next primary until the next election. Be
181
196
aware of the following conditions and possible situations:
@@ -186,15 +201,9 @@ aware of the following conditions and possible situations:
186
201
187
202
- Replica set members compare priorities only with other members of
188
203
the set. The absolute value of priorities does not have any impact on
189
- the outcome of replica set elections.
190
-
191
- .. note::
192
-
193
- The only exception is that members with :data:`priority
194
- <members[n].priority>` values of ``0``
195
- cannot become primary and will not seek election. See
196
- :ref:`replica-set-node-priority-configuration` for more
197
- information.
204
+ the outcome of replica set elections, with the exception of the value ``0``,
205
+ which indicates the member cannot become primary and cannot seek election.
206
+ For details, see :ref:`replica-set-node-priority-configuration`.
198
207
199
208
- A replica set member cannot become primary *unless* it has the
200
209
highest "optime" of any visible member in the set.
@@ -204,12 +213,31 @@ aware of the following conditions and possible situations:
204
213
primary until the member with the highest priority catches up
205
214
to the latest operation.
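
The priority and optime conditions above can be sketched as a simple filter. This is an illustrative model only, not MongoDB's election code: the ``Member`` class, its fields, and ``eligible_candidates`` are hypothetical names, and optimes are simplified to integers. It encodes just two rules stated in this section: a member with priority ``0`` cannot seek election, and a member cannot become primary unless it has the highest optime of any visible member.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    priority: float  # 0 means the member can never become primary
    optime: int      # simplified stand-in for the member's latest operation time


def eligible_candidates(visible_members):
    """Return names of visible members that satisfy both rules in this sketch."""
    highest_optime = max(m.optime for m in visible_members)
    return [
        m.name
        for m in visible_members
        if m.priority > 0 and m.optime == highest_optime
    ]


members = [
    Member("a", priority=2, optime=90),   # highest priority, but behind
    Member("b", priority=1, optime=100),  # up to date and electable
    Member("c", priority=0, optime=100),  # up to date, but priority 0
]

print(eligible_candidates(members))
```

In this model, member ``a`` has the highest priority but is not yet a candidate because it lags behind, which matches the situation described in the last item above.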
206
215
207
-
208
216
.. seealso:: :ref:`Non-voting members in a replica
209
217
set <replica-set-non-voting-members>`,
210
218
:ref:`replica-set-node-priority-configuration`, and
211
219
:data:`replica configuration <members[n].votes>`.
212
220
221
+
222
+ Elections and Network Partitions
223
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
224
+
225
+ A replica set has at most one primary at a given time. If a majority of
226
+ the set is up, the most up-to-date secondary will be elected primary. If
227
+ a majority of the set is not up or reachable, no member will be elected
228
+ primary.
229
+
230
+ There is no way to tell (from the set's point of view) the difference
231
+ between a network partition and nodes going down, so members left in a
232
+ minority will not attempt to become primary (to prevent a set from
233
+ ending up with primaries on either side of a partition).
234
+
235
+ This means that, if there is no majority on either side of a network
236
+ partition, the set will be read-only. Thus, we suggest an odd number of
237
+ servers: e.g., two servers in one data center and one in another. The
238
+ upshot of this strategy is that data is consistent: there are no
239
+ multi-primary conflicts to resolve.
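
The majority rule described above comes down to simple arithmetic. The following sketch assumes every member has exactly one vote; ``has_majority`` and ``sides_that_can_elect`` are hypothetical helper names used only for illustration, not part of any MongoDB API.

```python
def has_majority(reachable_members, total_members):
    """True when the reachable members form a strict majority of the set."""
    return reachable_members > total_members // 2


def sides_that_can_elect(partition_sizes):
    """For each side of a partition, report whether it could elect a primary."""
    total = sum(partition_sizes)
    return [has_majority(size, total) for size in partition_sizes]


# Three members, two in one data center and one in another:
# the two-member side keeps a majority and can elect a primary.
print(sides_that_can_elect([2, 1]))

# Four members split evenly: neither side has a majority,
# so no primary is elected and the set is read-only.
print(sides_that_can_elect([2, 2]))
```

Because a strict majority is required, at most one side of any partition can satisfy the check, which is why the set never ends up with two primaries.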
240
+
213
241
Syncing
214
242
-------
215
243