Commit f4da7cf

author
Bob Grabar
committed
DOCS-260 migrating rs design concepts, draft 1
1 parent 55064ee commit f4da7cf

3 files changed

+116
-72
lines changed

source/administration/replica-sets.txt

Lines changed: 24 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -46,11 +46,16 @@ configurations.
4646
.. warning::
4747

4848
The :method:`rs.reconfig()` shell command can force the current
49-
primary to step down, which causes an election. When the primary
49+
primary to step down, which causes an :ref:`election <replica-set-elections>`. When the primary
5050
steps down, the :program:`mongod` closes all client
5151
connections. While, this typically takes 10-20 seconds, attempt to
5252
make these changes during scheduled maintenance periods.
5353

54+
.. seealso::
55+
56+
- The :ref:`replica-set-elections` topic in the :doc:`/core/replication` document
57+
- The :ref:`replica-set-election-internals` topic in the :doc:`/core/replication-internals` document
58+
5459
.. index:: replica set members; secondary only
5560
.. _replica-set-secondary-only-members:
5661
.. _replica-set-secondary-only-configuration:
@@ -69,9 +74,10 @@ these members from ever becoming primary.
6974

7075
To configure a member as secondary-only, set its
7176
:data:`members[n].priority` value to ``0``. Any member with a
72-
:data:`members[n].priority` equal to ``0`` will never seek election and
73-
cannot become primary in any situation. For more information on priority
74-
levels, see :ref:`replica-set-node-priority`.
77+
:data:`members[n].priority` equal to ``0`` will never seek
78+
:ref:`election <replica-set-elections>` and cannot become primary in any
79+
situation. For more information on priority levels, see
80+
:ref:`replica-set-node-priority`.
7581

7682
As an example of modifying member priorities, assume a four-member
7783
replica set with member ``_id`` values of: ``0``, ``1``, ``2``, and
@@ -107,10 +113,10 @@ This sets the following:
107113
If your replica set has an even number of members, add an
108114
:ref:`arbiter <replica-set-arbiters>` to ensure that
109115
members can quickly obtain a majority of votes in an
110-
:ref:`election <replica-set-elections>` for primary.
116+
election for primary.
111117

112118
.. seealso:: :data:`members[n].priority` and :ref:`Replica Set
113-
Reconfiguration <replica-set-reconfiguration-usage>`.
119+
Reconfiguration <replica-set-reconfiguration-usage>`
114120

115121
.. index:: replica set members; hidden
116122
.. _replica-set-hidden-members:
@@ -183,8 +189,8 @@ the amount of slave delay to apply:
183189

184190
- The size of the oplog is sufficient to capture *more than* the
185191
number of operations that typically occur in that period of
186-
time. See the section on :ref:`oplog sizing
187-
<replica-set-oplog-sizing>` for more information.
192+
time. For more information on oplog size, see the
193+
:ref:`replica-set-oplog-sizing` topic in the :doc:`/core/replication` document.
188194

189195
Delayed members must have a :term:`priority` set to ``0`` to prevent
190196
them from becoming primary in their replica sets. Also these members
@@ -233,7 +239,7 @@ Arbiters
233239

234240
Arbiters are special :program:`mongod` instances that do not hold a
235241
copy of the data and thus cannot become primary. Arbiters exist solely
236-
participate in :term:`elections <election>`.
242+
participate in :ref:`elections <replica-set-elections>`.
237243

238244
.. note::
239245

@@ -290,15 +296,14 @@ Non-Voting
290296
~~~~~~~~~~
291297

292298
You may choose to change the number of votes that each member has in
293-
:term:`elections <election>` for :term:`primary`. In general, all
299+
:ref:`elections <replica-set-elections>` for :term:`primary`. In general, all
294300
members should have only 1 vote to prevent intermittent ties, deadlock,
295301
or the wrong members from becoming :term:`primary`. Use :ref:`replica
296302
set priorities <replica-set-node-priority>` to control which members
297303
are more likely to become primary.
298304

299-
To disable a member's ability to vote in :ref:`elections
300-
<replica-set-elections>` use the following command sequence in the
301-
:program:`mongo` shell.
305+
To disable a member's ability to vote in elections, use the following
306+
command sequence in the :program:`mongo` shell.
302307

303308
.. code-block:: javascript
304309

@@ -454,7 +459,7 @@ number. :method:`rs.reconfig()` will not change the value of
454459
.. warning::
455460

456461
Any replica set configuration change can trigger the current
457-
:term:`primary` to step down, which forces an :term:`election`. This
462+
:term:`primary` to step down, which forces an :ref:`election <replica-set-elections>`. This
458463
causes the current shell session, and clients connected to this replica set,
459464
to produce an error even when the operation succeeds.
460465

@@ -486,7 +491,7 @@ the new configuration.
486491

487492
If a member has :data:`members[n].priority` set to ``0``, it is
488493
ineligible to become :term:`primary` and will not seek
489-
elections. :ref:`Hidden members <replica-set-hidden-members>`,
494+
election. :ref:`Hidden members <replica-set-hidden-members>`,
490495
:ref:`delayed members <replica-set-delayed-members>`, and
491496
:ref:`arbiters <replica-set-arbiters>` all have :data:`members[n].priority`
492497
set to ``0``.
@@ -741,4 +746,7 @@ data to a :term:`BSON` file that you can view using
741746
You can prevent rollbacks by ensuring safe writes by using
742747
the appropriate :term:`write concern`.
743748

744-
.. seealso:: :ref:`Replica Set Elections <replica-set-elections>`
749+
.. seealso::
750+
751+
- The :ref:`replica-set-elections` topic in the :doc:`/core/replication` document
752+
- The :ref:`replica-set-election-internals` topic in the :doc:`/core/replication-internals` document

source/core/replication-internals.txt

Lines changed: 70 additions & 42 deletions
Original file line number | Diff line number | Diff line change
@@ -18,17 +18,15 @@ troubleshooting and for further understanding MongoDB's behavior and approach.
1818
Oplog
1919
-----
2020

21-
Replication itself works by way of a special :term:`capped collection`
22-
called the :term:`oplog`. This collection keeps a rolling record of
23-
all operations applied to the :term:`primary`. Secondary members then
24-
replicate this log by applying the operations to themselves in an
25-
asynchronous process. Under normal operation, :term:`secondary` members
26-
reflect writes within one second of the primary. However, various
27-
exceptional situations may cause secondaries to lag behind further. See
21+
Under normal operation, MongoDB updates the :ref:`oplog
22+
<replica-set-oplog-sizing>` on a :term:`secondary` within one second of
23+
applying an operation to a :ref:`primary`. However, various exceptional
24+
situations may cause a secondary to lag further behind. See
2825
:ref:`Replication Lag <replica-set-replication-lag>` for details.
2926

30-
All members send heartbeats (pings) to all other members in the set and can
31-
import operations to the local oplog from any other member in the set.
27+
All members of a :term:`replica set` send heartbeats (pings) to all
28+
other members in the set and can import operations to the local oplog
29+
from any other member in the set.
3230

3331
Replica set oplog operations are :term:`idempotent`. The following
3432
operations require idempotency:
@@ -37,9 +35,6 @@ operations require idempotency:
3735
- post-rollback catch-up
3836
- sharding chunk migrations
3937

40-
.. seealso:: The :ref:`replica-set-oplog-sizing` topic in
41-
:doc:`/core/replication`.
42-
4338
.. TODO Verify that "sharding chunk migrations" (above) requires
4439
idempotency. The wiki was unclear on the subject.
4540

@@ -48,9 +43,12 @@ operations require idempotency:
4843

4944
.. _replica-set-implementation:
5045

51-
Implementation
46+
Data Integrity
5247
--------------
5348

49+
Read Preferences
50+
~~~~~~~~~~~~~~~~
51+
5452
MongoDB uses :term:`single-master replication` to ensure that the
5553
database remains consistent. However, clients may modify the
5654
:ref:`read preferences <replica-set-read-preference>` on a
@@ -71,6 +69,13 @@ section for more about :ref:`read preference
7169
output to asses the current state of replication and determine if
7270
there is any unintended replication delay.
7371

72+
73+
74+
75+
76+
Elections for Primary
77+
~~~~~~~~~~~~~~~~~~~~~
78+
7479
In the default configuration, all members have an equal chance of
7580
becoming primary; however, it's possible to set :data:`priority <members[n].priority>` values that
7681
weight the election. In some architectures, there may be operational
@@ -80,6 +85,12 @@ center should *not* become primary. See: :ref:`node
8085
priority <replica-set-node-priority>` for more background on this
8186
concept.
8287

88+
89+
90+
91+
Configurations that Affect Membership Behavior
92+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
93+
8394
Replica sets can also include members with the following four special
8495
configurations that affect membership behavior:
8596

@@ -115,26 +126,33 @@ administration. In particular use the :method:`rs.conf()` to return a
115126
</reference/replica-configuration>` and use :method:`rs.reconfig()` to
116127
modify the configuration of an existing replica set.
117128

129+
130+
131+
132+
133+
118134
.. index:: replica set; elections
119135
.. index:: replica set; failover
120136
.. _replica-set-election-internals:
121137

122138
Elections
123139
---------
124140

125-
When you initialize a :term:`replica set` for the first time, or when any
126-
failover occurs, an election takes place to decide which member should
141+
Elections are the process :term:`replica set` members use to select which member should
127142
become :term:`primary`. A primary is the only member in the replica
128143
set that can accept write operations, including :method:`insert()
129144
<db.collection.insert()>`, :method:`update() <db.collection.update()>`,
130145
and :method:`remove() <db.collection.remove()>`.
131146

132-
Elections are the process replica set members use to
133-
select the primary in a set. Two types of events can trigger an election:
134-
a primary steps down or a :term:`secondary` member
135-
loses contact with a primary. All members have one vote
136-
in an election, and any :program:`mongod` can veto an election. A
137-
single veto invalidates the election.
147+
The following events can trigger an election:
148+
149+
- You initialize a replica set for the first time.
150+
151+
- A primary steps down.
152+
153+
- A :term:`secondary` member loses contact with a primary.
154+
155+
- A failover occurs.
138156

139157
An existing primary will step down in response to the
140158
:dbcommand:`replSetStepDown` command or if it sees that one of
@@ -146,11 +164,13 @@ set. When the current primary steps down, it closes all open client
146164
connections to prevent clients from unknowingly writing data to a
147165
non-primary member.
148166

149-
In an election, every member, including :ref:`hidden
150-
<replica-set-hidden-members>` members, :ref:`arbiters
151-
<replica-set-arbiters>`, and even recovering members, get a single
152-
vote. Members will give votes to every eligible member that calls an
153-
election.
167+
In an election, all members have one vote,
168+
including :ref:`hidden <replica-set-hidden-members>` members, :ref:`arbiters
169+
<replica-set-arbiters>`, and even recovering members.
170+
Any :program:`mongod` can veto an election.
171+
172+
Any member of a replica set can veto an election, even if the
173+
member is a :ref:`non-voting member <replica-set-non-voting-members>`.
154174

155175
A member of the set will veto an election under the following
156176
conditions:
@@ -167,15 +187,10 @@ conditions:
167187
(i.e. a higher "optime") than the member seeking election, from the
168188
perspective of the voting member.
169189

170-
- The current primary will also veto an election if it has the same or
190+
- The current primary will veto an election if it has the same or
171191
more recent operations (i.e. a "higher or equal optime") than the
172192
member seeking election.
173193

174-
.. note::
175-
176-
Any member of a replica set *can* veto an election, even if the
177-
member is a :ref:`non-voting member <replica-set-non-voting-members>`.
178-
179194
The first member to receive votes from a majority of members in a set
180195
becomes the next primary until the next election. Be
181196
aware of the following conditions and possible situations:
@@ -186,15 +201,9 @@ aware of the following conditions and possible situations:
186201

187202
- Replica set members compare priorities only with other members of
188203
the set. The absolute value of priorities does not have any impact on
189-
the outcome of replica set elections.
190-
191-
.. note::
192-
193-
The only exception is that members with :data:`priority
194-
<members[n].priority>` values of ``0``
195-
cannot become primary and will not seek election. See
196-
:ref:`replica-set-node-priority-configuration` for more
197-
information.
204+
the outcome of replica set elections, with the exception of the value ``0``,
205+
which indicates the member cannot become primary and cannot seek election.
206+
For details, see :ref:`replica-set-node-priority-configuration`.
198207

199208
- A replica set member cannot become primary *unless* it has the
200209
highest "optime" of any visible member in the set.
@@ -204,12 +213,31 @@ aware of the following conditions and possible situations:
204213
primary until the member with the highest priority catches up
205214
to the latest operation.
206215

207-
208216
.. seealso:: :ref:`Non-voting members in a replica
209217
set <replica-set-non-voting-members>`,
210218
:ref:`replica-set-node-priority-configuration`, and
211219
:data:`replica configuration <members[n].votes>`.
212220

221+
222+
Elections and Network Partitions
223+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
224+
225+
A replica set has at most one primary at a given time. If a majority of
226+
the set is up, the most up-to-date secondary will be elected primary. If
227+
a majority of the set is not up or reachable, no member will be elected
228+
primary.
229+
230+
There is no way to tell (from the set's point of view) the difference
231+
between a network partition and nodes going down, so members left in a
232+
minority will not attempt to become primary (to prevent a set from
233+
ending up with primaries on either side of a partition).
234+
235+
This means that, if there is no majority on either side of a network
236+
partition, the set will be read only. Thus, we suggest an odd number of
237+
servers: e.g., two servers in one data center and one in another. The
238+
upshot of this strategy is that data is consistent: there are no
239+
multi-primary conflicts to resolve.
240+
213241
Syncing
214242
-------
215243
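
The majority rule stated in the added "Elections and Network Partitions" section of source/core/replication-internals.txt can be illustrated with a small sketch. It is not MongoDB code: `canElectPrimary` is a hypothetical helper, and it assumes every member holds exactly one vote and ignores priorities, optimes, and vetoes. It only shows why a side of a partition holding half the set or less cannot elect a primary.

```javascript
// Illustrative sketch only; canElectPrimary is a hypothetical helper,
// not a MongoDB API. Assumes every member has exactly one vote.
function canElectPrimary(totalMembers, reachableMembers) {
  if (!Number.isInteger(totalMembers) || totalMembers < 1) {
    throw new RangeError("totalMembers must be a positive integer");
  }
  if (!Number.isInteger(reachableMembers) || reachableMembers < 0 ||
      reachableMembers > totalMembers) {
    throw new RangeError("reachableMembers must be between 0 and totalMembers");
  }
  // A strict majority is required: exactly half of the set is not enough.
  return reachableMembers > totalMembers / 2;
}

// Three members split 2/1 across two data centers.
console.log(canElectPrimary(3, 2)); // true
console.log(canElectPrimary(3, 1)); // false
// Four members split 2/2: neither side has a majority.
console.log(canElectPrimary(4, 2)); // false
```

Under these assumptions, the three-member layout from the text (two servers in one data center, one in another) leaves one side able to elect a primary after a partition, while an even 2/2 split leaves the set read only on both sides.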
