Commit 9b75247

author
Bob Grabar
committed
DOCS-260 migrating rs design concepts, draft 2
1 parent f4da7cf commit 9b75247

1 file changed

+89
-40
lines changed

source/core/replication-internals.txt

Lines changed: 89 additions & 40 deletions
Original file line number | Diff line number | Diff line change
@@ -18,9 +18,9 @@ troubleshooting and for further understanding MongoDB's behavior and approach.
1818
Oplog
1919
-----
2020

21-
Under normal operation, MongoDB updates the :ref:`oplog
22-
<replica-set-oplog-sizing>` on a :term:`secondary` within one second of
23-
applying an operation to a :ref:`primary`. However, various exceptional
21+
Under normal operation, MongoDB updates the :ref:`oplog <replica-set-oplog-sizing>`
22+
on a :term:`secondary` within one second of
23+
applying an operation to a :term:`primary`. However, various exceptional
2424
situations may cause a secondary to lag further behind. See
2525
:ref:`Replication Lag <replica-set-replication-lag>` for details.
2626

@@ -41,6 +41,7 @@ operations require idempotency:
4141
.. In 2.0, replicas would import entries from the member lowest
4242
.. "ping," This wasn't true in 1.8 and will likely change in 2.2.
4343

44+
.. _replica-set-data-integrity:
4445
.. _replica-set-implementation:
4546

4647
Data Integrity
@@ -57,10 +58,9 @@ per-connection basis in order to distribute read operations to the
5758
greater query throughput by distributing reads to secondary members. But
5859
keep in mind that replication is asynchronous; therefore, reads from
5960
secondaries may not always reflect the latest writes to the
60-
:term:`primary`. See the :ref:`consistency <replica-set-consistency>`
61-
section for more about :ref:`read preference
62-
<replica-set-read-preference>` and :ref:`write concern
63-
<replica-set-write-concern>`.
61+
:term:`primary`.
62+
63+
.. seealso:: :ref:`replica-set-consistency`
6464

6565
.. note::
6666

@@ -69,29 +69,71 @@ section for more about :ref:`read preference
6969
output to assess the current state of replication and determine if
7070
there is any unintended replication delay.
7171

72+
Write Concern and getLastError
73+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7274

75+
A write is committed once it has replicated to a majority of members of
76+
the set. For important writes, the client should request acknowledgement
77+
of this by issuing :dbcommand:`getLastError` with the ``w`` option to get confirmation
78+
the commit has finished. For more information on
79+
:dbcommand:`getLastError`, see :doc:`/applications/replication`.
7380

81+
.. TODO Verify if the following info is needed. -BG
7482

83+
Queries in MongoDB and replica sets have "READ UNCOMMITTED"
84+
semantics. Writes which are committed at the primary of the set may
85+
be visible before the cluster-wide commit completes.
7586

76-
Elections for Primary
77-
~~~~~~~~~~~~~~~~~~~~~
87+
The read uncommitted semantics (an option on many databases) are more
88+
relaxed and make theoretically achievable performance and
89+
availability higher (for example we never have an object locked in
90+
the server where the locking is dependent on network performance).
7891

79-
In the default configuration, all members have an equal chance of
80-
becoming primary; however, it's possible to set :data:`priority <members[n].priority>` values that
81-
weight the election. In some architectures, there may be operational
82-
reasons for increasing the likelihood of a specific replica set member
83-
becoming primary. For instance, a member located in a remote data
84-
center should *not* become primary. See: :ref:`node
85-
priority <replica-set-node-priority>` for more background on this
86-
concept.
92+
Write Concern and Failover
93+
~~~~~~~~~~~~~~~~~~~~~~~~~~
94+
95+
On a failover, if there are writes which have not replicated from the
96+
:term:`primary`, the writes are rolled back. Therefore, to confirm replica-set-wide commits,
97+
use the :dbcommand:`getLastError` command.
98+
99+
On a failover, data is backed up to files in the rollback directory. To
100+
recover this data, use :program:`mongorestore`.
101+
102+
.. TODO Verify whether to include the following. -BG
103+
104+
Merging back old operations later, after another member has accepted
105+
writes, is a hard problem. One then has multi-master replication,
106+
with potential for conflicting writes. Typically that is handled in
107+
other products by manual version reconciliation code by developers.
108+
We think that is too much work: we want MongoDB usage to be less
109+
developer work, not more. Multi-master also can make atomic operation
110+
semantics problematic.
87111

112+
It is possible (as mentioned above) to manually recover these events,
113+
via manual DBA effort, but we believe in large systems with many, many
114+
members that such efforts become impractical.
88115

116+
Some drivers support 'safe' write modes for critical writes. For
117+
example via setWriteConcern in the Java driver.
89118

119+
Additionally, defaults for { w : ... } parameter to getLastError can
120+
be set in the replica set's configuration.
90121

91-
Configurations that Affect Membership Behavior
92-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
122+
.. note::
93123

94-
Replica sets can also include members with the following four special
124+
Calling :dbcommand:`getLastError` causes the client to wait for a
125+
response from the server. This can slow the client's throughput on
126+
writes if large numbers are made because of the client/server network
127+
turnaround times. Thus for "non-critical" writes it often makes sense
128+
to make no :dbcommand:`getLastError` check at all, or only a single
129+
check after many writes.
130+
131+
.. _replica-set-member-configurations-internals:
132+
133+
Member Configurations
134+
---------------------
135+
136+
Replica sets can include members with the following four special
95137
configurations that affect membership behavior:
96138

97139
- :ref:`Secondary-only <replica-set-secondary-only-members>` members have
@@ -117,6 +159,12 @@ unique set of administrative requirements and concerns. Choosing the
117159
right :doc:`system architecture </administration/replication-architectures>`
118160
for your data set is crucial.
119161

162+
.. seealso:: The :ref:`replica-set-member-configurations` topic in the
163+
:doc:`/administration/replica-sets` document.
164+
165+
Security
166+
--------
167+
120168
Administrators of replica sets also have unique :ref:`monitoring
121169
<replica-set-monitoring>` and :ref:`security <replica-set-security>`
122170
concerns. The :ref:`replica set functions <replica-set-functions>` in
@@ -126,11 +174,6 @@ administration. In particular use the :method:`rs.conf()` to return a
126174
</reference/replica-configuration>` and use :method:`rs.reconfig()` to
127175
modify the configuration of an existing replica set.
128176

129-
130-
131-
132-
133-
134177
.. index:: replica set; elections
135178
.. index:: replica set; failover
136179
.. _replica-set-election-internals:
@@ -148,27 +191,34 @@ The following events can trigger an election:
148191

149192
- You initialize a replica set for the first time.
150193

151-
- A primary steps down.
152-
153-
- A :term:`secondary` member loses contact with a primary.
194+
- A primary steps down. A primary will step down in response to the
195+
:dbcommand:`replSetStepDown` command or if it sees that one of the
196+
current secondaries is eligible for election *and* has a higher
197+
priority. A primary also will step down when it cannot contact a
198+
majority of the members of the replica set. When the current primary
199+
steps down, it closes all open client connections to prevent clients
200+
from unknowingly writing data to a non-primary member.
154201

155-
- A failover occurs.
202+
- A :term:`secondary` member loses contact with a primary. A secondary
203+
will call for an election if it cannot establish a connection to a
204+
primary.
156205

157-
An existing primary will step down in response to the
158-
:dbcommand:`replSetStepDown` command or if it sees that one of
159-
the current secondaries is eligible for election *and* has a higher
160-
priority. A secondary will call for an election if it cannot
161-
establish a connection to a primary. A primary will also step
162-
down when it cannot contact a majority of the members of the replica
163-
set. When the current primary steps down, it closes all open client
164-
connections to prevent clients from unknowingly writing data to a
165-
non-primary member.
206+
- A :term:`failover` occurs.
166207

167208
In an election, all members have one vote,
168209
including :ref:`hidden <replica-set-hidden-members>` members, :ref:`arbiters
169210
<replica-set-arbiters>`, and even recovering members.
170211
Any :program:`mongod` can veto an election.
171212

213+
In the default configuration, all members have an equal chance of
214+
becoming primary; however, it's possible to set :data:`priority
215+
<members[n].priority>` values that weight the election. In some
216+
architectures, there may be operational reasons for increasing the
217+
likelihood of a specific replica set member becoming primary. For
218+
instance, a member located in a remote data center should *not* become
219+
primary. See: :ref:`replica-set-node-priority` for more
220+
information.
221+
172222
Any member of a replica set can veto an election, even if the
173223
member is a :ref:`non-voting member <replica-set-non-voting-members>`.
174224

@@ -218,7 +268,6 @@ aware of the following conditions and possible situations:
218268
:ref:`replica-set-node-priority-configuration`, and
219269
:data:`replica configuration <members[n].votes>`.
220270

221-
222271
Elections and Network Partitions
223272
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
224273

@@ -228,7 +277,7 @@ a majority of the set is not up or reachable, no member will be elected
228277
primary.
229278

230279
There is no way to tell (from the set's point of view) the difference
231-
between a network partition and nodes going down, so members left in a
280+
between a network partition and members going down, so members left in a
232281
minority will not attempt to become primary (to prevent a set from
233282
ending up with primaries on either side of a partition).
234283
