@@ -18,9 +18,9 @@ troubleshooting and for further understanding MongoDB's behavior and approach.
Oplog
-----

- Under normal operation, MongoDB updates the :ref:`oplog
- <replica-set-oplog-sizing>` on a :term:`secondary` within one second of
- applying an operation to a :ref :`primary`. However, various exceptional
+ Under normal operation, MongoDB updates the :ref:`oplog <replica-set-oplog-sizing>`
+ on a :term:`secondary` within one second of
+ applying an operation to a :term:`primary`. However, various exceptional
situations may cause a secondary to lag further behind. See
:ref:`Replication Lag <replica-set-replication-lag>` for details.
@@ -41,6 +41,7 @@ operations require idempotency:
.. In 2.0, replicas would import entries from the member lowest
.. "ping," This wasn't true in 1.8 and will likely change in 2.2.

+ .. _replica-set-data-integrity:
.. _replica-set-implementation:

Data Integrity
@@ -57,10 +58,9 @@ per-connection basis in order to distribute read operations to the
greater query throughput by distributing reads to secondary members. But
keep in mind that replication is asynchronous; therefore, reads from
secondaries may not always reflect the latest writes to the
- :term:`primary`. See the :ref:`consistency <replica-set-consistency>`
- section for more about :ref:`read preference
- <replica-set-read-preference>` and :ref:`write concern
- <replica-set-write-concern>`.
+ :term:`primary`.
+
+ .. seealso:: :ref:`replica-set-consistency`

.. note::
@@ -69,29 +69,71 @@ section for more about :ref:`read preference
output to assess the current state of replication and determine if
there is any unintended replication delay.

+ Write Concern and getLastError
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ A write is committed once it has replicated to a majority of members of
+ the set. For important writes, the client should request acknowledgement
+ of the commit by calling :dbcommand:`getLastError` with the ``w``
+ option set. For more information on
+ :dbcommand:`getLastError`, see :doc:`/applications/replication`.
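The majority-commit rule above can be sketched as a simple predicate. This is our own illustration in plain JavaScript, not driver API; the function name and the handling of numeric ``w`` values are assumptions for the sketch:

```javascript
// Sketch of when a write concern is satisfied, per the rule above:
// "majority" requires acknowledgement from more than half the members,
// while a numeric w requires at least that many members.
// Illustration only -- not MongoDB's actual implementation.
function writeConcernSatisfied(w, acknowledgingMembers, totalMembers) {
  if (w === "majority") {
    return acknowledgingMembers > Math.floor(totalMembers / 2);
  }
  return acknowledgingMembers >= w;
}

// In a 5-member set, 3 acknowledgements satisfy w: "majority":
console.log(writeConcernSatisfied("majority", 3, 5)); // true
console.log(writeConcernSatisfied("majority", 2, 5)); // false
console.log(writeConcernSatisfied(2, 3, 5));          // true
```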
+ .. TODO Verify if the following info is needed. -BG
+
+ Queries in MongoDB and replica sets have "READ UNCOMMITTED"
+ semantics. Writes which are committed at the primary of the set may
+ be visible before the cluster-wide commit completes.

- Elections for Primary
- ~~~~~~~~~~~~~~~~~~~~~
+ These read-uncommitted semantics (an option in many databases) are more
+ relaxed and make theoretically achievable performance and availability
+ higher; for example, the server never holds a lock on an object where
+ the locking would depend on network performance.
- In the default configuration, all members have an equal chance of
- becoming primary; however, it's possible to set :data:`priority <members[n].priority>` values that
- weight the election. In some architectures, there may be operational
- reasons for increasing the likelihood of a specific replica set member
- becoming primary. For instance, a member located in a remote data
- center should *not* become primary. See: :ref:`node
- priority <replica-set-node-priority>` for more background on this
- concept.
+ Write Concern and Failover
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ On a failover, any writes that have not replicated from the
+ :term:`primary` are rolled back. Therefore, to confirm replica-set-wide
+ commits, use the :dbcommand:`getLastError` command.
+
+ On a failover, the rolled-back data is backed up to files in the
+ rollback directory. To recover this data, use :program:`mongorestore`.
+
+ .. TODO Verify whether to include the following. -BG
+
+ Merging back old operations later, after another member has accepted
+ writes, is a hard problem. One then has multi-master replication, with
+ the potential for conflicting writes. Other products typically handle
+ this with manual version-reconciliation code written by developers. We
+ think that is too much work: we want MongoDB usage to mean less
+ developer work, not more. Multi-master replication can also make atomic
+ operation semantics problematic.
+
+ It is possible (as mentioned above) to recover these events through
+ manual DBA effort, but we believe that in a large system with many
+ members such efforts become impractical.
+
+ Some drivers support "safe" write modes for critical writes, for
+ example via ``setWriteConcern`` in the Java driver.
+
+ Additionally, defaults for the ``w`` parameter to
+ :dbcommand:`getLastError` can be set in the replica set's configuration.
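As a sketch, such a default might be set through the ``settings.getLastErrorDefaults`` field of the configuration document. This is a mongo shell fragment for illustration; verify the field name and values against the replica set configuration reference for your server version:

```javascript
// Run against the current primary; sets a set-wide default write
// concern that getLastError uses when the client passes no w value.
cfg = rs.conf()
cfg.settings = { getLastErrorDefaults: { w: "majority", wtimeout: 5000 } }
rs.reconfig(cfg)
```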
- Configurations that Affect Membership Behavior
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ .. note::
- Replica sets can also include members with the following four special
+ Calling :dbcommand:`getLastError` causes the client to wait for a
+ response from the server. Because of the client/server network
+ round-trip time, this can slow the client's write throughput when many
+ writes are made. Thus, for "non-critical" writes it often makes sense
+ to make no :dbcommand:`getLastError` check at all, or only a single
+ check after many writes.
+
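The round-trip cost behind this note can be illustrated with a small sketch (our own illustration, not driver code; it simply counts round trips under the stated assumption that each write and each check costs one):

```javascript
// Each write costs one round trip, and each getLastError check costs
// one more. Checking once per batch instead of once per write removes
// most of that overhead. Illustration only.
function roundTrips(numWrites, writesPerCheck) {
  const checks = Math.ceil(numWrites / writesPerCheck);
  return numWrites + checks;
}

console.log(roundTrips(1000, 1));   // 2000: a check after every write
console.log(roundTrips(1000, 100)); // 1010: one check per 100 writes
```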
+ .. _replica-set-member-configurations-internals:
+
+ Member Configurations
+ ---------------------
+
+ Replica sets can include members with the following four special
configurations that affect membership behavior:
- :ref:`Secondary-only <replica-set-secondary-only-members>` members have
@@ -117,6 +159,12 @@ unique set of administrative requirements and concerns. Choosing the
right :doc:`system architecture </administration/replication-architectures>`
for your data set is crucial.
+ .. seealso:: The :ref:`replica-set-member-configurations` topic in the
+ :doc:`/administration/replica-sets` document.
+
+ Security
+ --------
+
Administrators of replica sets also have unique :ref:`monitoring
<replica-set-monitoring>` and :ref:`security <replica-set-security>`
concerns. The :ref:`replica set functions <replica-set-functions>` in
@@ -126,11 +174,6 @@ administration. In particular use the :method:`rs.conf()` to return a
</reference/replica-configuration>` and use :method:`rs.reconfig()` to
modify the configuration of an existing replica set.
-
-
-
-
-
.. index:: replica set; elections
.. index:: replica set; failover
.. _replica-set-election-internals:
@@ -148,27 +191,34 @@ The following events can trigger an election:
- You initialize a replica set for the first time.

- - A primary steps down.
-
- - A :term:`secondary` member loses contact with a primary.
+ - A primary steps down. A primary will step down in response to the
+ :dbcommand:`replSetStepDown` command or if it sees that one of the
+ current secondaries is eligible for election *and* has a higher
+ priority. A primary will also step down when it cannot contact a
+ majority of the members of the replica set. When the current primary
+ steps down, it closes all open client connections to prevent clients
+ from unknowingly writing data to a non-primary member.
- - A failover occurs.
+ - A :term:`secondary` member loses contact with a primary. A secondary
+ will call for an election if it cannot establish a connection to a
+ primary.

- An existing primary will step down in response to the
- :dbcommand:`replSetStepDown` command or if it sees that one of
- the current secondaries is eligible for election *and* has a higher
- priority. A secondary will call for an election if it cannot
- establish a connection to a primary. A primary will also step
- down when it cannot contact a majority of the members of the replica
- set. When the current primary steps down, it closes all open client
- connections to prevent clients from unknowingly writing data to a
- non-primary member.
+ - A :term:`failover` occurs.
In an election, all members have one vote,
including :ref:`hidden <replica-set-hidden-members>` members, :ref:`arbiters
<replica-set-arbiters>`, and even recovering members.
Any :program:`mongod` can veto an election.

+ In the default configuration, all members have an equal chance of
+ becoming primary; however, it's possible to set :data:`priority
+ <members[n].priority>` values that weight the election. In some
+ architectures, there may be operational reasons for increasing the
+ likelihood of a specific replica set member becoming primary. For
+ instance, a member located in a remote data center should *not* become
+ primary. See :ref:`replica-set-node-priority` for more information.
+
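The effect of priority weighting can be sketched as follows. This is our own simplification (the real election also involves vetoes, data freshness, and connectivity); the function name and hostnames are illustrative:

```javascript
// Pick the member most likely to win an election under priority
// weighting: the highest-priority eligible member. A member with
// priority 0 can never become primary. Illustration only.
function mostLikelyPrimary(members) {
  const eligible = members.filter(m => m.priority > 0);
  if (eligible.length === 0) return null;
  return eligible.reduce((a, b) => (b.priority > a.priority ? b : a)).host;
}

const members = [
  { host: "mongo1.example.net", priority: 2 },    // preferred primary
  { host: "mongo2.example.net", priority: 1 },
  { host: "remote-dc.example.net", priority: 0 }  // never becomes primary
];
console.log(mostLikelyPrimary(members)); // "mongo1.example.net"
```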
Any member of a replica set can veto an election, even if the
member is a :ref:`non-voting member <replica-set-non-voting-members>`.
@@ -218,7 +268,6 @@ aware of the following conditions and possible situations:
:ref:`replica-set-node-priority-configuration`, and
:data:`replica configuration <members[n].votes>`.
-
Elections and Network Partitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -228,7 +277,7 @@ a majority of the set is not up or reachable, no member will be elected
primary.

There is no way to tell (from the set's point of view) the difference
- between a network partition and nodes going down, so members left in a
+ between a network partition and members going down, so members left in a
minority will not attempt to become primary (to prevent a set from
ending up with primaries on either side of a partition).
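The strict-majority requirement behind this behavior can be sketched as a predicate (our own illustration, not the actual implementation):

```javascript
// A member only stands for election if it can reach a strict majority
// of the set; this is why a minority partition (and either half of an
// even split) cannot elect a primary. Illustration only.
function canElectPrimary(reachableMembers, totalMembers) {
  return reachableMembers > totalMembers / 2;
}

// A 5-member set split 3/2 by a network partition:
console.log(canElectPrimary(3, 5)); // true: the majority side may elect
console.log(canElectPrimary(2, 5)); // false: the minority side cannot
```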