Commit 33e9cf3

authored
[DOCS] Merges list of discovery and cluster formation settings (#36909)
1 parent c8a8391 commit 33e9cf3

File tree

10 files changed
+217 -195 lines changed

docs/reference/modules/discovery.asciidoc

Lines changed: 7 additions & 17 deletions
@@ -40,22 +40,15 @@ module. This module is divided into the following sections:
 Cluster state publishing is the process by which the elected master node
 updates the cluster state on all the other nodes in the cluster.
 
-<<no-master-block>>::
-
-    The no-master block is put in place when there is no known elected master,
-    and can be configured to determine which operations should be rejected when
-    it is in place.
-
-Advanced settings::
-
-    There are settings that allow advanced users to influence the
-    <<master-election-settings,master election>> and
-    <<fault-detection-settings,fault detection>> processes.
-
 <<modules-discovery-quorums>>::
 
     This section describes the detailed design behind the master election and
     auto-reconfiguration logic.
+
+<<modules-discovery-settings,Settings>>::
+
+    There are settings that enable users to influence the discovery, cluster
+    formation, master election and fault detection processes.
 
 include::discovery/discovery.asciidoc[]
 
@@ -65,11 +58,8 @@ include::discovery/adding-removing-nodes.asciidoc[]
 
 include::discovery/publishing.asciidoc[]
 
-include::discovery/no-master-block.asciidoc[]
-
-include::discovery/master-election.asciidoc[]
+include::discovery/quorums.asciidoc[]
 
 include::discovery/fault-detection.asciidoc[]
 
-include::discovery/quorums.asciidoc[]
-
+include::discovery/discovery-settings.asciidoc[]

docs/reference/modules/discovery/adding-removing-nodes.asciidoc

Lines changed: 16 additions & 15 deletions
@@ -14,9 +14,15 @@ desirable to add or remove some master-eligible nodes to or from a cluster.
 
 ==== Adding master-eligible nodes
 
-If you wish to add some master-eligible nodes to your cluster, simply configure
-the new nodes to find the existing cluster and start them up. Elasticsearch will
-add the new nodes to the voting configuration if it is appropriate to do so.
+If you wish to add some nodes to your cluster, simply configure the new nodes
+to find the existing cluster and start them up. Elasticsearch adds the new nodes
+to the voting configuration if it is appropriate to do so.
+
+During master election or when joining an existing formed cluster, a node
+sends a join request to the master in order to be officially added to the
+cluster. You can use the `cluster.join.timeout` setting to configure how long a
+node waits after sending a request to join a cluster. Its default value is
+`60s`. See <<modules-discovery-settings>>.
 
 ==== Removing master-eligible nodes
 
@@ -93,18 +99,13 @@ GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_excl
 --------------------------------------------------
 // CONSOLE
 
-This list is limited in size by the following setting:
-
-`cluster.max_voting_config_exclusions`::
-
-    Sets a limits on the number of voting configuration exclusions at any one
-    time. Defaults to `10`.
-
-Since voting configuration exclusions are persistent and limited in number, they
-must be cleaned up. Normally an exclusion is added when performing some
-maintenance on the cluster, and the exclusions should be cleaned up when the
-maintenance is complete. Clusters should have no voting configuration exclusions
-in normal operation.
+This list is limited in size by the `cluster.max_voting_config_exclusions`
+setting, which defaults to `10`. See <<modules-discovery-settings>>. Since
+voting configuration exclusions are persistent and limited in number, they must
+be cleaned up. Normally an exclusion is added when performing some maintenance
+on the cluster, and the exclusions should be cleaned up when the maintenance is
+complete. Clusters should have no voting configuration exclusions in normal
+operation.
 
 If a node is excluded from the voting configuration because it is to be shut
 down permanently, its exclusion can be removed after it is shut down and removed
docs/reference/modules/discovery/bootstrapping.asciidoc

Lines changed: 6 additions & 12 deletions
@@ -7,19 +7,13 @@ more of the master-eligible nodes in the cluster. This is known as _cluster
 bootstrapping_. This is only required the very first time the cluster starts
 up: nodes that have already joined a cluster store this information in their
 data folder and freshly-started nodes that are joining an existing cluster
-obtain this information from the cluster's elected master. This information is
-given using this setting:
+obtain this information from the cluster's elected master.
 
-`cluster.initial_master_nodes`::
-
-    Sets a list of the <<node.name,node names>> or transport addresses of the
-    initial set of master-eligible nodes in a brand-new cluster. By default
-    this list is empty, meaning that this node expects to join a cluster that
-    has already been bootstrapped.
-
-This setting can be given on the command line or in the `elasticsearch.yml`
-configuration file when starting up a master-eligible node. Once the cluster
-has formed this setting is no longer required and is ignored. It need not be set
+The initial set of master-eligible nodes is defined in the
+<<initial_master_nodes,`cluster.initial_master_nodes` setting>>. When you
+start a master-eligible node, you can provide this setting on the command line
+or in the `elasticsearch.yml` file. After the cluster has formed, this setting
+is no longer required and is ignored. It need not be set
 on master-ineligible nodes, nor on master-eligible nodes that are started to
 join an existing cluster. Note that master-eligible nodes should use storage
 that persists across restarts. If they do not, and
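
For illustration, a brand-new cluster with three master-eligible nodes might be bootstrapped with a fragment like the following in each of those nodes' `elasticsearch.yml` files (the node names here are placeholders, not values from this commit):

[source,yaml]
----
# Hypothetical node names; use your own node.name values or transport addresses.
cluster.initial_master_nodes:
  - master-node-a
  - master-node-b
  - master-node-c
----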
docs/reference/modules/discovery/discovery-settings.asciidoc

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
+[[modules-discovery-settings]]
+=== Discovery and cluster formation settings
+
+Discovery and cluster formation are affected by the following settings:
+
+[[master-election-settings]]`cluster.election.back_off_time`::
+
+    Sets the amount to increase the upper bound on the wait before an election
+    on each election failure. Note that this is _linear_ backoff. This defaults
+    to `100ms`.
+
+`cluster.election.duration`::
+
+    Sets how long each election is allowed to take before a node considers it to
+    have failed and schedules a retry. This defaults to `500ms`.
+
+`cluster.election.initial_timeout`::
+
+    Sets the upper bound on how long a node will wait initially, or after the
+    elected master fails, before attempting its first election. This defaults
+    to `100ms`.
+
+`cluster.election.max_timeout`::
+
+    Sets the maximum upper bound on how long a node will wait before attempting
+    a first election, so that a network partition that lasts for a long time
+    does not result in excessively sparse elections. This defaults to `10s`.
+
+[[fault-detection-settings]]`cluster.fault_detection.follower_check.interval`::
+
+    Sets how long the elected master waits between follower checks to each
+    other node in the cluster. Defaults to `1s`.
+
+`cluster.fault_detection.follower_check.timeout`::
+
+    Sets how long the elected master waits for a response to a follower check
+    before considering it to have failed. Defaults to `30s`.
+
+`cluster.fault_detection.follower_check.retry_count`::
+
+    Sets how many consecutive follower check failures must occur to each node
+    before the elected master considers that node to be faulty and removes it
+    from the cluster. Defaults to `3`.
+
+`cluster.fault_detection.leader_check.interval`::
+
+    Sets how long each node waits between checks of the elected master.
+    Defaults to `1s`.
+
+`cluster.fault_detection.leader_check.timeout`::
+
+    Sets how long each node waits for a response to a leader check from the
+    elected master before considering it to have failed. Defaults to `30s`.
+
+`cluster.fault_detection.leader_check.retry_count`::
+
+    Sets how many consecutive leader check failures must occur before a node
+    considers the elected master to be faulty and attempts to find or elect a
+    new master. Defaults to `3`.
+
+`cluster.follower_lag.timeout`::
+
+    Sets how long the master node waits to receive acknowledgements for cluster
+    state updates from lagging nodes. The default value is `90s`. If a node does
+    not successfully apply the cluster state update within this period of time,
+    it is considered to have failed and is removed from the cluster. See
+    <<cluster-state-publishing>>.
+
+`cluster.initial_master_nodes`::
+
+    Sets a list of the <<node.name,node names>> or transport addresses of the
+    initial set of master-eligible nodes in a brand-new cluster. By default
+    this list is empty, meaning that this node expects to join a cluster that
+    has already been bootstrapped. See <<initial_master_nodes>>.
+
+`cluster.join.timeout`::
+
+    Sets how long a node will wait after sending a request to join a cluster
+    before it considers the request to have failed and retries. Defaults to
+    `60s`.
+
+`cluster.max_voting_config_exclusions`::
+
+    Sets a limit on the number of voting configuration exclusions at any one
+    time. The default value is `10`. See
+    <<modules-discovery-adding-removing-nodes>>.
+
+`cluster.publish.timeout`::
+
+    Sets how long the master node waits for each cluster state update to be
+    completely published to all nodes. The default value is `30s`. If this
+    period of time elapses, the cluster state change is rejected. See
+    <<cluster-state-publishing>>.
+
+`discovery.cluster_formation_warning_timeout`::
+
+    Sets how long a node will try to form a cluster before logging a warning
+    that the cluster did not form. Defaults to `10s`. If a cluster has not
+    formed after `discovery.cluster_formation_warning_timeout` has elapsed then
+    the node will log a warning message that starts with the phrase `master not
+    discovered` which describes the current state of the discovery process.
+
+`discovery.find_peers_interval`::
+
+    Sets how long a node will wait before attempting another discovery round.
+    Defaults to `1s`.
+
+`discovery.probe.connect_timeout`::
+
+    Sets how long to wait when attempting to connect to each address. Defaults
+    to `3s`.
+
+`discovery.probe.handshake_timeout`::
+
+    Sets how long to wait when attempting to identify the remote node via a
+    handshake. Defaults to `1s`.
+
+`discovery.request_peers_timeout`::
+
+    Sets how long a node will wait after asking its peers again before
+    considering the request to have failed. Defaults to `3s`.
+
+`discovery.zen.hosts_provider`::
+
+    Specifies which type of <<built-in-hosts-providers,hosts provider>> provides
+    the list of seed nodes. By default, it is the
+    <<settings-based-hosts-provider,settings-based hosts provider>>.
+
+[[no-master-block]]`discovery.zen.no_master_block`::
+
+    Specifies which operations are rejected when there is no active master in a
+    cluster. This setting has two valid values:
++
+--
+`all`::: All operations on the node (both read and write operations) are
+rejected. This also applies to cluster state read or write API operations, like
+the get index settings, put mapping and cluster state APIs.
+
+`write`::: (default) Write operations are rejected. Read operations succeed,
+based on the last known cluster configuration. This situation may result in
+partial reads of stale data as this node may be isolated from the rest of the
+cluster.
+
+[NOTE]
+===============================
+* The `discovery.zen.no_master_block` setting doesn't apply to nodes-based APIs
+(for example, cluster stats, node info, and node stats APIs). Requests to these
+APIs are not blocked and can run on any available node.
+
+* For the cluster to be fully operational, it must have an active master.
+===============================
+--
+
+`discovery.zen.ping.unicast.hosts`::
+
+    Provides a list of master-eligible nodes in the cluster. The list contains
+    either an array of hosts or a comma-delimited string. Each value has the
+    format `host:port` or `host`, where `port` defaults to the setting
+    `transport.profiles.default.port`. Note that IPv6 hosts must be bracketed.
+    The default value is `127.0.0.1, [::1]`. See <<unicast.hosts>>.
+
+`discovery.zen.ping.unicast.hosts.resolve_timeout`::
+
+    Sets the amount of time to wait for DNS lookups on each round of discovery.
+    This is specified as a <<time-units,time unit>> and defaults to `5s`.
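
As a sketch of how a few of these settings might be combined, an `elasticsearch.yml` fragment could look like the following (the addresses and overridden values are illustrative, not recommendations from this commit):

[source,yaml]
----
# Illustrative values; the defaults listed above are usually appropriate.
discovery.zen.ping.unicast.hosts:
  - 192.168.1.10:9300
  - 192.168.1.11          # port defaults to transport.profiles.default.port
  - "[::1]"               # IPv6 hosts must be bracketed
discovery.zen.no_master_block: all   # reject reads as well as writes with no master
cluster.election.max_timeout: 10s
----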

docs/reference/modules/discovery/discovery.asciidoc

Lines changed: 3 additions & 38 deletions
@@ -82,9 +82,10 @@ gives a convenient mechanism for an Elasticsearch instance that is run in a
 Docker container to be dynamically supplied with a list of IP addresses to
 connect to when those IP addresses may not be known at node startup.
 
-To enable file-based discovery, configure the `file` hosts provider as follows:
+To enable file-based discovery, configure the `file` hosts provider as follows
+in the `elasticsearch.yml` file:
 
-[source,txt]
+[source,yml]
 ----------------------------------------------------------------
 discovery.zen.hosts_provider: file
 ----------------------------------------------------------------
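
If it helps to see the companion file, the `file` hosts provider reads addresses, one per line, from a `unicast_hosts.txt` file in the config directory; a sketch with placeholder addresses (not part of this commit) might look like:

[source,txt]
----
10.10.10.5
10.10.10.6:9305
# lines starting with `#` are comments; IPv6 addresses must be bracketed
[2001:db8::1]:9301
----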
@@ -150,39 +151,3 @@ a hosts provider that uses the Azure Classic API find a list of seed nodes.
 
 The {plugins}/discovery-gce.html[GCE discovery plugin] adds a hosts provider
 that uses the GCE API find a list of seed nodes.
-
-[float]
-==== Discovery settings
-
-The discovery process is controlled by the following settings.
-
-`discovery.find_peers_interval`::
-
-    Sets how long a node will wait before attempting another discovery round.
-    Defaults to `1s`.
-
-`discovery.request_peers_timeout`::
-
-    Sets how long a node will wait after asking its peers again before
-    considering the request to have failed. Defaults to `3s`.
-
-`discovery.probe.connect_timeout`::
-
-    Sets how long to wait when attempting to connect to each address. Defaults
-    to `3s`.
-
-`discovery.probe.handshake_timeout`::
-
-    Sets how long to wait when attempting to identify the remote node via a
-    handshake. Defaults to `1s`.
-
-`discovery.cluster_formation_warning_timeout`::
-
-    Sets how long a node will try to form a cluster before logging a warning
-    that the cluster did not form. Defaults to `10s`.
-
-If a cluster has not formed after `discovery.cluster_formation_warning_timeout`
-has elapsed then the node will log a warning message that starts with the phrase
-`master not discovered` which describes the current state of the discovery
-process.
-
docs/reference/modules/discovery/fault-detection.asciidoc

Lines changed: 16 additions & 49 deletions
@@ -1,52 +1,19 @@
-[[fault-detection-settings]]
-=== Cluster fault detection settings
+[[cluster-fault-detection]]
+=== Cluster fault detection
 
-An elected master periodically checks each of the nodes in the cluster in order
-to ensure that they are still connected and healthy, and in turn each node in
-the cluster periodically checks the health of the elected master. These checks
+The elected master periodically checks each of the nodes in the cluster to
+ensure that they are still connected and healthy. Each node in the cluster also
+periodically checks the health of the elected master. These checks
 are known respectively as _follower checks_ and _leader checks_.
 
-Elasticsearch allows for these checks occasionally to fail or timeout without
-taking any action, and will only consider a node to be truly faulty after a
-number of consecutive checks have failed. The following settings control the
-behaviour of fault detection.
-
-`cluster.fault_detection.follower_check.interval`::
-
-    Sets how long the elected master waits between follower checks to each
-    other node in the cluster. Defaults to `1s`.
-
-`cluster.fault_detection.follower_check.timeout`::
-
-    Sets how long the elected master waits for a response to a follower check
-    before considering it to have failed. Defaults to `30s`.
-
-`cluster.fault_detection.follower_check.retry_count`::
-
-    Sets how many consecutive follower check failures must occur to each node
-    before the elected master considers that node to be faulty and removes it
-    from the cluster. Defaults to `3`.
-
-`cluster.fault_detection.leader_check.interval`::
-
-    Sets how long each node waits between checks of the elected master.
-    Defaults to `1s`.
-
-`cluster.fault_detection.leader_check.timeout`::
-
-    Sets how long each node waits for a response to a leader check from the
-    elected master before considering it to have failed. Defaults to `30s`.
-
-`cluster.fault_detection.leader_check.retry_count`::
-
-    Sets how many consecutive leader check failures must occur before a node
-    considers the elected master to be faulty and attempts to find or elect a
-    new master. Defaults to `3`.
-
-If the elected master detects that a node has disconnected then this is treated
-as an immediate failure, bypassing the timeouts and retries listed above, and
-the master attempts to remove the node from the cluster. Similarly, if a node
-detects that the elected master has disconnected then this is treated as an
-immediate failure, bypassing the timeouts and retries listed above, and the
-follower restarts its discovery phase to try and find or elect a new master.
-
+Elasticsearch allows these checks to occasionally fail or timeout without
+taking any action. It considers a node to be faulty only after a number of
+consecutive checks have failed. You can control fault detection behavior with
+<<modules-discovery-settings,`cluster.fault_detection.*` settings>>.
+
+If the elected master detects that a node has disconnected, however, this
+situation is treated as an immediate failure. The master bypasses the timeout
+and retry setting values and attempts to remove the node from the cluster.
+Similarly, if a node detects that the elected master has disconnected, this
+situation is treated as an immediate failure. The node bypasses the timeout and
+retry settings and restarts its discovery phase to try and find or elect a new
+master.
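
To make the knobs concrete, a cluster that wants slower, more tolerant fault detection might override a few of the `cluster.fault_detection.*` settings in `elasticsearch.yml`; the values below are illustrative only, not recommendations from this commit:

[source,yaml]
----
# Illustrative overrides; the defaults (1s / 30s / 3) suit most clusters.
cluster.fault_detection.follower_check.interval: 2s
cluster.fault_detection.follower_check.retry_count: 5
cluster.fault_detection.leader_check.timeout: 60s
----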
