11[[modules-discovery-zen]]
22=== Zen Discovery
33
4- The zen discovery is the built in discovery module for Elasticsearch and
5- the default. It provides unicast discovery, but can be extended to
6- support cloud environments and other forms of discovery.
4+ Zen discovery is the built-in, default discovery module for Elasticsearch. It
5+ provides unicast and file-based discovery, and can be extended to support cloud
6+ environments and other forms of discovery via plugins.
77
8- The zen discovery is integrated with other modules, for example, all
9- communication between nodes is done using the
10- <<modules-transport,transport>> module.
8+ Zen discovery is integrated with other modules; for example, all communication
9+ between nodes is done using the <<modules-transport,transport>> module.
1110
1211It is separated into several sub modules, which are explained below:
1312
1413[float]
1514[[ping]]
1615==== Ping
1716
18- This is the process where a node uses the discovery mechanisms to find
19- other nodes.
17+ This is the process where a node uses the discovery mechanisms to find other
18+ nodes.
19+
20+ [float]
21+ [[discovery-seed-nodes]]
22+ ==== Seed nodes
23+
24+ Zen discovery uses a list of _seed_ nodes in order to start off the discovery
25+ process. At startup, or when electing a new master, Elasticsearch tries to
26+ connect to each seed node in its list, and holds a gossip-like conversation with
27+ them to find other nodes and to build a complete picture of the cluster. By
28+ default there are two methods for configuring the list of seed nodes: _unicast_
29+ and _file-based_. It is recommended that the list of seed nodes comprises the
30+ list of master-eligible nodes in the cluster.
2031
2132[float]
2233[[unicast]]
2334===== Unicast
2435
25- Unicast discovery requires a list of hosts to use that will act as gossip
26- routers. These hosts can be specified as hostnames or IP addresses; hosts
27- specified as hostnames are resolved to IP addresses during each round of
28- pinging. Note that if you are in an environment where DNS resolutions vary with
29- time, you might need to adjust your <<networkaddress-cache-ttl,JVM security
30- settings>>.
36+ Unicast discovery configures a static list of hosts for use as seed nodes.
37+ These hosts can be specified as hostnames or IP addresses; hosts specified as
38+ hostnames are resolved to IP addresses during each round of pinging. Note that
39+ if you are in an environment where DNS resolutions vary with time, you might
40+ need to adjust your <<networkaddress-cache-ttl,JVM security settings>>.
3141
32- It is recommended that the unicast hosts list be maintained as the list of
33- master-eligible nodes in the cluster.
42+ The list of hosts is set using the `discovery.zen.ping.unicast.hosts` static
43+ setting. This is either an array of hosts or a comma-delimited string. Each
44+ value should be in the form of `host:port` or `host` (where `port` defaults to
45+ the setting `transport.profiles.default.port` falling back to
46+ `transport.tcp.port` if not set). Note that IPv6 hosts must be bracketed. The
47+ default for this setting is `127.0.0.1, [::1]`.
3448
35- Unicast discovery provides the following settings with the `discovery.zen.ping.unicast` prefix:
49+ Additionally, the `discovery.zen.ping.unicast.resolve_timeout` setting
50+ configures the amount of time to wait for DNS lookups on each round of pinging.
51+ This is specified as a <<time-units, time unit>> and defaults to 5s.
3652
37- [cols="<,<",options="header",]
38- |=======================================================================
39- |Setting |Description
40- |`hosts` |Either an array setting or a comma delimited setting. Each
41- value should be in the form of `host:port` or `host` (where `port` defaults to the setting `transport.profiles.default.port`
42- falling back to `transport.tcp.port` if not set). Note that IPv6 hosts must be bracketed. Defaults to `127.0.0.1, [::1]`
43- |`hosts.resolve_timeout` |The amount of time to wait for DNS lookups on each round of pinging. Specified as
44- <<time-units, time units>>. Defaults to 5s.
45- |=======================================================================
53+ Unicast discovery uses the <<modules-transport,transport>> module to perform the
54+ discovery.
4655
47- The unicast discovery uses the <<modules-transport,transport>> module to perform the discovery.
56+ [float]
57+ [[file-based-hosts-provider]]
58+ ===== File-based
59+
60+ In addition to hosts provided by the static `discovery.zen.ping.unicast.hosts`
61+ setting, it is possible to provide a list of hosts via an external file.
62+ Elasticsearch reloads this file when it changes, so that the list of seed nodes
63+ can change dynamically without needing to restart each node. For example, this
64+ gives a convenient mechanism for an Elasticsearch instance that is run in a
65+ Docker container to be dynamically supplied with a list of IP addresses to
66+ connect to for Zen discovery when those IP addresses may not be known at node
67+ startup.
68+
69+ To enable file-based discovery, configure the `file` hosts provider as follows:
70+
71+ ```
72+ discovery.zen.hosts_provider: file
73+ ```
74+
75+ Then create a file at `$ES_PATH_CONF/unicast_hosts.txt` in
76+ <<discovery-file-format,the format described below>>. Any time a change is made
77+ to the `unicast_hosts.txt` file the new changes will be picked up by
78+ Elasticsearch and the new hosts list will be used.
79+
80+ Note that the file-based discovery plugin augments the unicast hosts list in
81+ `elasticsearch.yml`: if there are valid unicast host entries in
82+ `discovery.zen.ping.unicast.hosts` then they will be used in addition to those
83+ supplied in `unicast_hosts.txt`.
84+
85+ The `discovery.zen.ping.unicast.resolve_timeout` setting also applies to DNS
86+ lookups for nodes specified by address via file-based discovery. This is
87+ specified as a <<time-units, time unit>> and defaults to 5s.
88+
89+ [[discovery-file-format]]
90+ [float]
91+ ====== unicast_hosts.txt file format
92+
93+ The format of the file is to specify one node entry per line. Each node entry
94+ consists of the host (host name or IP address) and an optional transport port
95+ number. If the port number is specified, it must come immediately after the
96+ host (on the same line) separated by a `:`. If the port number is not
97+ specified, a default value of 9300 is used.
98+
99+ For example, this is a `unicast_hosts.txt` for a cluster with four nodes that
100+ participate in unicast discovery, some of which are not running on the default
101+ port:
102+
103+ [source,txt]
104+ ----------------------------------------------------------------
105+ 10.10.10.5
106+ 10.10.10.6:9305
107+ 10.10.10.5:10005
108+ # an IPv6 address
109+ [2001:0db8:85a3:0000:0000:8a2e:0370:7334]:9301
110+ ----------------------------------------------------------------
111+
112+ Host names are allowed instead of IP addresses (similar to
113+ `discovery.zen.ping.unicast.hosts`), and IPv6 addresses must be specified in
114+ brackets with the port coming after the brackets.
115+
116+ It is also possible to add comments to this file. All comments must appear on
117+ their own lines starting with `#` (i.e. comments cannot start in the middle of
118+ a line).
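
The file format described above can be illustrated with a small parser. This is a minimal sketch in Python, not Elasticsearch's own parsing code: the function name `parse_hosts_file` and the choice to skip blank lines are assumptions made for illustration, while the `#` comment rule, the bracketed IPv6 form, and the default port of 9300 come from the text.

```python
DEFAULT_TRANSPORT_PORT = 9300  # default port stated in the file format description


def parse_hosts_file(text):
    """Return a list of (host, port) tuples from unicast_hosts.txt content.

    Hypothetical helper for illustration only.
    """
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines are ignored. Comments occupy whole lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            # Bracketed IPv6 address, optionally followed by ":port".
            host, closing, rest = line[1:].partition("]")
            if not closing:
                raise ValueError(f"missing closing bracket: {line!r}")
            if rest and not rest.startswith(":"):
                raise ValueError(f"unexpected text after bracket: {line!r}")
            port = int(rest[1:]) if rest else DEFAULT_TRANSPORT_PORT
        else:
            # Host name or IPv4 address, optionally followed by ":port".
            host, colon, port_text = line.partition(":")
            port = int(port_text) if colon else DEFAULT_TRANSPORT_PORT
        entries.append((host, port))
    return entries
```

Applied to the example file above, the sketch yields four entries, with `10.10.10.5` falling back to port 9300 and the IPv6 entry keeping its explicit port 9301.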
48119
49120[float]
50121[[master-election]]
51122==== Master Election
52123
53- As part of the ping process a master of the cluster is either
54- elected or joined to. This is done automatically. The
55- `discovery.zen.ping_timeout` (which defaults to `3s`) determines how long the node
56- will wait before deciding on starting an election or joining an existing cluster.
57- Three pings will be sent over this timeout interval. In case where no decision can be
58- reached after the timeout, the pinging process restarts.
59- In slow or congested networks, three seconds might not be enough for a node to become
60- aware of the other nodes in its environment before making an election decision.
61- Increasing the timeout should be done with care in that case, as it will slow down the
62- election process.
63- Once a node decides to join an existing formed cluster, it
64- will send a join request to the master (`discovery.zen.join_timeout`)
65- with a timeout defaulting at 20 times the ping timeout.
66-
67- When the master node stops or has encountered a problem, the cluster nodes
68- start pinging again and will elect a new master. This pinging round also
69- serves as a protection against (partial) network failures where a node may unjustly
70- think that the master has failed. In this case the node will simply hear from
71- other nodes about the currently active master.
72-
73- If `discovery.zen.master_election.ignore_non_master_pings` is `true`, pings from nodes that are not master
74- eligible (nodes where `node.master` is `false`) are ignored during master election; the default value is
124+ As part of the ping process a master of the cluster is either elected or joined
125+ to. This is done automatically. The `discovery.zen.ping_timeout` (which defaults
126+ to `3s`) determines how long the node will wait before deciding on starting an
127+ election or joining an existing cluster. Three pings will be sent over this
128+ timeout interval. If no decision can be reached after the timeout,
129+ the pinging process restarts. In slow or congested networks, three seconds
130+ might not be enough for a node to become aware of the other nodes in its
131+ environment before making an election decision. Increasing the timeout should
132+ be done with care in that case, as it will slow down the election process. Once
133+ a node decides to join an existing formed cluster, it will send a join request
134+ to the master (`discovery.zen.join_timeout`) with a timeout defaulting to 20
135+ times the ping timeout.
136+
137+ When the master node stops or has encountered a problem, the cluster nodes start
138+ pinging again and will elect a new master. This pinging round also serves as a
139+ protection against (partial) network failures where a node may unjustly think
140+ that the master has failed. In this case the node will simply hear from other
141+ nodes about the currently active master.
142+
143+ If `discovery.zen.master_election.ignore_non_master_pings` is `true`, pings from
144+ nodes that are not master eligible (nodes where `node.master` is `false`) are
145+ ignored during master election; the default value is `false`.
146+
147+ Nodes can be excluded from becoming a master by setting `node.master` to
75148`false`.
76149
77- Nodes can be excluded from becoming a master by setting `node.master` to `false`.
78-
79- The `discovery.zen.minimum_master_nodes` sets the minimum
80- number of master eligible nodes that need to join a newly elected master in order for an election to
81- complete and for the elected node to accept its mastership. The same setting controls the minimum number of
82- active master eligible nodes that should be a part of any active cluster. If this requirement is not met the
83- active master node will step down and a new master election will begin.
150+ The `discovery.zen.minimum_master_nodes` sets the minimum number of master
151+ eligible nodes that need to join a newly elected master in order for an election
152+ to complete and for the elected node to accept its mastership. The same setting
153+ controls the minimum number of active master eligible nodes that should be a
154+ part of any active cluster. If this requirement is not met the active master
155+ node will step down and a new master election will begin.
84156
85157This setting must be set to a <<minimum_master_nodes,quorum>> of your master
86158eligible nodes. It is recommended to avoid having only two master eligible
87- nodes, since a quorum of two is two. Therefore, a loss of either master
88- eligible node will result in an inoperable cluster.
159+ nodes, since a quorum of two is two. Therefore, a loss of either master eligible
160+ node will result in an inoperable cluster.
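
The quorum arithmetic behind this recommendation can be sketched as follows. This is an illustrative Python snippet, assuming the usual majority formula (half of the master eligible nodes, rounded down, plus one); the function names are hypothetical and not part of Elasticsearch.

```python
def quorum(master_eligible_nodes):
    """Majority of the master eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master eligible node")
    return master_eligible_nodes // 2 + 1


def tolerated_losses(master_eligible_nodes):
    """How many master eligible nodes can be lost while a quorum remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)
```

With two master eligible nodes the quorum is two and no loss is tolerated, while with three the quorum is still two and one node can be lost.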
89161
90162[float]
91163[[fault-detection]]
92164==== Fault Detection
93165
94- There are two fault detection processes running. The first is by the
95- master, to ping all the other nodes in the cluster and verify that they
96- are alive. And on the other end, each node pings to master to verify if
97- its still alive or an election process needs to be initiated.
166+ There are two fault detection processes running. The first is by the master, to
167+ ping all the other nodes in the cluster and verify that they are alive. On the
168+ other end, each node pings the master to verify that it is still alive or
169+ whether an election process needs to be initiated.
98170
99171The following settings control the fault detection process using the
100172`discovery.zen.fd` prefix:
@@ -116,19 +188,21 @@ considered failed. Defaults to `3`.
116188
117189The master node is the only node in a cluster that can make changes to the
118190cluster state. The master node processes one cluster state update at a time,
119- applies the required changes and publishes the updated cluster state to all
120- the other nodes in the cluster. Each node receives the publish message, acknowledges
121- it, but does *not* yet apply it. If the master does not receive acknowledgement from
122- at least `discovery.zen.minimum_master_nodes` nodes within a certain time (controlled by
123- the `discovery.zen.commit_timeout` setting and defaults to 30 seconds) the cluster state
124- change is rejected.
125-
126- Once enough nodes have responded, the cluster state is committed and a message will
127- be sent to all the nodes. The nodes then proceed to apply the new cluster state to their
128- internal state. The master node waits for all nodes to respond, up to a timeout, before
129- going ahead processing the next updates in the queue. The `discovery.zen.publish_timeout` is
130- set by default to 30 seconds and is measured from the moment the publishing started. Both
131- timeout settings can be changed dynamically through the <<cluster-update-settings,cluster update settings api>>
191+ applies the required changes and publishes the updated cluster state to all the
192+ other nodes in the cluster. Each node receives the publish message, acknowledges
193+ it, but does *not* yet apply it. If the master does not receive acknowledgement
194+ from at least `discovery.zen.minimum_master_nodes` nodes within a certain time
195+ (controlled by the `discovery.zen.commit_timeout` setting, which defaults to 30
196+ seconds), the cluster state change is rejected.
197+
198+ Once enough nodes have responded, the cluster state is committed and a message
199+ will be sent to all the nodes. The nodes then proceed to apply the new cluster
200+ state to their internal state. The master node waits for all nodes to respond,
201+ up to a timeout, before going ahead processing the next updates in the queue.
202+ The `discovery.zen.publish_timeout` is set by default to 30 seconds and is
203+ measured from the moment the publishing started. Both timeout settings can be
204+ changed dynamically through the <<cluster-update-settings,cluster update
205+ settings api>>.
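
The commit rule in the first phase can be sketched as a simple count of timely acknowledgements. This is a toy Python model, not Elasticsearch code: the function name, the representation of acknowledgements as elapsed seconds since publishing started, and the question of whether the master's own acknowledgement is included in the list are all assumptions left to the caller.

```python
def commit_decision(ack_times, minimum_master_nodes, commit_timeout=30.0):
    """Decide whether a published cluster state is committed or rejected.

    ack_times: seconds after the start of publishing at which each
    acknowledgement arrived (hypothetical input format).
    commit_timeout: mirrors the 30 second default described in the text.
    """
    timely_acks = sum(1 for t in ack_times if t <= commit_timeout)
    if timely_acks >= minimum_master_nodes:
        return "committed"
    return "rejected"
```

For example, with `minimum_master_nodes` set to 2, two acknowledgements inside the timeout are enough to commit even if a third node responds late.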
132206
133207[float]
134208[[no-master-block]]
@@ -143,10 +217,14 @@ rejected when there is no active master.
143217The `discovery.zen.no_master_block` setting has two valid options:
144218
145219[horizontal]
146- `all`:: All operations on the node--i.e. both read & writes--will be rejected. This also applies for api cluster state
147- read or write operations, like the get index settings, put mapping and cluster state api.
148- `write`:: (default) Write operations will be rejected. Read operations will succeed, based on the last known cluster configuration.
149- This may result in partial reads of stale data as this node may be isolated from the rest of the cluster.
150-
151- The `discovery.zen.no_master_block` setting doesn't apply to nodes-based apis (for example cluster stats, node info and
152- node stats apis). Requests to these apis will not be blocked and can run on any available node.
220+ `all`:: All operations on the node--i.e. both reads and writes--will be rejected.
221+ This also applies to API cluster state read or write operations, such as the get
222+ index settings, put mapping and cluster state APIs.
223+ `write`:: (default) Write operations will be rejected. Read operations will
224+ succeed, based on the last known cluster configuration. This may result in
225+ partial reads of stale data as this node may be isolated from the rest of the
226+ cluster.
227+
228+ The `discovery.zen.no_master_block` setting doesn't apply to nodes-based apis
229+ (for example cluster stats, node info and node stats apis). Requests to these
230+ apis will not be blocked and can run on any available node.