@@ -9,16 +9,17 @@ operating normally if some of its nodes are unavailable or disconnected.
 There is a limit to how small a resilient cluster can be. All {es} clusters
 require:
 
-* One <<modules-discovery-quorums,elected master node>> node
-* At least one node for each <<modules-node,role>>.
-* At least one copy of every <<scalability,shard>>.
+- One <<modules-discovery-quorums,elected master node>>.
+- At least one node for each <<modules-node,role>>.
+- At least one copy of every <<scalability,shard>>.
 
 A resilient cluster requires redundancy for every required cluster component.
 This means a resilient cluster must have:
 
-* At least three master-eligible nodes
-* At least two nodes of each role
-* At least two copies of each shard (one primary and one or more replicas)
+- At least three master-eligible nodes
+- At least two nodes of each role
+- At least two copies of each shard (one primary and one or more replicas,
+  unless the index is a <<searchable-snapshots,searchable snapshot index>>)
 
 A resilient cluster needs three master-eligible nodes so that if one of
 them fails then the remaining two still form a majority and can hold a
@@ -44,8 +45,8 @@ failures. Designers of larger clusters must also consider cases where multiple
 nodes fail at the same time. The following pages give some recommendations for
 building resilient clusters of various sizes:
 
-* <<high-availability-cluster-small-clusters>>
-* <<high-availability-cluster-design-large-clusters>>
+- <<high-availability-cluster-small-clusters>>
+- <<high-availability-cluster-design-large-clusters>>
 
 [[high-availability-cluster-small-clusters]]
 === Resilience in small clusters
@@ -78,11 +79,12 @@ one-node clusters in production.
 
 If you have two nodes, we recommend they both be data nodes. You should also
 ensure every shard is stored redundantly on both nodes by setting
-<<dynamic-index-settings,`index.number_of_replicas`>> to `1` on every index.
-This is the default number of replicas but may be overridden by an
-<<index-templates,index template>>. <<dynamic-index-settings,Auto-expand
-replicas>> can also achieve the same thing, but it's not necessary to use this
-feature in such a small cluster.
+<<dynamic-index-settings,`index.number_of_replicas`>> to `1` on every index
+that is not a <<searchable-snapshots,searchable snapshot index>>. This is the
+default behaviour but may be overridden by an <<index-templates,index
+template>>. <<dynamic-index-settings,Auto-expand replicas>> can also achieve
+the same thing, but it's not necessary to use this feature in such a small
+cluster.
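+
+For example, you can apply this setting to a hypothetical index named
+`my-index` using the update index settings API; the index name here is a
+placeholder:
+
+[source,console]
+----
+# `my-index` is a placeholder; repeat for each of your indices
+PUT /my-index/_settings
+{
+  "index.number_of_replicas": 1
+}
+----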
 
 We recommend you set `node.master: false` on one of your two nodes so that it is
 not <<master-node,master-eligible>>. This means you can be certain which of your
@@ -162,12 +164,13 @@ cluster that is suitable for production deployments.
 [[high-availability-cluster-design-three-nodes]]
 ==== Three-node clusters
 
-If you have three nodes, we recommend they all be <<data-node,data
-nodes>> and every index should have at least one replica. Nodes are data nodes
-by default. You may prefer for some indices to have two replicas so that each
-node has a copy of each shard in those indices. You should also configure each
-node to be <<master-node,master-eligible>> so that any two of them can hold a
-master election without needing to communicate with the third node. Nodes are
+If you have three nodes, we recommend they all be <<data-node,data nodes>> and
+every index that is not a <<searchable-snapshots,searchable snapshot index>>
+should have at least one replica. Nodes are data nodes by default. You may
+prefer for some indices to have two replicas so that each node has a copy of
+each shard in those indices. You should also configure each node to be
+<<master-node,master-eligible>> so that any two of them can hold a master
+election without needing to communicate with the third node. Nodes are
 master-eligible by default. This cluster will be resilient to the loss of any
 single node.
 
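+For instance, a hypothetical index that should have a copy on every one of the
+three nodes could be created with two replicas as follows; the index name is a
+placeholder:
+
+[source,console]
+----
+# `important-index` is a placeholder index name
+PUT /important-index
+{
+  "settings": {
+    "number_of_replicas": 2
+  }
+}
+----
+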
@@ -215,8 +218,8 @@ The cluster will be resilient to the loss of any node as long as:
 
 - The <<cluster-health,cluster health status>> is `green`.
 - There are at least two data nodes.
-- Every index has at least one replica of each shard, in addition to the
-  primary.
+- Every index that is not a <<searchable-snapshots,searchable snapshot index>>
+  has at least one replica of each shard, in addition to the primary.
 - The cluster has at least three master-eligible nodes, as long as at least two
   of these nodes are not voting-only master-eligible nodes.
 - Clients are configured to send their requests to more than one node or are
@@ -326,14 +329,14 @@ zone. If you have more than three zones then you should choose three of the
 zones and put a master-eligible node in each of these three zones. This will
 mean that the cluster can still elect a master even if one of the zones fails.
 
-As always, your indices should have at least one replica in case a node fails.
-You should also use <<allocation-awareness,shard allocation awareness>> to
-limit the number of copies of each shard in each zone. For instance, if you have
-an index with one or two replicas configured then allocation awareness will
-ensure that the replicas of the shard are in a different zone from the primary.
-This means that a copy of every shard will still be available if one zone
-fails. The availability of this shard will not be affected by such a
-failure.
+As always, your indices should have at least one replica in case a node fails,
+unless they are <<searchable-snapshots,searchable snapshot indices>>. You
+should also use <<allocation-awareness,shard allocation awareness>> to limit
+the number of copies of each shard in each zone. For instance, if you have an
+index with one or two replicas configured then allocation awareness will ensure
+that the replicas of the shard are in a different zone from the primary. This
+means that a copy of every shard will still be available if one zone fails. The
+availability of this shard will not be affected by such a failure.
 
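+For example, on each node in the first zone you might add the following to
+`elasticsearch.yml`; the attribute name `zone` and the value `zone-a` are
+placeholders:
+
+[source,yaml]
+----
+# Placeholder zone attribute; use your own attribute name and values
+node.attr.zone: zone-a
+cluster.routing.allocation.awareness.attributes: zone
+----
+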
 [[high-availability-cluster-design-large-cluster-summary]]
 ==== Summary
@@ -342,8 +345,8 @@ The cluster will be resilient to the loss of any zone as long as:
 
 - The <<cluster-health,cluster health status>> is `green`.
 - There are at least two zones containing data nodes.
-- Every index has at least one replica of each shard, in addition to the
-  primary.
+- Every index that is not a <<searchable-snapshots,searchable snapshot index>>
+  has at least one replica of each shard, in addition to the primary.
 - Shard allocation awareness is configured to avoid concentrating all copies of
   a shard within a single zone.
 - The cluster has at least three master-eligible nodes. At least two of these