Commit 966309d
Adjust resiliency docs for searchable snapshots (#67630)
Today our guidelines for designing a resilient cluster recommend that every index have at least one replica. This advice does not apply to searchable snapshot indices. This commit adjusts the resiliency docs to account for this. It also slightly adjusts the wording in the searchable snapshots docs to be more consistent about the distinction between a "searchable snapshot" and a "searchable snapshot index".
1 parent c6ed6cc commit 966309d

File tree

2 files changed, +44 -39 lines

docs/reference/high-availability/cluster-design.asciidoc

Lines changed: 34 additions & 31 deletions
@@ -9,16 +9,17 @@ operating normally if some of its nodes are unavailable or disconnected.
 There is a limit to how small a resilient cluster can be. All {es} clusters
 require:
 
-* One <<modules-discovery-quorums,elected master node>> node
-* At least one node for each <<modules-node,role>>.
-* At least one copy of every <<scalability,shard>>.
+- One <<modules-discovery-quorums,elected master node>> node
+- At least one node for each <<modules-node,role>>.
+- At least one copy of every <<scalability,shard>>.
 
 A resilient cluster requires redundancy for every required cluster component.
 This means a resilient cluster must have:
 
-* At least three master-eligible nodes
-* At least two nodes of each role
-* At least two copies of each shard (one primary and one or more replicas)
+- At least three master-eligible nodes
+- At least two nodes of each role
+- At least two copies of each shard (one primary and one or more replicas,
+  unless the index is a <<searchable-snapshots,searchable snapshot index>>)
 
 A resilient cluster needs three master-eligible nodes so that if one of
 them fails then the remaining two still form a majority and can hold a
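As a quick way to see the "at least two copies of each shard" requirement above in practice, the cat shards API lists every shard copy and whether it is a primary or a replica. A minimal sketch, using a hypothetical index name `my-index`:

[source,console]
----
GET _cat/shards/my-index?v=true&h=index,shard,prirep,state,node
----

In the output, `prirep` is `p` for a primary and `r` for a replica, so an index whose shards only ever show `p` has no redundant copies.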
@@ -44,8 +45,8 @@ failures. Designers of larger clusters must also consider cases where multiple
 nodes fail at the same time. The following pages give some recommendations for
 building resilient clusters of various sizes:
 
-* <<high-availability-cluster-small-clusters>>
-* <<high-availability-cluster-design-large-clusters>>
+- <<high-availability-cluster-small-clusters>>
+- <<high-availability-cluster-design-large-clusters>>
 
 [[high-availability-cluster-small-clusters]]
 === Resilience in small clusters
@@ -78,11 +79,12 @@ one-node clusters in production.
 
 If you have two nodes, we recommend they both be data nodes. You should also
 ensure every shard is stored redundantly on both nodes by setting
-<<dynamic-index-settings,`index.number_of_replicas`>> to `1` on every index.
-This is the default number of replicas but may be overridden by an
-<<index-templates,index template>>. <<dynamic-index-settings,Auto-expand
-replicas>> can also achieve the same thing, but it's not necessary to use this
-feature in such a small cluster.
+<<dynamic-index-settings,`index.number_of_replicas`>> to `1` on every index
+that is not a <<searchable-snapshots,searchable snapshot index>>. This is the
+default behaviour but may be overridden by an <<index-templates,index
+template>>. <<dynamic-index-settings,Auto-expand replicas>> can also achieve
+the same thing, but it's not necessary to use this feature in such a small
+cluster.
 
 We recommend you set `node.master: false` on one of your two nodes so that it is
 not <<master-node,master-eligible>>. This means you can be certain which of your
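To make the replica recommendation in the hunk above concrete, here is a minimal sketch of setting `index.number_of_replicas` to `1` on a hypothetical index named `my-index`; searchable snapshot indices would simply be left out of such an update:

[source,console]
----
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
----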
@@ -162,12 +164,13 @@ cluster that is suitable for production deployments.
 [[high-availability-cluster-design-three-nodes]]
 ==== Three-node clusters
 
-If you have three nodes, we recommend they all be <<data-node,data
-nodes>> and every index should have at least one replica. Nodes are data nodes
-by default. You may prefer for some indices to have two replicas so that each
-node has a copy of each shard in those indices. You should also configure each
-node to be <<master-node,master-eligible>> so that any two of them can hold a
-master election without needing to communicate with the third node. Nodes are
+If you have three nodes, we recommend they all be <<data-node,data nodes>> and
+every index that is not a <<searchable-snapshots,searchable snapshot index>>
+should have at least one replica. Nodes are data nodes by default. You may
+prefer for some indices to have two replicas so that each node has a copy of
+each shard in those indices. You should also configure each node to be
+<<master-node,master-eligible>> so that any two of them can hold a master
+election without needing to communicate with the third node. Nodes are
 master-eligible by default. This cluster will be resilient to the loss of any
 single node.
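The "two replicas so that each node has a copy of each shard" option in the hunk above can also be expressed with auto-expand replicas, which grows and shrinks the replica count with the number of data nodes. A hedged sketch, again using a hypothetical `my-index`:

[source,console]
----
PUT /my-index/_settings
{
  "index": {
    "auto_expand_replicas": "0-2"
  }
}
----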

@@ -215,8 +218,8 @@ The cluster will be resilient to the loss of any node as long as:
 
 - The <<cluster-health,cluster health status>> is `green`.
 - There are at least two data nodes.
-- Every index has at least one replica of each shard, in addition to the
-  primary.
+- Every index that is not a <<searchable-snapshots,searchable snapshot index>>
+  has at least one replica of each shard, in addition to the primary.
 - The cluster has at least three master-eligible nodes, as long as at least two
   of these nodes are not voting-only master-eligible nodes.
 - Clients are configured to send their requests to more than one node or are
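The first condition in the summary list above can be checked with the cluster health API; `filter_path` is used here only to keep the illustrative response small:

[source,console]
----
GET _cluster/health?filter_path=status,number_of_data_nodes,unassigned_shards
----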
@@ -326,14 +329,14 @@ zone. If you have more than three zones then you should choose three of the
 zones and put a master-eligible node in each of these three zones. This will
 mean that the cluster can still elect a master even if one of the zones fails.
 
-As always, your indices should have at least one replica in case a node fails.
-You should also use <<allocation-awareness,shard allocation awareness>> to
-limit the number of copies of each shard in each zone. For instance, if you have
-an index with one or two replicas configured then allocation awareness will
-ensure that the replicas of the shard are in a different zone from the primary.
-This means that a copy of every shard will still be available if one zone
-fails. The availability of this shard will not be affected by such a
-failure.
+As always, your indices should have at least one replica in case a node fails,
+unless they are <<searchable-snapshots,searchable snapshot indices>>. You
+should also use <<allocation-awareness,shard allocation awareness>> to limit
+the number of copies of each shard in each zone. For instance, if you have an
+index with one or two replicas configured then allocation awareness will ensure
+that the replicas of the shard are in a different zone from the primary. This
+means that a copy of every shard will still be available if one zone fails. The
+availability of this shard will not be affected by such a failure.
 
 [[high-availability-cluster-design-large-cluster-summary]]
 ==== Summary
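The shard allocation awareness recommendation above assumes each node advertises a zone attribute (for example via `node.attr.zone` in `elasticsearch.yml`) and that the cluster is told to use it. A minimal sketch, assuming the attribute is called `zone`:

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone"
  }
}
----

With this in place, Elasticsearch tries to place the primary and its replicas in different zones, which is what keeps a copy of every shard available if one zone fails.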
@@ -342,8 +345,8 @@ The cluster will be resilient to the loss of any zone as long as:
 
 - The <<cluster-health,cluster health status>> is `green`.
 - There are at least two zones containing data nodes.
-- Every index has at least one replica of each shard, in addition to the
-  primary.
+- Every index that is not a <<searchable-snapshots,searchable snapshot index>>
+  has at least one replica of each shard, in addition to the primary.
 - Shard allocation awareness is configured to avoid concentrating all copies of
   a shard within a single zone.
 - The cluster has at least three master-eligible nodes. At least two of these

docs/reference/searchable-snapshots/index.asciidoc

Lines changed: 10 additions & 8 deletions
@@ -36,20 +36,22 @@ index setting.
 If a node fails and {search-snap} shards need to be restored from the snapshot,
 there is a brief window of time while {es} allocates the shards to other nodes
 where the cluster health will not be `green`. Searches that hit these shards
-will fail or return partial results until they are reallocated.
+will fail or return partial results until the shards are reallocated to healthy
+nodes.
 
 You typically manage {search-snaps} through {ilm-init}. The
 <<ilm-searchable-snapshot, searchable snapshots>> action automatically converts
-an index to a {search-snap} when it reaches the `cold` phase. You can also make
-indices in existing snapshots searchable by manually mounting them as
-{search-snaps} with the <<searchable-snapshots-api-mount-snapshot, mount
-snapshot>> API.
+a regular index into a {search-snap} index when it reaches the `cold` phase.
+You can also make indices in existing snapshots searchable by manually mounting
+them as {search-snap} indices with the
+<<searchable-snapshots-api-mount-snapshot, mount snapshot>> API.
 
 To mount an index from a snapshot that contains multiple indices, we recommend
 creating a <<clone-snapshot-api, clone>> of the snapshot that contains only the
-index you want to search, and mounting the clone. You cannot delete a snapshot
-if it has any mounted indices, so creating a clone enables you to manage the
-lifecycle of the backup snapshot independently of any {search-snaps}.
+index you want to search, and mounting the clone. You should not delete a
+snapshot if it has any mounted indices, so creating a clone enables you to
+manage the lifecycle of the backup snapshot independently of any
+{search-snaps}.
 
 You can control the allocation of the shards of {search-snap} indices using the
 same mechanisms as for regular indices. For example, you could use
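To make the clone-then-mount recommendation in this hunk concrete, here is a hedged sketch using hypothetical names (`my_repository`, `my_snapshot`, `my-index`): the clone snapshot API copies only the index of interest into a new snapshot, and the mount snapshot API then exposes it as a searchable snapshot index:

[source,console]
----
PUT /_snapshot/my_repository/my_snapshot/_clone/my_snapshot_clone
{
  "indices": "my-index"
}

POST /_snapshot/my_repository/my_snapshot_clone/_mount?wait_for_completion=true
{
  "index": "my-index"
}
----

The original `my_snapshot` can then move through its normal backup lifecycle independently, while `my_snapshot_clone` backs the mounted index.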
