Commit 22f8433

andreidan, dakrone, and debadair authored
DOCS: general overview of data tiers and roles (#63086) (#63421)
This adds general overview documentation for data tiers, the data tiers specific node roles, and their application in ILM. Co-authored-by: Lee Hinman <[email protected]> Co-authored-by: debadair <[email protected]> (cherry picked from commit d588cab) Signed-off-by: Andrei Dan <[email protected]> Co-authored-by: Lee Hinman <[email protected]> Co-authored-by: debadair <[email protected]>
1 parent b344d67 commit 22f8433

9 files changed

+326
-6
lines changed

docs/reference/datatiers.asciidoc

Lines changed: 99 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,99 @@
1+
[role="xpack"]
2+
[[data-tiers]]
3+
=== Data tiers
4+
5+
Common data lifecycle management patterns revolve around transitioning indices
6+
through multiple collections of nodes with different hardware characteristics in order
7+
to fulfil evolving CRUD, search, and aggregation needs as indices age. The concept
8+
of a tiered hardware architecture is not new in {es}.
9+
<<index-lifecycle-management, Index Lifecycle Management>> is instrumental in
10+
implementing tiered architectures by automating the management of indices according to
11+
performance, resiliency and data retention requirements.
12+
<<overview-index-lifecycle-management, Hot/warm/cold>> architectures are common
13+
for timeseries data such as logging and metrics.
14+
15+
A data tier is a collection of nodes with the same role. Data tiers are an integrated
16+
solution offering better support for optimising cost and improving performance.
17+
Formalized data tiers in {es} allow configuration of the lifecycle and location of data
18+
in a hot/warm/cold topology without requiring the use of custom node attributes.
19+
Each tier formalises specific characteristics and data behaviours.
20+
21+
The node roles that can currently define data tiers are:
22+
23+
* <<data-content-node, data_content>>
24+
* <<data-hot-node, data_hot>>
25+
* <<data-warm-node, data_warm>>
26+
* <<data-cold-node, data_cold>>
27+
28+
The more generic <<data-node, data role>> is not a data tier role, but
29+
it is the default node role if no roles are configured. If a node has the
30+
<<data-node, data>> role we treat the node as if it has all of the tier
31+
roles assigned.
32+
33+
[[content-tier]]
34+
==== Content tier
35+
36+
The content tier is made of one or more nodes that have the <<data-content-node, data_content>>
37+
role. A content tier is designed to store and search user-created content. Non-timeseries data
38+
doesn't necessarily follow the hot-warm-cold path. The hardware profiles are quite different to
39+
the <<hot-tier, hot tier>>. User-created content prioritises high CPU to support complex
40+
queries and aggregations in a timely manner, as opposed to the <<hot-tier, hot tier>> which
41+
prioritises high IO.
42+
The content data has very long data retention characteristics and from a resiliency perspective
43+
the indices in this tier should be configured to use one or more replicas.
44+
45+
NOTE: New indices that are not part of a <<data-streams, data stream>> are automatically allocated to the
46+
<<content-tier>>.
47+
48+
[[hot-tier]]
49+
==== Hot tier
50+
51+
The hot tier is made of one or more nodes that have the <<data-hot-node, data_hot>> role.
52+
It is the {es} entry point for timeseries data. This tier needs to be fast both for reads
53+
and writes, requiring more hardware resources such as SSD drives. The hot tier usually
54+
hosts data from recent days. From a resiliency perspective the indices in this
55+
tier should be configured to use one or more replicas.
56+
57+
NOTE: New indices that are part of a <<data-streams, data stream>> are automatically allocated to the
58+
<<hot-tier>>.
59+
60+
[[warm-tier]]
61+
==== Warm tier
62+
63+
The warm tier is made of one or more nodes that have the <<data-warm-node, data_warm>> role.
64+
This tier is where data goes once it is not queried as frequently as in the <<hot-tier, hot tier>>.
65+
It is a medium-fast tier that still allows data updates. The warm tier usually
66+
hosts data from recent weeks. From a resiliency perspective the indices in this
67+
tier should be configured to use one or more replicas.
68+
69+
[[cold-tier]]
70+
==== Cold tier
71+
72+
The cold tier is made of one or more nodes that have the <<data-cold-node, data_cold>> role.
73+
Once the data in the <<warm-tier, warm tier>> is not updated anymore it can transition to the
74+
cold tier. The cold tier is still a responsive query tier but as the data transitions into this
75+
tier it can be compressed, shrunk, or configured to have zero replicas and be backed by
76+
a <<ilm-searchable-snapshot, snapshot>>. The cold tier usually hosts data from recent
77+
months or years.
78+
[discrete]
79+
[[data-tier-allocation]]
80+
=== Data tier index allocation
81+
82+
When an index is created {es} will automatically allocate the index to the <<content-tier, Content tier>>
83+
if the index is not part of a <<data-streams, data stream>> or to the <<hot-tier, Hot tier>> if the index
84+
is part of a <<data-streams, data stream>>.
85+
{es} will configure the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
86+
to `data_content` or `data_hot` respectively.
87+
88+
These heuristics can be overridden by specifying any <<shard-allocation-filtering, shard allocation filtering>>
89+
settings in the create index request or index template that matches the new index.
90+
Specifying any configuration, including `null`, for `index.routing.allocation.include._tier_preference` will
91+
also opt out of the automatic new index allocation to tiers.
92+
[discrete]
93+
[[data-tier-migration]]
94+
=== Data tier index migration
95+
96+
<<index-lifecycle-management, Index Lifecycle Management>> automates the transition of managed
97+
indices through the available data tiers using the `migrate` action which is injected
98+
in every phase, unless it's manually specified in the phase or an
99+
<<ilm-allocate-action, allocate action>> modifying the allocation rules is manually configured.
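
The initial allocation rule described under "Data tier index allocation" can be sketched in a few lines of Python. This is only an illustration of the documented heuristic, not {es} code: the function name `initial_tier_preference` and the plain `dict` of index settings are assumptions made for the example.

```python
# Hypothetical model of the documented heuristic; not an Elasticsearch API.
FILTER_PREFIXES = (
    "index.routing.allocation.include.",
    "index.routing.allocation.require.",
    "index.routing.allocation.exclude.",
)


def initial_tier_preference(is_data_stream_index, settings):
    """Return the tier preference assigned at index creation, or None.

    `settings` stands in for the settings supplied by the create index
    request or a matching index template.
    """
    # Any shard allocation filtering setting opts out of the automatic
    # allocation. This includes `_tier_preference` itself, even when it
    # is explicitly set to null (None here).
    if any(key.startswith(FILTER_PREFIXES) for key in settings):
        return None
    # Backing indices of data streams start in the hot tier;
    # all other indices start in the content tier.
    return "data_hot" if is_data_stream_index else "data_content"
```

For instance, under these assumptions `initial_tier_preference(False, {})` yields `data_content`, while passing `{"index.routing.allocation.include._tier_preference": None}` yields `None` because the explicit setting opts out.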
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
1+
[role="xpack"]
2+
[[ilm-migrate]]
3+
=== Migrate
4+
5+
Phases allowed: warm, cold.
6+
7+
Moves the index to the <<data-tiers, data tier>> that corresponds
8+
to the current phase by updating the <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
9+
index setting.
10+
{ilm-init} automatically injects the migrate action in the warm and cold
11+
phases if no allocation options are specified with the <<ilm-allocate, allocate>> action.
12+
If you specify an allocate action that only modifies the number of index
13+
replicas, {ilm-init} reduces the number of replicas before migrating the index.
14+
To prevent automatic migration without specifying allocation options,
15+
you can explicitly include the migrate action and set the enabled option to `false`.
16+
17+
In the warm phase, the `migrate` action sets <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
18+
to `data_warm,data_hot`. This moves the index to nodes in the
19+
<<warm-tier, warm tier>>. If there are no nodes in the warm tier, it falls back to the
20+
<<hot-tier, hot tier>>.
21+
22+
In the cold phase, the `migrate` action sets
23+
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
24+
to `data_cold,data_warm,data_hot`. This moves the index to nodes in the
25+
<<cold-tier, cold tier>>. If there are no nodes in the cold tier, it falls back to the
26+
<<warm-tier, warm>> tier, or the <<hot-tier, hot>> tier if there are no warm nodes available.
27+
28+
The migrate action is not allowed in the hot phase.
29+
The initial index allocation is performed <<data-tier-allocation, automatically>>,
30+
and can be configured manually or via <<indices-templates, index templates>>.
31+
32+
[[ilm-migrate-options]]
33+
==== Options
34+
35+
`enabled`::
36+
(Optional, boolean)
37+
Controls whether {ilm-init} automatically migrates the index during this phase.
38+
Defaults to `true`.
39+
40+
[[ilm-enabled-migrate-ex]]
41+
==== Example
42+
43+
In the following policy, the allocate action is specified to reduce the number of replicas before {ilm-init} migrates the index to warm nodes.
44+
45+
NOTE: Explicitly specifying the migrate action is not required--{ilm-init} automatically performs the migrate action unless you specify allocation options or disable migration.
46+
47+
[source,console]
48+
--------------------------------------------------
49+
PUT _ilm/policy/my_policy
50+
{
51+
"policy": {
52+
"phases": {
53+
"warm": {
54+
"actions": {
55+
"migrate" : {
56+
},
57+
"allocate": {
58+
"number_of_replicas": 1
59+
}
60+
}
61+
}
62+
}
63+
}
64+
}
65+
--------------------------------------------------
66+
67+
[[ilm-disable-migrate-ex]]
68+
==== Disable automatic migration
69+
70+
The migrate action in the following policy is disabled and
71+
the allocate action assigns the index to nodes that have a
72+
`rack_id` of _one_ or _two_.
73+
NOTE: Explicitly disabling the migrate action is not required--{ilm-init} does not inject the migrate action if you specify allocation options.
74+
[source,console]
75+
--------------------------------------------------
76+
PUT _ilm/policy/my_policy
77+
{
78+
"policy": {
79+
"phases": {
80+
"warm": {
81+
"actions": {
82+
"migrate" : {
83+
"enabled": false
84+
},
85+
"allocate": {
86+
"include" : {
87+
"rack_id": "one,two"
88+
}
89+
}
90+
}
91+
}
92+
}
93+
}
94+
}
95+
--------------------------------------------------
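
As a rough illustration of the migrate behaviour documented in this file, the Python sketch below models two things: the `_tier_preference` value set in each phase, and whether migration happens for a given set of phase actions. The helper names `migrate_tier_preference` and `should_migrate` are hypothetical. Treating an explicitly listed migrate action as taking precedence over allocation options is an assumption that the text above does not spell out.

```python
# Hypothetical model of the documented ILM migrate behaviour.
PHASE_ORDER = ["hot", "warm", "cold"]


def migrate_tier_preference(phase):
    """Build the tier preference the migrate action sets for a phase."""
    if phase not in ("warm", "cold"):
        raise ValueError("the migrate action is only allowed in warm and cold")
    idx = PHASE_ORDER.index(phase)
    # Prefer the phase's own tier, then fall back to each warmer tier.
    return ",".join(f"data_{p}" for p in reversed(PHASE_ORDER[: idx + 1]))


def should_migrate(actions):
    """Decide whether the index is migrated, given a phase's actions."""
    migrate = actions.get("migrate")
    if migrate is not None:
        # Explicit migrate action: honour its `enabled` option (default true).
        return migrate.get("enabled", True)
    allocate = actions.get("allocate", {})
    # Changing only number_of_replicas does not count as an allocation option.
    has_allocation_options = any(
        key in allocate for key in ("include", "exclude", "require")
    )
    return not has_allocation_options
```

With these definitions, `migrate_tier_preference("cold")` gives `data_cold,data_warm,data_hot`, and the actions from the "Disable automatic migration" example make `should_migrate` return `False`.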

docs/reference/ilm/ilm-actions.asciidoc

Lines changed: 5 additions & 0 deletions
@@ -18,6 +18,10 @@ Makes the index read-only.
1818
[[ilm-freeze-action]]<<ilm-freeze,Freeze>>::
1919
Freeze the index to minimize its memory footprint.
2020

21+
[[ilm-migrate-action]]<<ilm-migrate,Migrate>>::
22+
Move the index shards to the <<data-tiers, data tier>> that corresponds
23+
to the current {ilm-init} phase.
24+
2125
[[ilm-readonly-action]]<<ilm-readonly,Read only>>::
2226
Block write operations to the index.
2327

@@ -54,6 +58,7 @@ include::actions/ilm-allocate.asciidoc[]
5458
include::actions/ilm-delete.asciidoc[]
5559
include::actions/ilm-forcemerge.asciidoc[]
5660
include::actions/ilm-freeze.asciidoc[]
61+
include::actions/ilm-migrate.asciidoc[]
5762
include::actions/ilm-readonly.asciidoc[]
5863
include::actions/ilm-rollover.asciidoc[]
5964
ifdef::permanently-unreleased-branch[]

docs/reference/index-modules/allocation.asciidoc

Lines changed: 2 additions & 2 deletions
@@ -7,6 +7,7 @@ nodes:
77
* <<shard-allocation-filtering,Shard allocation filtering>>: Controlling which shards are allocated to which nodes.
88
* <<delayed-allocation,Delayed allocation>>: Delaying allocation of unassigned shards caused by a node leaving.
99
* <<allocation-total-shards,Total shards per node>>: A hard limit on the number of shards from the same index per node.
10+
* <<data-tier-shard-filtering, Data tier allocation>>: Controls the allocation of indices to <<data-tiers, data tiers>>.
1011

1112
include::allocation/filtering.asciidoc[]
1213

@@ -16,5 +17,4 @@ include::allocation/prioritization.asciidoc[]
1617

1718
include::allocation/total_shards.asciidoc[]
1819

19-
20-
20+
include::allocation/data_tier_allocation.asciidoc[]
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
[role="xpack"]
2+
[[data-tier-shard-filtering]]
3+
=== Index-level data tier allocation filtering
4+
5+
You can use index-level allocation settings to control which <<data-tiers, data tier>>
6+
the index is allocated to. The data tier allocator is a
7+
<<shard-allocation-filtering, shard allocation filter>> that uses two built-in
8+
node attributes: `_tier` and `_tier_preference`.
9+
10+
These tier attributes are set using the data node roles:
11+
12+
* <<data-content-node, data_content>>
13+
* <<data-hot-node, data_hot>>
14+
* <<data-warm-node, data_warm>>
15+
* <<data-cold-node, data_cold>>
16+
17+
NOTE: The <<data-node, data>> role is not a valid data tier and cannot be used
18+
for data tier filtering.
19+
20+
[discrete]
21+
[[data-tier-allocation-filters]]
22+
==== Data tier allocation settings
23+
24+
25+
`index.routing.allocation.include._tier`::
26+
27+
Assign the index to a node whose `node.roles` configuration has at
28+
least one of the comma-separated values.
29+
30+
`index.routing.allocation.require._tier`::
31+
32+
Assign the index to a node whose `node.roles` configuration has _all_
33+
of the comma-separated values.
34+
35+
`index.routing.allocation.exclude._tier`::
36+
37+
Assign the index to a node whose `node.roles` configuration has _none_ of the
38+
comma-separated values.
39+
40+
[[tier-preference-allocation-filter]]
41+
`index.routing.allocation.include._tier_preference`::
42+
43+
Assign the index to the first tier in the list that has an available node.
44+
This prevents indices from remaining unallocated if no nodes are available
45+
in the preferred tier.
46+
47+
For example, if you set `index.routing.allocation.include._tier_preference`
48+
to `data_warm,data_hot`, the index is allocated to the warm tier if there
49+
are nodes with the `data_warm` role. If there are no nodes in the warm tier,
50+
but there are nodes with the `data_hot` role, the index is allocated to
51+
the hot tier.
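
The fallback behaviour of `_tier_preference` can be modelled as a short Python sketch. It is a simplified illustration under stated assumptions rather than the actual allocator: the function names and the `nodes` mapping (node name to list of roles) are made up for the example. Treating the generic `data` role as covering every tier follows the statement in the data tiers overview.

```python
# Hypothetical model of tier preference resolution; not Elasticsearch code.
TIER_ROLES = ("data_content", "data_hot", "data_warm", "data_cold")


def node_tiers(roles):
    """Return the tier roles a node effectively has."""
    roles = set(roles)
    # A node with the generic `data` role is treated as having all tier roles.
    if "data" in roles:
        return set(TIER_ROLES)
    return roles & set(TIER_ROLES)


def resolve_tier_preference(preference, nodes):
    """Return the first tier in the comma-separated list that has a node.

    Returns None when no listed tier has any node available.
    """
    for tier in (t.strip() for t in preference.split(",")):
        if any(tier in node_tiers(roles) for roles in nodes.values()):
            return tier
    return None
```

In a cluster modelled as `{"n1": ["data_hot"], "n2": ["master"]}`, the preference `data_warm,data_hot` resolves to `data_hot` because no node holds the `data_warm` role.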

docs/reference/index-modules/allocation/filtering.asciidoc

Lines changed: 8 additions & 2 deletions
@@ -7,8 +7,8 @@ a particular index. These per-index filters are applied in conjunction with
77
<<shard-allocation-awareness, allocation awareness>>.
88

99
Shard allocation filters can be based on custom node attributes or the built-in
10-
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host` and `_id` attributes.
11-
<<index-lifecycle-management, Index lifecycle management>> uses filters based
10+
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id`, `_tier` and `_tier_preference`
11+
attributes. <<index-lifecycle-management, Index lifecycle management>> uses filters based
1212
on custom node attributes to determine how to reallocate shards when moving
1313
between phases.
1414

@@ -102,6 +102,12 @@ The index allocation settings support the following built-in attributes:
102102
`_ip`:: Match either `_host_ip` or `_publish_ip`
103103
`_host`:: Match nodes by hostname
104104
`_id`:: Match nodes by node id
105+
`_tier`:: Match nodes by the node's <<data-tiers, data tier>> role.
106+
For more details see <<data-tier-shard-filtering, data tier allocation filtering>>.
107+
108+
NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
109+
a subset of roles are <<data-tiers, data tier>> roles, and the generic
110+
<<data-node, data role>> will match any tier filtering.
105111

106112
You can use wildcards when specifying attribute values, for example:
107113

docs/reference/index.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ include::indices/index-templates.asciidoc[]
3030

3131
include::data-streams/data-streams.asciidoc[]
3232

33+
include::datatiers.asciidoc[]
34+
3335
include::ingest.asciidoc[]
3436

3537
include::search/search-your-data/search-your-data.asciidoc[]

docs/reference/modules/cluster/allocation_filtering.asciidoc

Lines changed: 7 additions & 1 deletion
@@ -7,7 +7,7 @@ conjunction with <<shard-allocation-filtering, per-index allocation filtering>>
77
and <<shard-allocation-awareness, allocation awareness>>.
88

99
Shard allocation filters can be based on custom node attributes or the built-in
10-
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host` and `_id` attributes.
10+
`_name`, `_host_ip`, `_publish_ip`, `_ip`, `_host`, `_id` and `_tier` attributes.
1111

1212
The `cluster.routing.allocation` settings are <<dynamic-cluster-setting,dynamic>>, enabling live indices to
1313
be moved from one set of nodes to another. Shards are only relocated if it is
@@ -55,7 +55,13 @@ The cluster allocation settings support the following built-in attributes:
5555
`_ip`:: Match either `_host_ip` or `_publish_ip`
5656
`_host`:: Match nodes by hostname
5757
`_id`:: Match nodes by node id
58+
`_tier`:: Match nodes by the node's <<data-tiers, data tier>> role
5859

60+
NOTE: `_tier` filtering is based on <<modules-node, node>> roles. Only
61+
a subset of roles are <<data-tiers, data tier>> roles, and the generic
62+
<<data-node, data role>> will match any tier filtering.
5965

6066

6167
You can use wildcards when specifying attribute values, for example:
