Service to migrate indices and ILM policies to data tiers #73689

andreidan · 2021-06-02T16:50:05Z

This adds a service that migrates the indices and ILM policies away from
custom node attribute allocation routing to data tiers.

The MetadataMigrateToDataTiersRoutingService operates on the ClusterState
and performs a few operations:

Removes the (optionally) provided index template if it exists. Note that it only
looks for legacy templates: if a composable template with the provided name
exists, it will not be deleted.
Iterates through the existing ILM policies and inspects the allocate actions that
configure any (require, include, exclude) allocation for the given node attribute.
When found, the actions are either removed completely (if they don't define
number_of_replicas) or replaced by an allocate action that only defines
number_of_replicas,
e.g.

allocate {
  number_of_replicas: 0,
  require: {data: warm},
  include: {rack: one}
}

becomes

allocate {
  number_of_replicas: 0
}

Removing all allocation rules (not just the one targeting data) will allow ILM to inject
the migrate action.
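
The allocate action rewrite described above can be sketched roughly as follows. This is a simplified illustration and not the service's code: the Allocate class and the migrate method are hypothetical stand-ins for the real ILM types, and leaving an action untouched when it has no rule for the attribute is an assumption based on the description.

```java
import java.util.Map;
import java.util.Optional;

final class AllocateActionMigrationSketch {

    /** Hypothetical, simplified model of an ILM allocate action. */
    static final class Allocate {
        final Integer numberOfReplicas; // null when the action does not set it
        final Map<String, String> require;
        final Map<String, String> include;
        final Map<String, String> exclude;

        Allocate(Integer numberOfReplicas, Map<String, String> require,
                 Map<String, String> include, Map<String, String> exclude) {
            this.numberOfReplicas = numberOfReplicas;
            this.require = require;
            this.include = include;
            this.exclude = exclude;
        }
    }

    /**
     * Returns the action unchanged if it has no rule for nodeAttrName (assumption),
     * Optional.empty() if the action should be dropped from the phase,
     * or a new action that only keeps number_of_replicas.
     */
    static Optional<Allocate> migrate(Allocate action, String nodeAttrName) {
        boolean routesOnAttribute = action.require.containsKey(nodeAttrName)
            || action.include.containsKey(nodeAttrName)
            || action.exclude.containsKey(nodeAttrName);
        if (!routesOnAttribute) {
            return Optional.of(action);
        }
        if (action.numberOfReplicas == null) {
            // Nothing worth keeping: remove the allocate action completely.
            return Optional.empty();
        }
        // Drop ALL allocation rules (not only the ones for nodeAttrName).
        return Optional.of(new Allocate(action.numberOfReplicas, Map.of(), Map.of(), Map.of()));
    }
}
```

With the example above (number_of_replicas: 0, require data: warm, include rack: one), migrate(action, "data") would yield an action that only carries number_of_replicas: 0.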

Iterates through all the existing indices in order to do the following migration:

For indices that define index.routing.allocation.require.data but do not define any
index.routing.allocation.include._tier_preference, the service will remove index.routing.allocation.require.data
and configure the corresponding index.routing.allocation.include._tier_preference setting
(i.e. if require.data is warm it will configure _tier_preference to data_warm,data_hot).
If the index is configured with index.routing.allocation.require.data set to any value other than
hot, warm, cold, or frozen, the index will not be migrated.
For indices that define both index.routing.allocation.require.data and index.routing.allocation.include._tier_preference,
the index.routing.allocation.require.data setting will be removed (trusting that the _tier_preference
configuration reflects the needed/correct data tier routing and treating the require.data routing as
incorrect).
For indices that define multiple node attribute routing settings (e.g. both require.data and include.data),
the require.data setting will be checked first and, if possible, migrated to the corresponding
_tier_preference. If that is not possible (i.e. due to an invalid/unknown setting value), migration of the include.data
setting will be attempted. If either migration succeeds, all the routing configurations for the
configured node attribute (in our example, data) will be removed from the index as part of the migration process.
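
The three index cases above can be sketched as a single function over a plain settings map. This is a minimal illustration under stated assumptions, not the service's implementation: the class and method names are hypothetical, settings are modelled as a Map rather than the real Settings type, and the tier preferences for hot, cold, and frozen are extrapolated from the warm example given in the description.

```java
import java.util.HashMap;
import java.util.Map;

final class IndexRoutingMigrationSketch {
    static final String ROUTING_PREFIX = "index.routing.allocation.";
    static final String TIER_PREFERENCE = ROUTING_PREFIX + "include._tier_preference";

    /** Maps an attribute value to a tier preference, or null if the value is not recognised. */
    static String toTierPreference(String attributeValue) {
        if (attributeValue == null) {
            return null;
        }
        switch (attributeValue) {
            case "hot":    return "data_hot";
            case "warm":   return "data_warm,data_hot"; // stated in the description
            case "cold":   return "data_cold,data_warm,data_hot"; // assumption
            case "frozen": return "data_frozen,data_cold,data_warm,data_hot"; // assumption
            default:       return null;
        }
    }

    /** Returns the same map when nothing can be migrated, otherwise a new migrated map. */
    static Map<String, String> migrate(Map<String, String> settings, String nodeAttrName) {
        String requireKey = ROUTING_PREFIX + "require." + nodeAttrName;
        String includeKey = ROUTING_PREFIX + "include." + nodeAttrName;
        String excludeKey = ROUTING_PREFIX + "exclude." + nodeAttrName;

        if (settings.containsKey(TIER_PREFERENCE)) {
            // Trust the existing _tier_preference and only drop the require rule.
            if (!settings.containsKey(requireKey)) {
                return settings;
            }
            Map<String, String> migrated = new HashMap<>(settings);
            migrated.remove(requireKey);
            return migrated;
        }

        // Favour require; fall back to include when require is absent or unknown.
        String tierPreference = toTierPreference(settings.get(requireKey));
        if (tierPreference == null) {
            tierPreference = toTierPreference(settings.get(includeKey));
        }
        if (tierPreference == null) {
            return settings; // e.g. require.data: the_hot_node and no usable include
        }

        Map<String, String> migrated = new HashMap<>(settings);
        migrated.remove(requireKey);
        migrated.remove(includeKey);
        migrated.remove(excludeKey);
        migrated.put(TIER_PREFERENCE, tierPreference);
        return migrated;
    }
}
```

Returning the input map unchanged when no migration applies lets a caller detect migrated indices with a simple equality check on the old and new settings.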

Notice that in both the allocate action rules and in the index routing allocation setting we talk
about the data attribute, but this is configurable (if not specified, we default to data).

The service validates that ILM is in the STOPPED state before performing any migration.
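
A precondition of this kind could look roughly like the sketch below. The enum, the method name, and the choice of IllegalStateException are all assumptions made for illustration; the description only states that ILM must be STOPPED.

```java
final class IlmStoppedCheckSketch {

    /** Hypothetical stand-in for the ILM operation mode stored in the cluster state. */
    enum OperationMode { RUNNING, STOPPING, STOPPED }

    /** Rejects the migration unless ILM has fully stopped (STOPPING is not enough). */
    static void ensureIlmStopped(OperationMode currentMode) {
        if (currentMode != OperationMode.STOPPED) {
            throw new IllegalStateException(
                "ILM must be STOPPED before migrating to data tiers, but it is " + currentMode);
        }
    }
}
```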

The REST API that'll expose this service and the documentation will be done in a follow up PR.

Relates to #73154

andreidan · 2021-06-02T17:22:12Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+
+    static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) {
+        List<String> migratedIndices = new ArrayList<>();
+        String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName;


For indices I'm only looking at the require.data routing setting (as opposed to also looking at include.data and exclude.data) even though for ILM policies any allocate action routing configuration for the data attribute will trigger the migration of the ILM policy.

The reasoning behind this is that routing configurations for indices can come from multiple places (which is part of the reason why we need to migrate at all :) ) as opposed to ILM policies which are meant to move forward an existing routing configuration (eg. the mantra in ILM might be more like "take it from the warm phase onwards and own it").

Even more than that, it'd be extremely difficult to know what the equivalent _tier_preference configuration would be for an index that has exclude.data:cold,warm,custom (and all the other myriad of options that might be out there)

@dakrone do you think it makes sense like this or should we try to do more for indices?

I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?

I think we should inspect require and include (skipping exclude), I know some users use include primarily while others use require, so it'd be important to use both. exclude I would consider as a one-off.

I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?

I'm not sure that this helps us too much, is there a reason we would want to know this information? For now maybe we can go without and consider adding it if needed in the future.

is there a reason we would want to know this information?

I was thinking that with many indices in a deployment, some might not adhere to the rules and structure we expect. So it might be difficult to spot the 3 indices out of 300 that we couldn't migrate - until they can't be allocated.

We currently issue a warn logging so maybe that's enough for now (?)

I think we should inspect require and include (skipping exclude)

This is now implemented. Note that we don't attempt to make too much sense of the include configuration (some users might choose to use multiple values for this allocation configuration). If the require setting is not configured, we check the include for single value usage - ie. warm, OR cold etc

andreidan · 2021-06-03T09:30:02Z

@elasticmachine update branch

elasticmachine · 2021-06-03T13:14:23Z

Pinging @elastic/es-core-features (Team:Core/Features)

andreidan · 2021-06-04T12:04:10Z

@elasticmachine update branch

andreidan · 2021-06-04T13:50:02Z

@elasticmachine update branch

andreidan · 2021-06-07T09:24:10Z

@elasticmachine update branch

dakrone

Thanks for working on this @andreidan! I do think we should support both require and include, and I left a couple of other comments too, but nothing major

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

dakrone · 2021-06-03T22:55:51Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+                                           NamedXContentRegistry xContentRegistry, Client client) {
+        List<String> migratedPolicies = new ArrayList<>();
+        IndexLifecycleMetadata currentLifecycleMetadata = currentState.metadata().custom(IndexLifecycleMetadata.TYPE);
+        if (currentLifecycleMetadata != null) {


I think instead of wrapping this in a huge if (since it's ~50 lines of interior statement), it might be cleaner to just have

if (currentLifecycleMetadata == null) { return Collections.emptyList(); }

And then not have to wrap it.

dakrone · 2021-06-03T22:57:01Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+            Map<String, LifecyclePolicyMetadata> currentPolicies = currentLifecycleMetadata.getPolicyMetadatas();
+            SortedMap<String, LifecyclePolicyMetadata> newPolicies = new TreeMap<>(currentPolicies);
+            for (Map.Entry<String, LifecyclePolicyMetadata> policyMetadataEntry : currentPolicies.entrySet()) {
+                LifecyclePolicy lifecyclePolicy = policyMetadataEntry.getValue().getPolicy();


This method is pretty huge, can you factor this out into a migrateSingleILMPolicy method that handles only a single ILM policy, where we can pass in the LifecyclePolicy object and get a new one?

dakrone · 2021-06-07T20:54:42Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+
+    static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) {
+        List<String> migratedIndices = new ArrayList<>();
+        String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName;


I think we should inspect require and include (skipping exclude), I know some users use include primarily while others use require, so it'd be important to use both. exclude I would consider as a one-off.

I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?

I'm not sure that this helps us too much, is there a reason we would want to know this information? For now maybe we can go without and consider adding it if needed in the future.

dakrone · 2021-06-07T20:59:27Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+    static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) {
+        List<String> migratedIndices = new ArrayList<>();
+        String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName;
+        for (ObjectObjectCursor<String, IndexMetadata> index : currentState.metadata().indices()) {


I think same comment here about factoring out into a method that deals with a single index's settings that returns either the same Settings or a new Settings. These methods are a bit larger and splitting them up makes reading them a bit easier

Co-authored-by: Lee Hinman <[email protected]>

andreidan · 2021-06-08T09:56:48Z

@elasticmachine update branch

dakrone

Thanks for iterating on this Andrei, I left some more comments, some very minor, the biggest issue is figuring out how to treat multiple attributes. I think we should favor one (require would be my choice), but still remove, because the allocation settings may be doubled between require and include, and it'd require re-running multiple times (potentially three times) to clear out all the settings.

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

dakrone · 2021-06-10T20:34:35Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+            IndexMetadata indexMetadata = index.value;
+            Settings currentSettings = indexMetadata.getSettings();
+            Settings newSettings = migrateIndexSettings(nodeAttrIndexRequireRoutingSetting, indexMetadata);
+            if (newSettings.equals(currentSettings)) {


I think because of this change (if I read this correctly), if you have both a require and include, only the require will be removed, so you'd end up with a before like:

{ "require": "hot", "include": "hot" }

And an after like:

{ "_tier_preference": "data_hot", "include": "hot" }

I think maybe migrateIndexSettings should remove all three types (require, include, exclude) whenever it encounters the attribute passed in and sets the new tier preference, or else remove them if in this method if the settings have changed and they have the right value?

Yes, so I chose to say "we migrate require.data or, if that's not present, look at trying to migrate include.data", which I think would cover most cases I've seen in the wild. Also, all our hot-warm architecture docs talk about the require setting being used.

should remove all three types (require, include, exclude) whenever it encounters the attribute passed in and sets the new tier preference, or else remove them if in this method if the settings have changed and they have the right value?

I believe this could be a good compromise (that might possibly erase some configurations that attempt to implement sub-tiering using the same attribute name, but that's an edge case configuration I reckon).

I'll change the implementation to still look at require.data first and, if we migrate this setting, remove include.data and exclude.data altogether.
If require.data is not present (or not migratable, by virtue of using "the_hot_node" as value) we look at include.data and, if migrating include is successful, we'll remove the require.data and exclude.data settings.
So both

{ "require": "hot", "include": "hot" }

and

{ "require": "the_hot_node", "include": "hot" }

will be migrated to

{ "_tier_preference": "data_hot" }

That sounds good to me, thanks for changing this

dakrone · 2021-06-10T20:39:02Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+                assert oldPolicyMetadata != null :
+                    "we must only update policies, not create new ones, but " + policyMetadataEntry.getKey() + " didn't exist";
+
+                updateIndicesForPolicy(mb, currentState, xContentRegistry, client, oldPolicyMetadata.getPolicy(), newPolicyMetadata);


Perhaps we should capture the return value for this so that we can have debug logging for the number of indices that had their cached policies updated?

The return value is currently true/false - indicating if any index was updated (ie. if the Metadata.Builder is changed). I added the logging inside the method to keep the return value as is (not sure we need to expose more at this point)

dakrone · 2021-06-10T20:42:40Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+        return migratedIndices;
+    }
+
+    private static Settings migrateIndexSettings(String attributeBasedRoutingSettingName, IndexMetadata indexMetadata) {


I think it makes sense to add javadocs to these (all the static methods in this class), even if very short, just to help future readers determine what they are doing and what they return

dakrone · 2021-06-10T20:46:59Z

.../org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingServiceTests.java

+        assertThat("index migration ONLY clears the setting it migrates, in this case the require.data setting",
+            migratedIndex.getSettings().get(DATA_ROUTING_INCLUDE_SETTING), is("cold"));


I think we'll want to clear both of these, and pick one to favor (as I mentioned above), what do you think?

dakrone · 2021-06-10T20:48:54Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+        if (Strings.hasText(indexTemplateToDelete) &&
+            currentState.metadata().getTemplates().containsKey(indexTemplateToDelete)) {


I think it'd be nice to have a (debug) log message if a template was passed in, but not actually present in the cluster state and therefore couldn't be deleted

andreidan · 2021-06-14T09:52:25Z

@elasticmachine update branch

Co-authored-by: Lee Hinman <[email protected]>

andreidan · 2021-06-15T10:30:18Z

@elasticmachine update branch

andreidan · 2021-06-15T13:39:22Z

@elasticmachine update branch

dakrone

LGTM, thanks for iterating on this, I left one really minor comment but not a big deal either way.

dakrone · 2021-06-16T21:32:10Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+     *
+     * Returns the same {@link LifecycleExecutionState} if the transition is not possible or the new execution state otherwise.
+     */
+    private static LifecycleExecutionState moveStateToNextActionAndUpdateCachedPhase(IndexMetadata indexMetadata,


Super minor, but I think this method would be better if moved into IndexLifecycleTransition, as we may want to eventually have a way to "skip" the current step. What do you think?

Ah yes, this could be interesting.

So I've placed this service in xpack.core (as it touches several abstractions and feels like a "core" functionality). This wouldn't allow us to reference IndexLifecycleTransition as that's xpack.ilm.

However, given that ILM is a big part of data tiers (and the biggest bit in this transition service too), and that it will be easier to test the REST integration of this service (the whole infrastructure for that is present in xpack.ilm), I now believe we should have the MetadataMigrateToDataTiersRoutingService and the corresponding REST API that's about to come in xpack.ilm.

What do you think?

dakrone · 2021-06-16T21:36:03Z

.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java

+            IndexMetadata indexMetadata = index.value;
+            Settings currentSettings = indexMetadata.getSettings();
+            Settings newSettings = migrateIndexSettings(nodeAttrIndexRequireRoutingSetting, indexMetadata);
+            if (newSettings.equals(currentSettings)) {


That sounds good to me, thanks for changing this

andreidan · 2021-06-17T10:05:09Z

@elasticmachine update branch

) This adds a service that migrates the indices and ILM policies away from custom node attribute allocation routing to data tiers. Optionally, it also deletes one legacy index template. (cherry picked from commit 6285fac) Signed-off-by: Andrei Dan <[email protected]>

…74287) This adds a service that migrates the indices and ILM policies away from custom node attribute allocation routing to data tiers. Optionally, it also deletes one legacy index template. (cherry picked from commit 6285fac) Signed-off-by: Andrei Dan <[email protected]>

Service to migrate indices and ILM policies to data tiers

0b3e3a0

This adds a service that migrates the indices and ILM policies away from custom node attribute allocation routing to data tiers.

andreidan added the WIP, :Data Management/Indices APIs, and :Data Management/ILM+SLM labels Jun 2, 2021

Fix line length

9f9a64f

andreidan commented Jun 2, 2021

Javadoc mention we update the cached phase definition

fc04682

elasticmachine and others added 3 commits June 3, 2021 04:30

Merge branch 'master' into migrate-to-data-tiers

3faa8b4

Add validation that ILM is STOPPED

e41c599

Test migration doesn't delete composable templates

405f1b0

andreidan added v7.14.0 v8.0.0 and removed WIP labels Jun 3, 2021

andreidan marked this pull request as ready for review June 3, 2021 13:14

elasticmachine added the Team:Data Management label Jun 3, 2021

andreidan requested a review from dakrone June 3, 2021 13:14

Merge branch 'master' into migrate-to-data-tiers

ac854b5

Merge branch 'master' into migrate-to-data-tiers

82baff0

Merge branch 'master' into migrate-to-data-tiers

58bdba7

dakrone requested changes Jun 7, 2021

andreidan and others added 2 commits June 8, 2021 10:53

Use Strings.hasText

33bd70a

Co-authored-by: Lee Hinman <[email protected]>

Flip ILM metadata check

fba9f61

elasticmachine and others added 2 commits June 8, 2021 04:56

Merge branch 'master' into migrate-to-data-tiers

690808e

Inspect the include.data node attribute routing too

83ce6ac

dakrone requested changes Jun 10, 2021

elasticmachine and others added 2 commits June 14, 2021 04:52

Merge branch 'master' into migrate-to-data-tiers

c8cfac2

Oxford comma

f05ef74

Co-authored-by: Lee Hinman <[email protected]>

elasticmachine and others added 4 commits June 15, 2021 05:30

Merge branch 'master' into migrate-to-data-tiers

0203343

Debug logging when the template doesn't exist

9cc1198

Debug log how many indices had their ILM phase definition refreshed

00659d1

Remove all attribute routing settings when indices are migrated

ddd6aa7

Merge branch 'master' into migrate-to-data-tiers

effa1db

andreidan requested a review from dakrone June 15, 2021 14:36

Fix phase definition refresh for migrated policies where we remove the allocate action

3d3473e

dakrone approved these changes Jun 16, 2021

elasticmachine and others added 4 commits June 17, 2021 05:05

Merge branch 'master' into migrate-to-data-tiers

4c84c88

Inline var

01ad3d9

Move MetadataMigrateToDataTiersRoutingService to xpack.ilm

354b645

Extract moveStateToNextActionAndUpdateCachedPhase into IndexLifecycleTransition

a5cd72e

andreidan merged commit 6285fac into elastic:master Jun 17, 2021

andreidan added the backport pending label Jun 17, 2021

andreidan mentioned this pull request Jun 17, 2021

Add migrate to data tiers API #74264

Merged

andreidan mentioned this pull request Jun 18, 2021

[7.x] Service to migrate indices and ILM policies to data tiers (#73689) #74287

Merged

andreidan removed the backport pending label Jun 18, 2021

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

probakowski added the >feature label Jul 30, 2021

romain-chanu mentioned this pull request Dec 11, 2021

Bug in MetadataMigrateToDataTiersRoutingService - _tier_preference could be incorrect #81633

Closed
