-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Service to migrate indices and ILM policies to data tiers #73689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This adds a service that migrates the indices and ILM policies away from custom node attribute allocation routing to data tiers.
|
|
||
| static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) { | ||
| List<String> migratedIndices = new ArrayList<>(); | ||
| String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For indices I'm only looking at the require.data routing setting (as opposed to also looking at include.data and exclude.data) even though for ILM policies any allocate action routing configuration for the data attribute will trigger the migration of the ILM policy.
The reasoning behind this is that routing configurations for indices can come from multiple places (which is part of the reason why we need to migrate at all :) ) as opposed to ILM policies which are meant to move forward an existing routing configuration (eg. the mantra in ILM might be more like "take it from the warm phase onwards and own it").
Even more than that, it'd be extremely difficult to know what the equivalent _tier_preference configuration would be for an index that has exclude.data:cold,warm,custom (and all the other myriad of options that might be out there)
@dakrone do you think it makes sense like this or should we try to do more for indices?
I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should inspect require and include (skipping exclude), I know some users use include primarily while others use require, so it'd be important to use both. exclude I would consider as a one-off.
I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?
I'm not sure that this helps us too much, is there a reason we would want to know this information? For now maybe we can go without and consider adding it if needed in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason we would want to know this information?
I was thinking that with many indices in a deployment, some might not adhere to the rules and structure we expect. So it might be difficult to spot the 3 indices out of 300 that we couldn't migrate - until they can't be allocated.
We currently issue a warn logging so maybe that's enough for now (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should inspect require and include (skipping exclude)
This is now implemented. Note that we don't attempt to make too much sense of the include configuration (some users might choose to use multiple values for this allocation configuration). If the require setting is not configured, we check the include for single value usage - ie. warm, OR cold etc
|
@elasticmachine update branch |
|
Pinging @elastic/es-core-features (Team:Core/Features) |
|
@elasticmachine update branch |
|
@elasticmachine update branch |
|
@elasticmachine update branch |
dakrone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @andreidan! I do think we should support both require and include, and I left a couple of other comments too, but nothing major
.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java
Outdated
Show resolved
Hide resolved
| NamedXContentRegistry xContentRegistry, Client client) { | ||
| List<String> migratedPolicies = new ArrayList<>(); | ||
| IndexLifecycleMetadata currentLifecycleMetadata = currentState.metadata().custom(IndexLifecycleMetadata.TYPE); | ||
| if (currentLifecycleMetadata != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of wrapping this in a huge if (since it's ~50 lines of interior statement), it might be cleaner to just have
if (currentLifecycleMetadata == null) {
return Collections.emptyList();
}And then not have to wrap it.
| Map<String, LifecyclePolicyMetadata> currentPolicies = currentLifecycleMetadata.getPolicyMetadatas(); | ||
| SortedMap<String, LifecyclePolicyMetadata> newPolicies = new TreeMap<>(currentPolicies); | ||
| for (Map.Entry<String, LifecyclePolicyMetadata> policyMetadataEntry : currentPolicies.entrySet()) { | ||
| LifecyclePolicy lifecyclePolicy = policyMetadataEntry.getValue().getPolicy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is pretty huge, can you factor this out into a migrateSingleILMPolicy method that handles only a single ILM policy, where we can pass in the LifecyclePolicy object and get a new one?
|
|
||
| static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) { | ||
| List<String> migratedIndices = new ArrayList<>(); | ||
| String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should inspect require and include (skipping exclude), I know some users use include primarily while others use require, so it'd be important to use both. exclude I would consider as a one-off.
I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?
I'm not sure that this helps us too much, is there a reason we would want to know this information? For now maybe we can go without and consider adding it if needed in the future.
| static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) { | ||
| List<String> migratedIndices = new ArrayList<>(); | ||
| String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName; | ||
| for (ObjectObjectCursor<String, IndexMetadata> index : currentState.metadata().indices()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think same comment here about factoring out into a method that deals with a single index's settings that returns either the same Settings or a new Settings. These methods are a bit larger and splitting them up makes reading them a bit easier
Co-authored-by: Lee Hinman <[email protected]>
|
@elasticmachine update branch |
dakrone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this Andrei, I left some more comments, some very minor, the biggest issue is figuring out how to treat multiple attributes. I think we should favor one (require would be my choice), but still remove, because the allocation settings may be doubled between require and include, and it'd require re-running multiple times (potentially three times) to clear out all the settings.
.../java/org/elasticsearch/xpack/cluster/metadata/MetadataMigrateToDataTiersRoutingService.java
Outdated
Show resolved
Hide resolved
| IndexMetadata indexMetadata = index.value; | ||
| Settings currentSettings = indexMetadata.getSettings(); | ||
| Settings newSettings = migrateIndexSettings(nodeAttrIndexRequireRoutingSetting, indexMetadata); | ||
| if (newSettings.equals(currentSettings)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because of this change (if I read this correctly), if you have both a require and include, only the require will be removed, so you'd end up with a before like:
{
"require": "hot",
"include": "hot"
}
And an after like:
{
"_tier_preference": "data_hot",
"include": "hot"
}
I think maybe migrateIndexSettings should remove all three types (require, include, exclude) whenever it encounters the attribute passed in and sets the new tier preference, or else remove them if in this method if the settings have changed and they have the right value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so I chose to say "we migrate require.data or if that's not present look at trying to migrate include.data" (which I think would cover most cases I've seen in the wild. Also all our hot-warm architecture docs talk about the require setting being used)
should remove all three types (require, include, exclude) whenever it encounters the attribute passed in and sets the new tier preference, or else remove them if in this method if the settings have changed and they have the right value?
I believe this could be a good compromise (that might possibly erase some configurations that attempt to implement sub-tiering using the same attribute name, but that's an edge case configuration I reckon).
I'll change the implementation to still firstly look at require.data and if we migrate this setting remove the include.data and exclude.data altogether.
If require.data is not present (or not migrateable - by virtue of using "the_hot_node" as value) we look at include.data and if migrating of include is successful we'll remove the require.data and exclude.data settings.
So both
{
"require": "hot",
"include": "hot"
}
and
{
"require": "the_hot_node",
"include": "hot"
}
will be migrated to
{
"_tier_preference": "data_hot"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me, thanks for changing this
| assert oldPolicyMetadata != null : | ||
| "we must only update policies, not create new ones, but " + policyMetadataEntry.getKey() + " didn't exist"; | ||
|
|
||
| updateIndicesForPolicy(mb, currentState, xContentRegistry, client, oldPolicyMetadata.getPolicy(), newPolicyMetadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should capture the return value for this so that we can have debug logging for the number of indices that had their cached policies updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return value is currently true/false - indicating if any index was updated (ie. if the Metadata.Builder is changed). I added the logging inside the method to keep the return value as is (not sure we need to expose more at this point)
| return migratedIndices; | ||
| } | ||
|
|
||
| private static Settings migrateIndexSettings(String attributeBasedRoutingSettingName, IndexMetadata indexMetadata) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to add javadocs to these (all the static methods in this class), even if very short, just to help future readers determine what they are doing and what they return
| assertThat("index migration ONLY clears the setting it migrates, in this case the require.data setting", | ||
| migratedIndex.getSettings().get(DATA_ROUTING_INCLUDE_SETTING), is("cold")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want to clear both of these, and pick one to favor (as I mentioned above), what do you think?
| if (Strings.hasText(indexTemplateToDelete) && | ||
| currentState.metadata().getTemplates().containsKey(indexTemplateToDelete)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be nice to have a (debug) log message if a template was passed in, but not actually present in the cluster state and therefore couldn't be deleted
|
@elasticmachine update branch |
Co-authored-by: Lee Hinman <[email protected]>
|
@elasticmachine update branch |
|
@elasticmachine update branch |
…e allocate action
dakrone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for iterating on this, I left one really minor comment but not a big deal either way.
| * | ||
| * Returns the same {@link LifecycleExecutionState} if the transition is not possible or the new execution state otherwise. | ||
| */ | ||
| private static LifecycleExecutionState moveStateToNextActionAndUpdateCachedPhase(IndexMetadata indexMetadata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super minor, but I think this method would be better if moved into IndexLifecycleTransition, as we may want to eventually have a way to "skip" the current step. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, this could be interesting.
So I've placed this service in xpack.core (as it touches several abstractions and feels like a "core" functionality). This wouldn't allow us to reference IndexLifecycleTransition as that's xpack.ilm.
However, given ILM is a big part in data tiers - the biggest bit in this transition service too- and it will be eaasier to test the REST integration of this service (given the whole infrastructure for that is present in xpack.ilm) I now believe we should have the MetadataMigrateToDataTiersRoutingService and the corresponding REST API that's about to come in xpack.ilm.
What do you think?
| IndexMetadata indexMetadata = index.value; | ||
| Settings currentSettings = indexMetadata.getSettings(); | ||
| Settings newSettings = migrateIndexSettings(nodeAttrIndexRequireRoutingSetting, indexMetadata); | ||
| if (newSettings.equals(currentSettings)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me, thanks for changing this
|
@elasticmachine update branch |
) This adds a service that migrates the indices and ILM policies away from custom node attribute allocation routing to data tiers. Optionally, it also deletes one legacy index template. (cherry picked from commit 6285fac) Signed-off-by: Andrei Dan <[email protected]>
…74287) This adds a service that migrates the indices and ILM policies away from custom node attribute allocation routing to data tiers. Optionally, it also deletes one legacy index template. (cherry picked from commit 6285fac) Signed-off-by: Andrei Dan <[email protected]>
This adds a service that migrates the indices and ILM policies away from
custom node attribute allocation routing to data tiers.
The
MetadataMigrateToDataTiersRoutingServiceoperates on theClusterStateand performs a few operations:
Removes the (optionally) provided index template if it exists (note that it will
only look for legacy templates. If a composable template with the provided name
exists it will not be deleted)
Iterates through the existing ILM policies and inspects the
allocateactions thatconfigure any (require, include, exclude) allocation for the given node attribute.
When found, the actions are either removed completely (if they don't define
number_or_replicas) or replaced by anallocateaction that only defines thenumber_of_replicaseg.
becomes
Removing all allocation rules (not just the one targeting
data) will allow ILM to injectthe
migrateaction.for indices that define
index.routing.allocation.require.databut do not define anyindex.routing.allocation.include._tier_preference, the service will removeindex.routing.allocation.require.dataand configure the corresponding
index.routing.allocation.include._tier_preferencesetting(ie. if
require.dataiswarmit will configure_tier_preferencetodata_warm,data_hot).If the index is configured with
index.routing.allocation.require.datato any other value thanhot,warm,cold, orfrozenthe index will not be migrated.for indices that define both
index.routing.allocation.require.dataandindex.routing.allocation.include._tier_preferencethe
index.routing.allocation.require.datasetting will be removed (trusting that the_tier_preferenceconfiguration reflects the needed/correct data tier routing and treating the
require.datarouting asincorrect)
for indices that define multiple node attribute routing settings (eg. both
require.dataandinclude.data)the
require.datasetting will be checked first and attempted to be migrated to the corresponding_tier_preference. If not possible (ie. due to invalid/unknown setting value) theinclude.datawill be attemptedto be migrated. If any of the settings migration is successful all the routing configurations for the
configured node attribute (in our example
data) will be removed from the index, as part of the migration process.Notice that in both the
allocateaction rules and in the index routing allocation setting we talkabout the
dataattribute, but this is configurable (if not specified, we default todata).The service validates that ILM is in the
STOPPEDstate before performing any migration.The REST API that'll expose this service and the documentation will be done in a follow up PR.
Relates to #73154