Skip to content

Conversation

@andreidan
Copy link
Contributor

@andreidan andreidan commented Jun 2, 2021

This adds a service that migrates the indices and ILM policies away from
custom node attribute allocation routing to data tiers.

The MetadataMigrateToDataTiersRoutingService operates on the ClusterState
and performs a few operations:

  1. Removes the (optionally) provided index template if it exists (note that it will
    only look for legacy templates. If a composable template with the provided name
    exists it will not be deleted)

  2. Iterates through the existing ILM policies and inspects the allocate actions that
    configure any (require, include, exclude) allocation for the given node attribute.
    When found, the actions are either removed completely (if they don't define
    number_or_replicas) or replaced by an allocate action that only defines the
    number_of_replicas
    eg.

allocate {
  number_of_replicas: 0,
  require: {data: warm},
  include: {rack: one}
}

becomes

allocate {
  number_of_replicas: 0
}

Removing all allocation rules (not just the one targeting data) will allow ILM to inject
the migrate action.

  1. Iterates through all the existing indices in order to do the following migration
  • for indices that define index.routing.allocation.require.data but do not define any
    index.routing.allocation.include._tier_preference, the service will remove index.routing.allocation.require.data
    and configure the corresponding index.routing.allocation.include._tier_preferencesetting
    (ie. if require.data is warm it will configure _tier_preference to data_warm,data_hot).
    If the index is configured with index.routing.allocation.require.data to any other value than
    hot, warm, cold, or frozen the index will not be migrated.

  • for indices that define both index.routing.allocation.require.data and index.routing.allocation.include._tier_preference
    the index.routing.allocation.require.data setting will be removed (trusting that the _tier_preference
    configuration reflects the needed/correct data tier routing and treating the require.data routing as
    incorrect)

  • for indices that define multiple node attribute routing settings (eg. both require.data and include.data)
    the require.data setting will be checked first and attempted to be migrated to the corresponding
    _tier_preference. If not possible (ie. due to invalid/unknown setting value) the include.data will be attempted
    to be migrated. If any of the settings migration is successful all the routing configurations for the
    configured node attribute (in our example data) will be removed from the index, as part of the migration process.

Notice that in both the allocate action rules and in the index routing allocation setting we talk
about the data attribute, but this is configurable (if not specified, we default to data).

The service validates that ILM is in the STOPPED state before performing any migration.

The REST API that'll expose this service and the documentation will be done in a follow up PR.

Relates to #73154

This adds a service that migrates the indices and ILM policies away from
custom node attribute allocation routing to data tiers.
@andreidan andreidan added WIP :Data Management/Indices APIs APIs to create and manage indices and templates :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Jun 2, 2021

static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) {
List<String> migratedIndices = new ArrayList<>();
String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For indices I'm only looking at the require.data routing setting (as opposed to also looking at include.data and exclude.data) even though for ILM policies any allocate action routing configuration for the data attribute will trigger the migration of the ILM policy.

The reasoning behind this is that routing configurations for indices can come from multiple places (which is part of the reason why we need to migrate at all :) ) as opposed to ILM policies which are meant to move forward an existing routing configuration (eg. the mantra in ILM might be more like "take it from the warm phase onwards and own it").

Even more than that, it'd be extremely difficult to know what the equivalent _tier_preference configuration would be for an index that has exclude.data:cold,warm,custom (and all the other myriad of options that might be out there)

@dakrone do you think it makes sense like this or should we try to do more for indices?

I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should inspect require and include (skipping exclude), I know some users use include primarily while others use require, so it'd be important to use both. exclude I would consider as a one-off.

I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?

I'm not sure that this helps us too much, is there a reason we would want to know this information? For now maybe we can go without and consider adding it if needed in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we would want to know this information?

I was thinking that with many indices in a deployment, some might not adhere to the rules and structure we expect. So it might be difficult to spot the 3 indices out of 300 that we couldn't migrate - until they can't be allocated.

We currently issue a warn logging so maybe that's enough for now (?)

Copy link
Contributor Author

@andreidan andreidan Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should inspect require and include (skipping exclude)

This is now implemented. Note that we don't attempt to make too much sense of the include configuration (some users might choose to use multiple values for this allocation configuration). If the require setting is not configured, we check the include for single value usage - ie. warm, OR cold etc

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan andreidan marked this pull request as ready for review June 3, 2021 13:14
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jun 3, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@andreidan andreidan requested a review from dakrone June 3, 2021 13:14
@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @andreidan! I do think we should support both require and include, and I left a couple of other comments too, but nothing major

NamedXContentRegistry xContentRegistry, Client client) {
List<String> migratedPolicies = new ArrayList<>();
IndexLifecycleMetadata currentLifecycleMetadata = currentState.metadata().custom(IndexLifecycleMetadata.TYPE);
if (currentLifecycleMetadata != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of wrapping this in a huge if (since it's ~50 lines of interior statement), it might be cleaner to just have

if (currentLifecycleMetadata == null) {
    return Collections.emptyList();
}

And then not have to wrap it.

Map<String, LifecyclePolicyMetadata> currentPolicies = currentLifecycleMetadata.getPolicyMetadatas();
SortedMap<String, LifecyclePolicyMetadata> newPolicies = new TreeMap<>(currentPolicies);
for (Map.Entry<String, LifecyclePolicyMetadata> policyMetadataEntry : currentPolicies.entrySet()) {
LifecyclePolicy lifecyclePolicy = policyMetadataEntry.getValue().getPolicy();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is pretty huge, can you factor this out into a migrateSingleILMPolicy method that handles only a single ILM policy, where we can pass in the LifecyclePolicy object and get a new one?


static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) {
List<String> migratedIndices = new ArrayList<>();
String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should inspect require and include (skipping exclude), I know some users use include primarily while others use require, so it'd be important to use both. exclude I would consider as a one-off.

I was pondering maybe also reporting the indices that we did NOT migrate. What do you think?

I'm not sure that this helps us too much, is there a reason we would want to know this information? For now maybe we can go without and consider adding it if needed in the future.

static List<String> migrateIndices(Metadata.Builder mb, ClusterState currentState, String nodeAttrName) {
List<String> migratedIndices = new ArrayList<>();
String nodeAttrIndexRoutingSetting = INDEX_ROUTING_REQUIRE_GROUP_SETTING.getKey() + nodeAttrName;
for (ObjectObjectCursor<String, IndexMetadata> index : currentState.metadata().indices()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think same comment here about factoring out into a method that deals with a single index's settings that returns either the same Settings or a new Settings. These methods are a bit larger and splitting them up makes reading them a bit easier

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this Andrei, I left some more comments, some very minor, the biggest issue is figuring out how to treat multiple attributes. I think we should favor one (require would be my choice), but still remove, because the allocation settings may be doubled between require and include, and it'd require re-running multiple times (potentially three times) to clear out all the settings.

IndexMetadata indexMetadata = index.value;
Settings currentSettings = indexMetadata.getSettings();
Settings newSettings = migrateIndexSettings(nodeAttrIndexRequireRoutingSetting, indexMetadata);
if (newSettings.equals(currentSettings)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think because of this change (if I read this correctly), if you have both a require and include, only the require will be removed, so you'd end up with a before like:

{
  "require": "hot",
  "include": "hot"
}

And an after like:

{
  "_tier_preference": "data_hot",
  "include": "hot"
}

I think maybe migrateIndexSettings should remove all three types (require, include, exclude) whenever it encounters the attribute passed in and sets the new tier preference, or else remove them if in this method if the settings have changed and they have the right value?

Copy link
Contributor Author

@andreidan andreidan Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so I chose to say "we migrate require.data or if that's not present look at trying to migrate include.data" (which I think would cover most cases I've seen in the wild. Also all our hot-warm architecture docs talk about the require setting being used)

should remove all three types (require, include, exclude) whenever it encounters the attribute passed in and sets the new tier preference, or else remove them if in this method if the settings have changed and they have the right value?

I believe this could be a good compromise (that might possibly erase some configurations that attempt to implement sub-tiering using the same attribute name, but that's an edge case configuration I reckon).

I'll change the implementation to still firstly look at require.data and if we migrate this setting remove the include.data and exclude.data altogether.
If require.data is not present (or not migrateable - by virtue of using "the_hot_node" as value) we look at include.data and if migrating of include is successful we'll remove the require.data and exclude.data settings.
So both

{
  "require": "hot",
  "include": "hot"
}

and

{
  "require": "the_hot_node",
  "include": "hot"
}

will be migrated to

{
  "_tier_preference": "data_hot"
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me, thanks for changing this

assert oldPolicyMetadata != null :
"we must only update policies, not create new ones, but " + policyMetadataEntry.getKey() + " didn't exist";

updateIndicesForPolicy(mb, currentState, xContentRegistry, client, oldPolicyMetadata.getPolicy(), newPolicyMetadata);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should capture the return value for this so that we can have debug logging for the number of indices that had their cached policies updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value is currently true/false - indicating if any index was updated (ie. if the Metadata.Builder is changed). I added the logging inside the method to keep the return value as is (not sure we need to expose more at this point)

return migratedIndices;
}

private static Settings migrateIndexSettings(String attributeBasedRoutingSettingName, IndexMetadata indexMetadata) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to add javadocs to these (all the static methods in this class), even if very short, just to help future readers determine what they are doing and what they return

Comment on lines 376 to 377
assertThat("index migration ONLY clears the setting it migrates, in this case the require.data setting",
migratedIndex.getSettings().get(DATA_ROUTING_INCLUDE_SETTING), is("cold"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to clear both of these, and pick one to favor (as I mentioned above), what do you think?

Comment on lines 103 to 104
if (Strings.hasText(indexTemplateToDelete) &&
currentState.metadata().getTemplates().containsKey(indexTemplateToDelete)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to have a (debug) log message if a template was passed in, but not actually present in the cluster state and therefore couldn't be deleted

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan andreidan requested a review from dakrone June 15, 2021 14:36
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for iterating on this, I left one really minor comment but not a big deal either way.

*
* Returns the same {@link LifecycleExecutionState} if the transition is not possible or the new execution state otherwise.
*/
private static LifecycleExecutionState moveStateToNextActionAndUpdateCachedPhase(IndexMetadata indexMetadata,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor, but I think this method would be better if moved into IndexLifecycleTransition, as we may want to eventually have a way to "skip" the current step. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this could be interesting.

So I've placed this service in xpack.core (as it touches several abstractions and feels like a "core" functionality). This wouldn't allow us to reference IndexLifecycleTransition as that's xpack.ilm.

However, given ILM is a big part in data tiers - the biggest bit in this transition service too- and it will be eaasier to test the REST integration of this service (given the whole infrastructure for that is present in xpack.ilm) I now believe we should have the MetadataMigrateToDataTiersRoutingService and the corresponding REST API that's about to come in xpack.ilm.

What do you think?

IndexMetadata indexMetadata = index.value;
Settings currentSettings = indexMetadata.getSettings();
Settings newSettings = migrateIndexSettings(nodeAttrIndexRequireRoutingSetting, indexMetadata);
if (newSettings.equals(currentSettings)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me, thanks for changing this

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan andreidan merged commit 6285fac into elastic:master Jun 17, 2021
andreidan added a commit to andreidan/elasticsearch that referenced this pull request Jun 18, 2021
)

This adds a service that migrates the indices and ILM policies away from
custom node attribute allocation routing to data tiers. Optionally, it also
deletes one legacy index template.

(cherry picked from commit 6285fac)
Signed-off-by: Andrei Dan <[email protected]>
andreidan added a commit that referenced this pull request Jun 18, 2021
…74287)

This adds a service that migrates the indices and ILM policies away from
custom node attribute allocation routing to data tiers. Optionally, it also
deletes one legacy index template.

(cherry picked from commit 6285fac)
Signed-off-by: Andrei Dan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/ILM+SLM Index and Snapshot lifecycle management :Data Management/Indices APIs APIs to create and manage indices and templates >feature Team:Data Management Meta label for data/management team v7.14.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants