@dakrone
Member

@dakrone dakrone commented Jun 2, 2021

This commit makes ILM aware of different parts of the node shutdown lifecycle. It consists of two
main parts: reacting to the shutdown state during execution, and signaling the status of shutdown from ILM.

Reacting to shutdown state
ILM now considers nodes that are going to be shut down when deciding which node to assign for the
shrink action. It uses the `NodeShutdownAllocationDecider` within the `SetSingleNodeAllocateStep` to
avoid assigning shards to a node that will be removed. For an index that is already past this step and
waiting for allocation, this commit adds an `isCompletable` method to the
`ClusterStateWaitUntilThresholdStep` so that an allocation that cannot happen can be rewound and
retried on another (non-shutdown) node.
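A minimal sketch of the rewind decision described above, assuming a simplified step interface. `CompletableWaitStep`, `ThresholdWrapper`, and `NextAction` are hypothetical names, not the real ILM classes: the wrapper proceeds when the inner condition is met, rewinds when the inner step reports it can never complete, and otherwise keeps waiting.

```java
// Hypothetical, simplified model of a threshold-wrapping wait step; not the real ILM API.
enum NextAction { KEEP_WAITING, PROCEED, REWIND_TO_NODE_SELECTION }

interface CompletableWaitStep {
    boolean isConditionMet();

    /** Defaults to true; steps that can detect a dead end override this. */
    default boolean isCompletable() {
        return true;
    }
}

final class ThresholdWrapper {
    static NextAction evaluate(CompletableWaitStep inner) {
        if (inner.isConditionMet()) {
            return NextAction.PROCEED;
        }
        if (inner.isCompletable() == false) {
            // e.g. the node chosen for the shrink is being removed: go back and pick another node
            return NextAction.REWIND_TO_NODE_SELECTION;
        }
        return NextAction.KEEP_WAITING;
    }
}
```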

Signaling shutdown status
This commit introduces the `PluginShutdownService`, which manages `ShutdownAwarePlugin` classes.
This service is used to signal shutdowns to plugins and to gather the status of a shutdown from
these plugins. ILM implements `ShutdownAwarePlugin` to signal whether an index is in a step that is
unsafe, such as the actual shrink step, so that shutdown waits until ILM has removed the allocation
rules.
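A rough sketch of the fan-out and aggregation a service like `PluginShutdownService` performs, under simplified assumptions: the `ShutdownAware` interface and `SimplePluginShutdownService` class are hypothetical, and they omit the shutdown type argument and the logging framework the real code would use.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified shapes; the real ShutdownAwarePlugin signatures differ.
interface ShutdownAware {
    /** Called on every cluster state update with the ids of nodes being shut down. */
    void signalShutdown(Set<String> shutdownNodeIds);

    /** Asked (on the master) whether it is currently safe to shut down nodeId. */
    boolean safeToShutdown(String nodeId);
}

final class SimplePluginShutdownService {
    private final List<ShutdownAware> plugins;

    SimplePluginShutdownService(List<ShutdownAware> plugins) {
        this.plugins = List.copyOf(plugins);
    }

    /** Fan the current set of shutting-down nodes out to every plugin. */
    void signal(Set<String> shutdownNodeIds) {
        for (ShutdownAware plugin : plugins) {
            try {
                plugin.signalShutdown(shutdownNodeIds);
            } catch (RuntimeException e) {
                // Keep signalling the remaining plugins even if one of them fails.
                System.err.println("plugin failed to handle shutdown signal: " + e);
            }
        }
    }

    /** The shutdown is ready only if every plugin agrees it is safe. */
    boolean readyToShutdown(String nodeId) {
        return plugins.stream().allMatch(plugin -> plugin.safeToShutdown(nodeId));
    }
}
```

Requiring every plugin to agree (`allMatch`) is what lets a single plugin, such as ILM in an unsafe step, hold a shutdown back.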

This commit also hooks up the get shutdown API response to consider the statuses of its parts (see
`SingleNodeShutdownMetadata.Status#combine`) when creating a response.

Relates to #70338

@dakrone dakrone added :Data Management/ILM+SLM Index and Snapshot lifecycle management v8.0.0 :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown v7.14.0 labels Jun 2, 2021
@dakrone dakrone requested review from andreidan and gwbrown June 2, 2021 17:11
@elasticmachine elasticmachine added Team:Data Management Meta label for data/management team Team:Core/Infra Meta label for core/infra team labels Jun 2, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@dakrone
Member Author

dakrone commented Jun 2, 2021

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

Contributor

@andreidan andreidan left a comment

Thanks Lee.

This generally looks great, I had one question.

Comment on lines 78 to 79
```java
    .map(nmm -> nmm.get(idShardsShouldBeOn))
    .map(snsm -> snsm.getType() == SingleNodeShutdownMetadata.Type.REMOVE)
```
Contributor

Maybe a personal preference, so feel free to ignore, but I find `nmm`, `snsm`, and `c` (below, in `IndexLifecycleService`) very difficult to read.

Member Author

Okay, I've renamed these; hopefully they're clearer now.
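For illustration, the chained lookup from the snippet above could read like this with descriptive lambda names. `NodeShutdown`, `ShutdownKind`, and `ShutdownChecks` are hypothetical stand-ins for the shutdown metadata types, and the map is assumed to be keyed by node id.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for the shutdown metadata types referenced in the review.
enum ShutdownKind { RESTART, REMOVE }

final class NodeShutdown {
    private final String nodeId;
    private final ShutdownKind type;

    NodeShutdown(String nodeId, ShutdownKind type) {
        this.nodeId = nodeId;
        this.type = type;
    }

    ShutdownKind type() {
        return type;
    }
}

final class ShutdownChecks {
    /** True if the node the shards should be on is registered for permanent removal. */
    static boolean isNodeBeingRemoved(Map<String, NodeShutdown> shutdownsByNodeId, String idShardsShouldBeOn) {
        return Optional.ofNullable(shutdownsByNodeId)
            .map(shutdownsById -> shutdownsById.get(idShardsShouldBeOn)) // empty if no entry
            .map(shutdown -> shutdown.type() == ShutdownKind.REMOVE)
            .orElse(false);
    }
}
```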

Comment on lines 82 to 86
```java
if (nodeBeingRemoved) {
    completable = false;
    return new Result(false, new SingleMessageFieldInfo("node with id [" + idShardsShouldBeOn +
        "] is currently marked as shutting down for removal"));
}
```
Contributor

Would it make sense to move this check below, and only execute it if we have NOT already relocated all the necessary shards to the target node here?

If we're ready to execute the shrink, should the shutdown wait for the shrink action/task to finish? `IndexLifecycleService` already signals that shutdown should not proceed while we're in this step (in `readyToShutdown`).

As opposed to allowing the shutdown to continue and then redoing the shard allocation on another node? (Waiting would avoid DTS issues or, if in the same zone, moving GBs of data around the cluster in general, but maybe it's impractical from the node shutdown infrastructure's perspective?)

What do you think?

Member Author

Hmm... I'd say the two approaches are roughly equivalent, so it's probably better to do the check afterwards. This means we'd block shutdown slightly longer, but avoid an extra relocation if the allocation is already complete.

Member Author

I moved this check down below and added a test for the differing behavior
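A minimal sketch of the reordered check, assuming the step already knows whether every shard has reached the target node; `ShrinkAllocationCheck` and its `Outcome` enum are hypothetical names used only for illustration.

```java
// Hypothetical sketch of the reordered check discussed above.
final class ShrinkAllocationCheck {
    enum Outcome { READY, WAITING, NOT_COMPLETABLE }

    static Outcome evaluate(boolean allShardsOnTarget, boolean targetNodeBeingRemoved) {
        if (allShardsOnTarget) {
            // Allocation already finished: proceed to the shrink and let the shutdown wait,
            // rather than discarding a completed relocation.
            return Outcome.READY;
        }
        if (targetNodeBeingRemoved) {
            // The relocation can never finish on a node that is going away.
            return Outcome.NOT_COMPLETABLE;
        }
        return Outcome.WAITING;
    }
}
```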

Comment on lines +271 to +283
```java
int statusOrd = -1;
for (Status status : statuses) {
    // Max the status up to, but not including, "complete"
    if (status != COMPLETE) {
        statusOrd = Math.max(status.ordinal(), statusOrd);
    }
}
if (statusOrd == -1) {
    // Either all the statuses were complete, or there were no statuses given
    return COMPLETE;
} else {
    return Status.values()[statusOrd];
}
```
Contributor

I really want to be a functional programming dork and tell you to use reduce here, but I think this is actually clearer than what you'd have to do to make reduce work.
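For comparison, a sketch of the `reduce` variant. It assumes the constants are declared in the order `NOT_STARTED, IN_PROGRESS, STALLED, COMPLETE` (the loop also relies on ordinal order), and `ShutdownStatus` is a hypothetical stand-in for `SingleNodeShutdownMetadata.Status`.

```java
import java.util.Arrays;

// Assumed declaration order; combine() depends on it, just like the loop version.
enum ShutdownStatus {
    NOT_STARTED, IN_PROGRESS, STALLED, COMPLETE;

    /** reduce-based equivalent of the loop: the highest-ordinal non-COMPLETE status wins. */
    static ShutdownStatus combine(ShutdownStatus... statuses) {
        return Arrays.stream(statuses)
            .filter(status -> status != COMPLETE)
            .reduce((a, b) -> a.ordinal() >= b.ordinal() ? a : b)
            .orElse(COMPLETE);
    }
}
```

With no statuses, or only `COMPLETE` ones, the filtered stream is empty and `orElse` yields `COMPLETE`, matching the `statusOrd == -1` branch of the loop.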

Contributor

@gwbrown gwbrown left a comment

Left the tiniest nitpick, otherwise LGTM now that you've addressed Andrei's comments.

Contributor

@andreidan andreidan left a comment

LGTM, thanks Lee

```java
/**
 * Check with registered plugins whether the shutdown is safe for the given node id and type
 */
public boolean readyToShutdown(String nodeId, SingleNodeShutdownMetadata.Type shutdownType) {
```

I wonder if it's worth having this method take an extra `ClusterState` parameter and pass it as an extra argument to each of the `plugin.safeToShutdown` calls it makes. That would make it easier for plugins that don't currently have their own `ClusterService` reference to implement the interface.

Member Author

We discussed adding these today, but since there's no current user, we're going to keep the cluster state out of the interface for now and revisit it when ML (or another plugin) has an implementation that needs it.

OK, cool. I'm going to work on the ML PR soon, so I can add the arguments to that if you don't have a fundamental objection.

Member Author

> so can add the arguments to that if you don't have a fundamental objection.

If possible, I'd prefer to keep them out of the interface. This applies especially to the `signalShutdown` method: if the cluster state is added, it's essentially no different from a regular `ClusterStateListener` call. I think it's okay to add it to `readyToShutdown`, because that is a one-off call (not called on every cluster state change).

Would that work for you?

Since both methods are implemented by the same class, it doesn't really help to just add the current cluster state to one of them. I can instead add a reference to the ClusterService to the class that implements the interface, and then that can be used in both methods.

Member Author

That would be my preference then, as long as that isn't too distasteful of a solution for you.
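A sketch of the approach agreed on here: the implementing class keeps its own handle on the cluster state instead of receiving it as a method argument. `ClusterSnapshot` and `TaskAwareShutdownHandler` are hypothetical, and the `Supplier` stands in for a `ClusterService` reference (for example `clusterService::state`).

```java
import java.util.Collection;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical: ClusterSnapshot is an invented stand-in for ClusterState.
final class ClusterSnapshot {
    final Set<String> nodesWithRunningTasks;

    ClusterSnapshot(Set<String> nodesWithRunningTasks) {
        this.nodesWithRunningTasks = nodesWithRunningTasks;
    }
}

final class TaskAwareShutdownHandler {
    private final Supplier<ClusterSnapshot> currentState; // e.g. clusterService::state
    private volatile Set<String> nodesToDrain = Set.of();

    TaskAwareShutdownHandler(Supplier<ClusterSnapshot> currentState) {
        this.currentState = currentState;
    }

    // Neither method takes a state argument; both read the latest state on demand.
    void signalShutdown(Collection<String> shutdownNodeIds) {
        Set<String> running = currentState.get().nodesWithRunningTasks;
        nodesToDrain = shutdownNodeIds.stream()
            .filter(running::contains)
            .collect(Collectors.toUnmodifiableSet());
    }

    boolean safeToShutdown(String nodeId) {
        return currentState.get().nodesWithRunningTasks.contains(nodeId) == false;
    }

    Set<String> nodesToDrain() {
        return nodesToDrain;
    }
}
```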

```java
Set<String> shutdownNodes = shutdownNodes(state);
for (ShutdownAwarePlugin plugin : plugins) {
    try {
        plugin.signalShutdown(shutdownNodes);
```

Similarly, it would be nice if `state` were passed as an extra argument here, so that plugins that don't currently have their own reference to `ClusterService` can look at the current cluster state.

```java
 * Whether the plugin is considered safe to shut down. This method is called when the status of
 * a shutdown is retrieved via the API, and it is only called on the master node.
 */
boolean safeToShutdown(String nodeId, SingleNodeShutdownMetadata.Type shutdownType);
```

Please consider adding an extra ClusterState parameter to this method. I think almost every implementation of this will involve looking for something in the cluster state.
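To illustrate the kind of cluster-state lookup this comment anticipates, here is a hypothetical ILM-style check written with an explicit state argument, with two plain maps standing in for the cluster state. The class name, the step names in `UNSAFE_STEPS`, and the map shapes are all assumptions, not the real ILM implementation.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; two maps stand in for the ClusterState argument proposed above.
final class IlmShutdownCheck {
    // Steps during which losing the shrink node would be unsafe (illustrative names).
    static final Set<String> UNSAFE_STEPS = Set.of("set-single-node-allocation", "shrink");

    /**
     * indexStep: index name -> current ILM step name.
     * indexShrinkNode: index name -> node id its shards are being gathered on.
     */
    static boolean safeToShutdown(String nodeId, Map<String, String> indexStep, Map<String, String> indexShrinkNode) {
        for (Map.Entry<String, String> entry : indexStep.entrySet()) {
            String index = entry.getKey();
            if (UNSAFE_STEPS.contains(entry.getValue()) && nodeId.equals(indexShrinkNode.get(index))) {
                return false; // an in-flight shrink still depends on this node
            }
        }
        return true;
    }
}
```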

```java
 * A trigger to notify the plugin that a shutdown for the nodes has been triggered. This method
 * will be called on every node for each cluster state, so it should return quickly.
 */
void signalShutdown(Collection<String> shutdownNodeIds);
```

Please consider adding an extra ClusterState parameter to this method. I think almost every implementation of this will involve looking for something in the cluster state.
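Because `signalShutdown` runs on every node for every cluster state, one way to keep it cheap is to store an immutable snapshot of the IDs and do any heavier work elsewhere. A minimal sketch; `CheapShutdownListener` is a hypothetical class, not part of this PR.

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical listener: signalShutdown only copies the ids, so it returns quickly.
final class CheapShutdownListener {
    // volatile so readers on other threads see the latest published snapshot
    private volatile Set<String> shuttingDown = Set.of();

    void signalShutdown(Collection<String> shutdownNodeIds) {
        // Immutable copy: duplicates are collapsed and later caller mutations are not observed.
        shuttingDown = Set.copyOf(shutdownNodeIds);
    }

    boolean isShuttingDown(String nodeId) {
        return shuttingDown.contains(nodeId);
    }
}
```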

@dakrone dakrone merged commit 2bf2bdd into elastic:master Jun 7, 2021
@dakrone dakrone deleted the shutdown-proof-ilm branch June 7, 2021 14:39
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Jun 7, 2021
dakrone added a commit that referenced this pull request Jun 7, 2021
* Make ILM aware of node shutdown (#73690)

* Adjust annotation

Co-authored-by: Elastic Machine

Labels

:Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown :Data Management/ILM+SLM Index and Snapshot lifecycle management >enhancement Team:Core/Infra Meta label for core/infra team Team:Data Management Meta label for data/management team v7.14.0 v8.0.0-alpha1


7 participants