Make ILM aware of node shutdown #73690
Conversation
This commit makes ILM aware of different parts of the node shutdown lifecycle. It consists of two main parts: reacting to shutdown state during execution, and signaling the status of a shutdown from ILM.

Reacting to shutdown state

ILM now considers nodes that are going to be shut down when deciding which node to assign for the shrink action. It uses the `NodeShutdownAllocationDecider` within the `SetSingleNodeAllocateStep` so that shards are not assigned to a node that will be removed. If an index is already past this step and waiting for allocation, this commit adds an `isCompletable` method to the `ClusterStateWaitUntilThresholdStep` so that an allocation that cannot happen can be rewound and retried on another (non-shutdown) node.

Signaling shutdown status

This commit introduces the `PluginShutdownService`, which deals with `ShutdownAwarePlugin` classes. This class is used to signal shutdowns to plugins, and also to gather the status of a shutdown from those plugins. ILM implements `ShutdownAwarePlugin` to signal when an index is in a step that is unsafe to interrupt, such as the actual shrink step, so that shutdown will wait until the allocation rules have been removed by ILM.

This commit also hooks up the get shutdown API response so that it considers the statuses of its parts (see `SingleNodeShutdownMetadata.Status#combine`) when creating a response.

Relates to elastic#70338
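To illustrate the node-selection change, here is a minimal sketch (not the actual `SetSingleNodeAllocateStep` code; the `withoutShuttingDownNodes` helper and `shutdownsByNodeId` map are hypothetical stand-ins for what the step derives from cluster state, and the imports assume the usual Elasticsearch package locations):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.elasticsearch.cluster.metadata.SingleNodeShutdownMetadata;
import org.elasticsearch.cluster.node.DiscoveryNode;

class ShrinkNodeFilter {
    // Drop nodes registered for a REMOVE-type shutdown from the candidate set
    // before the step picks the single node to collocate shards on for shrink.
    static List<DiscoveryNode> withoutShuttingDownNodes(List<DiscoveryNode> validNodes,
            Map<String, SingleNodeShutdownMetadata> shutdownsByNodeId) {
        return validNodes.stream()
            .filter(node -> {
                SingleNodeShutdownMetadata shutdown = shutdownsByNodeId.get(node.getId());
                return shutdown == null || shutdown.getType() != SingleNodeShutdownMetadata.Type.REMOVE;
            })
            .collect(Collectors.toList());
    }
}
```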
Pinging @elastic/es-core-features (Team:Core/Features)
Pinging @elastic/es-core-infra (Team:Core/Infra)
@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample |
andreidan left a comment
Thanks Lee.
This generally looks great, I had one question.
```java
.map(nmm -> nmm.get(idShardsShouldBeOn))
.map(snsm -> snsm.getType() == SingleNodeShutdownMetadata.Type.REMOVE)
```
Maybe a personal preference, so feel free to ignore, but I find it very difficult to read `nmm`, `snsm`, and `c` below in `IndexLifecycleService`.
Okay, I've renamed these to hopefully be better.
```java
if (nodeBeingRemoved) {
    completable = false;
    return new Result(false, new SingleMessageFieldInfo("node with id [" + idShardsShouldBeOn +
        "] is currently marked as shutting down for removal"));
}
```
Would it make sense to move this check below, and only execute it if we did NOT already relocate all the necessary shards to the target node here?
If we're ready to execute shrink, should the shutdown wait for the shrink action/task to finish? `IndexLifecycleService` already signals that shutdown should not be executed if we're in this step (in `readyToShutdown`).
As opposed to allowing the shutdown to continue and then re-doing the shard allocation to another node? (This way we could avoid DTS issues or, if in the same zone, generally moving GBs of data around the cluster, but maybe it's impractical from the node shutdown infrastructure perspective?)
What do you think?
Hmm... I would say they are both around the same, so maybe it's better to do the check afterwards. This would mean that we'd prevent shutdown for slightly longer, but avoid extra relocation if the allocation were already complete.
I moved this check down below and added a test for the differing behavior.
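The resulting order of checks looks roughly like this (a sketch of the agreed behavior, not the literal committed code; `allShardsActiveOnTarget` is a hypothetical name for what the step derives from the routing table):

```java
// If allocation already finished, proceed to shrink rather than rewinding and
// relocating GBs of shards just because the target node is marked for removal.
if (allShardsActiveOnTarget) {
    return new Result(true, null);
}
// Otherwise, a node marked for REMOVE-type shutdown can never satisfy the
// wait condition, so mark the step as not completable to allow a rewind.
if (nodeBeingRemoved) {
    completable = false;
    return new Result(false, new SingleMessageFieldInfo("node with id [" + idShardsShouldBeOn +
        "] is currently marked as shutting down for removal"));
}
```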
```java
int statusOrd = -1;
for (Status status : statuses) {
    // Max the status up to, but not including, "complete"
    if (status != COMPLETE) {
        statusOrd = Math.max(status.ordinal(), statusOrd);
    }
}
if (statusOrd == -1) {
    // Either all the statuses were complete, or there were no statuses given
    return COMPLETE;
} else {
    return Status.values()[statusOrd];
}
```
I really want to be a functional programming dork and tell you to use reduce here, but I think this is actually clearer than what you'd have to do to make reduce work.
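For comparison, the reduce-based version would look something like this (a sketch only; it assumes it lives inside the `Status` enum so that `COMPLETE` is in scope, and the loop above is what actually shipped):

```java
// Equivalent to the loop above: the maximum non-COMPLETE status by ordinal,
// falling back to COMPLETE when every status is COMPLETE (or none were given).
public static Status combine(Collection<Status> statuses) {
    return statuses.stream()
        .filter(status -> status != COMPLETE)
        .reduce((a, b) -> a.ordinal() >= b.ordinal() ? a : b)
        .orElse(COMPLETE);
}
```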
gwbrown left a comment
Left the tiniest nitpick, otherwise LGTM now that you've addressed Andrei's comments.
x-pack/plugin/ilm/src/test/java/org/elasticsearch/xpack/ilm/IndexLifecycleServiceTests.java
andreidan left a comment
LGTM, thanks Lee
```java
/**
 * Check with registered plugins whether the shutdown is safe for the given node id and type
 */
public boolean readyToShutdown(String nodeId, SingleNodeShutdownMetadata.Type shutdownType) {
```
I wonder if it's worth having this method take an extra `ClusterState` parameter, and pass that as an extra argument to each of the `plugin.safeToShutdown` calls it makes. It would make it easier for plugins that don't currently have their own `ClusterService` reference to implement the interface.
We discussed adding these today, but since there's no current user, we're going to keep the cluster state out of the interface for now, and revisit it when ML (or a different plugin) has an implementation that needs it.
OK cool. I am going to work on the ML PR soon, so can add the arguments to that if you don't have a fundamental objection.
> so can add the arguments to that if you don't have a fundamental objection.
If possible, I think I'd prefer to keep them out of the interface. Especially for the `signalShutdown` method: if the cluster state is added, it's essentially no different from a regular `ClusterStateListener` call. I think it's okay to add it to `readyToShutdown` because that is a one-off call (not called on every cluster state change).
Would that work for you?
Since both methods are implemented by the same class, it doesn't really help to just add the current cluster state to one of them. I can instead add a reference to the `ClusterService` to the class that implements the interface, and then that can be used in both methods.
That would be my preference then, as long as that isn't too distasteful of a solution for you.
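A minimal sketch of that compromise (a hypothetical implementation, not code from this PR; the class name is made up and the imports assume the usual Elasticsearch package locations):

```java
import java.util.Collection;

import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.metadata.SingleNodeShutdownMetadata;
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.plugins.ShutdownAwarePlugin;

// Hypothetical plugin that keeps its own ClusterService reference, as agreed
// above, instead of receiving ClusterState through the interface methods.
class ExampleShutdownAwarePlugin implements ShutdownAwarePlugin {
    private final ClusterService clusterService;

    ExampleShutdownAwarePlugin(ClusterService clusterService) {
        this.clusterService = clusterService;
    }

    @Override
    public boolean safeToShutdown(String nodeId, SingleNodeShutdownMetadata.Type shutdownType) {
        // One-off call (only when shutdown status is requested), so fetching
        // the current state here on demand is cheap enough.
        ClusterState state = clusterService.state();
        // ... inspect `state` for work still pinned to `nodeId` ...
        return true;
    }

    @Override
    public void signalShutdown(Collection<String> shutdownNodeIds) {
        // Called on every node for each cluster state, so return quickly.
    }
}
```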
```java
Set<String> shutdownNodes = shutdownNodes(state);
for (ShutdownAwarePlugin plugin : plugins) {
    try {
        plugin.signalShutdown(shutdownNodes);
```
Similarly, it would be nice if `state` was passed as an extra argument here, so that plugins that don't currently have their own reference to `ClusterService` can look at the current cluster state.
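For context, the excerpt above is cut off at the `try`; presumably (an assumption, since the rest isn't shown in this diff hunk) each plugin call is isolated so one failing plugin can't prevent the others from being signaled, roughly:

```java
// Assumed continuation of the loop above (not shown in the excerpt):
for (ShutdownAwarePlugin plugin : plugins) {
    try {
        plugin.signalShutdown(shutdownNodes);
    } catch (Exception e) {
        // Swallow and log so a misbehaving plugin doesn't break the rest.
        logger.warn("error signaling shutdown to plugin", e);
    }
}
```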
```java
/**
 * Whether the plugin is considered safe to shut down. This method is called when the status of
 * a shutdown is retrieved via the API, and it is only called on the master node.
 */
boolean safeToShutdown(String nodeId, SingleNodeShutdownMetadata.Type shutdownType);
```
Please consider adding an extra `ClusterState` parameter to this method. I think almost every implementation of this will involve looking for something in the cluster state.
```java
/**
 * A trigger to notify the plugin that a shutdown for the nodes has been triggered. This method
 * will be called on every node for each cluster state, so it should return quickly.
 */
void signalShutdown(Collection<String> shutdownNodeIds);
```
Please consider adding an extra `ClusterState` parameter to this method. I think almost every implementation of this will involve looking for something in the cluster state.