[ILM] Fix Move To Step API causing ILM to hang #34618
The Move To Step API now checks to see if the target step is an AsyncActionStep, and if so, runs it. Previously, AsyncActionSteps would only be run when they are entered by executing the previous step, so if an AsyncActionStep was entered via the Move To Step API, ILM would never touch that index again.
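The behaviour described above can be sketched roughly as follows. This is a simplified model, not the actual Elasticsearch code: `Step`, `AsyncStep`, `performAction`, and `moveToStep` are hypothetical stand-ins, and the cluster state is reduced to a log of strings.

```java
import java.util.ArrayList;
import java.util.List;

public class MoveToStepSketch {
    /** Hypothetical stand-in for an ILM step; only the name matters here. */
    static class Step {
        final String name;
        Step(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for AsyncActionStep: it does work only when explicitly run. */
    static class AsyncStep extends Step {
        AsyncStep(String name) { super(name); }
        void performAction(String index, List<String> log) {
            log.add("ran " + name + " on " + index);
        }
    }

    /** Moves the index to the target step and, if that step is async, runs it right away. */
    static void moveToStep(String index, Step target, List<String> log) {
        log.add("moved " + index + " to " + target.name);
        // Without this check nothing would ever trigger the async step,
        // because it is normally started by the step that precedes it.
        if (target instanceof AsyncStep) {
            ((AsyncStep) target).performAction(index, log);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        moveToStep("logs-000001", new AsyncStep("shrink"), log);
        moveToStep("logs-000002", new Step("wait"), log);
        log.forEach(System.out::println);
    }
}
```

In this model, only the move into the async step produces a "ran" entry; moving into any other kind of step just records the move.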
Pinging @elastic/es-core-infra
    @Override
    public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
        IndexMetaData newIndexMetaData = newState.metaData().index(indexMetaData.getIndex());
        if (newIndexMetaData == null) {
can this occur due to batching of updates?
After talking with @DaveCTurner it seems like we won't have any batching here because batching occurs within the same instance of ClusterStateTaskExecutor and we don't implement batching ourselves here. However I think this check is still nice to have in case there are other factors at play.
Can this be assert newIndexMetaData != null?
I like David's suggestion here, as it suggests this should never happen more strongly than an if
An assert is less strong though? Because the check will not be done in production code, only in tests.
If we can envisage any scenario where the newState passed to this method can differ from the state we returned in execute(), then I think this should stay as an if statement so we don't end up with an NPE thrown here because the index was deleted. If we are confident that this kind of scenario should never occur, an assert is fine.
right. I guess I don't see this hurting, so I won't block it, but it may be misleading to people new to the code to walk through state in which this may be possible.
We could add something like assert false : "there should be no opportunity for the index to be deleted" inside the if - that way we can catch it in testing while still handling it in production if there's a case we missed. Does that sound reasonable, or is it too messy?
I think we should leave this with the if statement so we are protected against NPEs. If we also want to add an assert to catch things in tests then that's fine, but I think the protection against an NPE in production should remain.
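For readers following the thread, the combined approach (keep the `if`, add `assert false` inside it) might look something like the sketch below. It is a standalone illustration, not the code in this PR: the class name, the `Map` standing in for cluster state metadata, and the returned strings are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class NullGuardSketch {
    /**
     * Hypothetical stand-in for clusterStateProcessed: looks the index up in the
     * new state and either skips it or carries on with the async step.
     */
    static String onStateProcessed(Map<String, String> newStateIndices, String indexName) {
        String newIndexMetaData = newStateIndices.get(indexName);
        if (newIndexMetaData == null) {
            // Fails loudly when assertions are enabled (java -ea, typically in tests)...
            assert false : "there should be no opportunity for the index to be deleted";
            // ...but in production, where assertions are off, we just skip the index
            // instead of hitting a NullPointerException further down.
            return "skipped " + indexName;
        }
        return "ran async step for " + indexName;
    }

    public static void main(String[] args) {
        Map<String, String> indices = new HashMap<>();
        indices.put("logs-000001", "metadata");
        System.out.println(onStateProcessed(indices, "logs-000001"));
    }
}
```

Java assertions are disabled by default and only evaluated when the JVM is started with `-ea`, which is why the `return` after the `assert` still matters in production.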
colings86
left a comment
Left a reply on your comment but this LGTM
@elasticmachine retest this please
The Move To Step API now checks to see if the target step is an
AsyncActionStep, and if so, runs it.
AsyncActionSteps are otherwise only run when they are entered by
executing the previous step, rather than periodically or on cluster state
updates, so if an AsyncActionStep was entered via the Move To Step API, ILM
would never touch that index again.
Fixes #34294