-
Notifications
You must be signed in to change notification settings - Fork 25.6k
stop throwing an IllegalStateException for unrecognized steps #32212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… runPolicy Since the reason for a step not being found in a registry may be due to staleness of the registry between it and the cluster state, we do not want to throw an IllegalStateException. Staleness is something that will be self-healing after follow-up applications of the cluster state updates, so this is a recoverable issue that should log a warning instead of throwing an exception Closes elastic#32181.
|
Pinging @elastic/es-core-infra |
| "current step for index [" + indexMetaData.getIndex().getName() + "] with policy [" + policy + "] is not recognized"); | ||
| // This may happen in the case that there is invalid ilm-step index settings or the stepRegistry is out of | ||
| // sync with the current cluster state | ||
| logger.warn("current step [" + getCurrentStepKey(indexSettings) + "] for index [" + indexMetaData.getIndex().getName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I love that we want to provide the step info here, I have mixed feelings about the assertions that exist inside of getCurrentStepKey. I think this should be changed to throwing IllegalStateException. what do you think @colings86?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the assertions are fine because ILM is in full control of the phase, action and step settings (they are INTERNAL settings so can't be touched by a user and will soon be moved to a custom index metadata object further locking down access to them. Therefore, I don't think its necessary to check that if one is set all three are set in production code but its useful to have the check in testing to make sure we don't do something silly so assertions feel right to me.
| + LifecycleSettings.LIFECYCLE_SKIP + "== true"); | ||
| return; | ||
| } | ||
| Step currentStep = getCurrentStep(stepRegistry, policy, indexSettings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked closer at this. I missed something here.
This method calls PolicyStepsRegistry#getStep, which chooses to throw IllegalStateExceptions as well. Further investigation to see what the repercussions of changing that is necessary.
and tests need to be added to walk these paths as well. Current tests do not catch this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, should https://github.com/talevy/elasticsearch/blob/80ab7739ca8877e2036edfac724f6014fefb7e52/x-pack/plugin/index-lifecycle/src/main/java/org/elasticsearch/xpack/indexlifecycle/PolicyStepsRegistry.java#L101-L114 continue to throw an IllegalStateException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we probably need to make PolicyStepsRegistry.getStep() return null if the step is missing (in much the same way that a map returns null for a missing key. Then we'll have to check for null:
- here in
IndexLifecycleRunner.runPolicy()and if itsnulllog a warning and return - In
IndexLifecycleRunner.moveClusterStateToStep()where I think we should do the same as above - In
ExecuteStepsUpdateTask.execute()where we should log the warning if itsnullbut return the cluster state up to changing the current step so we don't lose all the progress made on the previous cluster state steps
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is what I was thinking as well. Just wanted to double check with you before I take that route
colings86
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@talevy I left a few comments
| "current step for index [" + indexMetaData.getIndex().getName() + "] with policy [" + policy + "] is not recognized"); | ||
| // This may happen in the case that there is invalid ilm-step index settings or the stepRegistry is out of | ||
| // sync with the current cluster state | ||
| logger.warn("current step [" + getCurrentStepKey(indexSettings) + "] for index [" + indexMetaData.getIndex().getName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the assertions are fine because ILM is in full control of the phase, action and step settings (they are INTERNAL settings so can't be touched by a user and will soon be moved to a custom index metadata object further locking down access to them. Therefore, I don't think its necessary to check that if one is set all three are set in production code but its useful to have the check in testing to make sure we don't do something silly so assertions feel right to me.
| throw new IllegalArgumentException(e.getMessage()); | ||
| Step nextStep = stepRegistry.getStep(indexPolicySetting, nextStepKey); | ||
| if (nextStep == null) { | ||
| throw new IllegalArgumentException("step [" + nextStepKey + "] with policy [" + indexPolicySetting + "] does not exist"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why this still needs to throw an exception rather than just returning an unmodified cluster state? Also I know we threw an IllegalArgumentException before but it feels wrong to me since the user won't have supplied anything invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am cool with that.
I still feel like IllegalArgumentException should be thrown to users of TransportMoveToStepAction. So I will push up a basic exception for that, and it will know that nothing "right" happened since the clusterStates will be the same instance
| if (nextStep == null) { | ||
| throw new IllegalArgumentException("step [" + nextStepKey + "] with policy [" + indexPolicySetting + "] does not exist"); | ||
| // stepRegistry may not be up-to-date with latest policies/steps in cluster-state | ||
| return currentState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I had not looked closely enough at what this method is used for. It seems that its only used for the "move to step" and retry APIs so actually its probably ok for this method to throw an exception directly. I had thought before that we use this method for moving between steps normally.
I think we should throw directly here as the alternative of returning the same cluster state and then using that as an indication that the step is not recognised is a bit trappy. To ensure this method is kept only for API calls and not for normal execution maybe we can add a JavaDoc comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I will revert the latest commit and add comment
…ner, but throw in action" This reverts commit c009641.
colings86
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
thanks! |
Since the reason for a step not being found in a registry may be due to staleness of the registry between it and the cluster state, we do not want to throw an IllegalStateException. Staleness is something that will be self-healing after follow-up applications of the cluster state updates, so this is a recoverable issue that should log a warning instead of throwing an exception Closes #32181.
Since the reason for a step not being found in a registry may be due to staleness of the
registry between it and the cluster state, we do not want to throw an IllegalStateException.
Staleness is something that will be self-healing after follow-up applications of the cluster state
updates, so this is a recoverable issue that should log a warning instead of throwing an exception
Closes #32181.