-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Skip Rollover step if next index already exists #35168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If the Rollover step would fail due to the next index in sequence already existing, just skip to the next step instead of going to the Error step. This prevents spurious `ResourceAlreadyExistsException`s created by simultaneous RolloverStep executions from causing ILM to error out unnecessarily.
|
Pinging @elastic/es-core-infra |
| logger.info(secondIndex + ": " + getStepKeyForIndex(secondIndex)); | ||
| assertThat(getStepKeyForIndex(originalIndex), equalTo(new StepKey("hot", RolloverAction.NAME, ErrorStep.NAME))); | ||
| assertThat(getFailedStepForIndex(originalIndex), equalTo("update-rollover-lifecycle-date")); | ||
| assertThat(getReasonForIndex(originalIndex), equalTo("index [" + originalIndex + "] has not rolled over yet")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this error message is confusing. Just to check I understand correctly. The problem with the second index is actually that it doesn't have the rollover alias setting set and doesn't have any rollover info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example: If you have test-000001 and manually create test-000002 before rollover fires (and don't do anything else), then test-000001 will end up in the error state with this message because test-000001 has not been rolled over, and therefore doesn't have any attached RolloverInfo. test-000001 is still the target of the alias and still has the rollover alias setting.
If test-000002 had been manually created and the alias had been manually switched to point to test-000002, attempt_rollover would fail due to #35065
Does that help clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just change the message in this step to say something like:
No rollover info found for index [INDEX_NAME]. Either the index has not yet rolled over or a subsequent index was created outside of Index Lifecycle Management.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/indexlifecycle/RolloverStep.java
Outdated
Show resolved
Hide resolved
|
CI failure seems unrelated and does not reproduce locally. @elasticmachine test this please |
If the Rollover step would fail due to the next index in sequence already existing, just skip to the next step instead of going to the Error step. This prevents spurious `ResourceAlreadyExistsException`s created by simultaneous RolloverStep executions from causing ILM to error out unnecessarily.
|
Whoops, just realized I got mixed up and this didn't actually get any official approvals from reviewers - my bad. Let me know if there were more changes you wanted me to make. |
If the Rollover step would fail due to the next index in sequence
already existing, just skip to the next step instead of going to the
Error step.
This prevents spurious
ResourceAlreadyExistsExceptions created bysimultaneous RolloverStep executions from causing ILM to error out
unnecessarily.
Resolves #34465 - I tested this fix manually against a setup that otherwise
reliably reproduced that issue for a few hours, and did not encounter any
failures due to an unnecessary
ResourceAlreadyExistsException.However, I'm still somewhat concerned that the error message a user encounters
if they really do have a problem with the index existing isn't clear - see the test case
I added (
testRolloverAlreadyExists) to see what I mean. We could change thatmessage to be clearer in this case, but that may cause the error to be less clear if
the
RolloverInfoisnullvia some other way. Any thoughts?