Description
Elasticsearch version (bin/elasticsearch --version): Observed on 6.7.2 and 7.0.1
Issue
I've observed multiple instances where RolloverInfo is not attached to an index when it is rolled over. In one instance, the cluster was heavily overloaded and had errors processing cluster state updates; in another, a node was repeatedly joining the cluster, being removed, and rejoining. These conditions may or may not be relevant.
This would normally not be problematic, but ILM relies on the RolloverInfo to update the lifecycle reference date following rollover, so when this occurs, ILM encounters an error with a message similar to this one:
[2019-10-28T05:50:10,915][ERROR][o.e.x.i.ExecuteStepsUpdateTask] [name] policy [my-policy] for index [my-index-000001] failed on cluster state step [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}]. Moving to ERROR step
The step info from the ILM Explain API when this happens:
"step_info" : {
  "type" : "illegal_state_exception",
  "reason" : "no rollover info found for [my-index-000001] with alias [my-alias], the index has not yet rolled over with that alias",
  "stack_trace" : "[omitted for brevity]"
}
Workaround
There is no way to manually attach RolloverInfo, so if this happens to an index, ILM must be forcibly moved past the update-rollover-lifecycle-date step. NOTE THAT THIS MEANS THE CREATION DATE OF THE INDEX WILL BE USED INSTEAD OF THE ROLLOVER DATE for all following phases. If this is problematic, the index.lifecycle.origination_date setting (available in 7.5+) may be useful.
To do this, you can use the following request:
POST _ilm/move/my-index-000001
{
  "current_step": {
    "phase": "hot",
    "action": "rollover",
    "name": "ERROR"
  },
  "next_step": {
    "phase": "hot",
    "action": "rollover",
    "name": "set-indexing-complete"
  }
}
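If several indices are affected, the workaround can be scripted. The following is a minimal Python sketch, not a tested tool: it only builds request paths and bodies and sends nothing to a cluster. The function names are hypothetical. It assumes the ILM Explain response lists indices under an "indices" key with "step", "failed_step", and "step_info" fields, and that index.lifecycle.origination_date takes a value in epoch milliseconds; verify both against the documentation for your version before relying on it.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

# Step name taken from the error log in this issue.
FAILED_STEP = "update-rollover-lifecycle-date"


def indices_missing_rollover_info(explain_response: Dict[str, Any]) -> List[str]:
    """Return the indices stuck in ERROR because no RolloverInfo was found.

    Assumption: the response is shaped like {"indices": {name: {...}}} and each
    entry carries "step", "failed_step" and "step_info".
    """
    affected = []
    for name, info in explain_response.get("indices", {}).items():
        step_info = info.get("step_info") or {}
        if (
            info.get("step") == "ERROR"
            and info.get("failed_step") == FAILED_STEP
            and step_info.get("type") == "illegal_state_exception"
            and "no rollover info found" in step_info.get("reason", "")
        ):
            affected.append(name)
    return sorted(affected)


def move_request(index: str) -> Tuple[str, Dict[str, Any]]:
    """Build the path and body of the move request shown in this issue."""
    body = {
        "current_step": {"phase": "hot", "action": "rollover", "name": "ERROR"},
        "next_step": {
            "phase": "hot",
            "action": "rollover",
            "name": "set-indexing-complete",
        },
    }
    return f"_ilm/move/{index}", body


def origination_date_settings(rolled_over_at: datetime) -> Dict[str, int]:
    """Build a settings body for index.lifecycle.origination_date (7.5+).

    Assumption: the setting expects epoch milliseconds.
    """
    if rolled_over_at.tzinfo is None:
        raise ValueError("rolled_over_at must be timezone-aware")
    millis = int(rolled_over_at.timestamp() * 1000)
    return {"index.lifecycle.origination_date": millis}
```

Each (path, body) pair from move_request can be sent as a POST with whatever HTTP client you already use. Moving the step still leaves the creation date as the reference date, so the origination date settings are only needed if the later phases must be timed from the actual rollover.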