Weaken node-replacement decider during reconciliation #95070
The node-replacement allocation decider requires shard movements to follow a specific route, from the source node to the replacement target. However, during the shutdown there may be other changes in the system that make the replacement target unsuitable as the final destination of the shard. Having simulated the move of the shard onto the replacement target, we are free to simulate its movement elsewhere, but today this causes the reconciler to get stuck: it cannot move the shard to its desired location because of the ongoing replacement, and it will not move the shard onto the replacement target because that's not its desired location. This commit suppresses this decider during reconciliation, which allows the reconciler to skip the intermediate target node and move the shard straight to its desired location.
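The stuck state described above can be reproduced in a toy model. The sketch below is illustrative only: `NodeReplacementSketch`, `canAllocate`, and `reconcileStep` are hypothetical names, node names are plain strings, and the rule is simplified to "a shard leaving the replaced node may only go to the replacement target". It is not the Elasticsearch implementation; the only thing taken from this PR is the idea of returning a YES decision early when reconciling.

```java
/** Toy model only: hypothetical names, not Elasticsearch classes. */
final class NodeReplacementSketch {

    private NodeReplacementSketch() {}

    /**
     * Simplified replacement rule: a shard leaving {@code replacedNode} may only
     * move to {@code replacementTarget}, unless we are reconciling, in which case
     * the rule is skipped (mirrors the early return on allocation.isReconciling()).
     */
    static boolean canAllocate(String replacedNode, String replacementTarget,
                               String from, String to, boolean reconciling) {
        if (reconciling) {
            return true; // decider is ignored during reconciliation
        }
        if (replacedNode.equals(from)) {
            return replacementTarget.equals(to);
        }
        return true; // shard is not on the replaced node, so the rule does not apply
    }

    /**
     * One reconciliation step. The reconciler only ever tries to move the shard
     * to its desired node; it never moves it somewhere else as a stepping stone.
     * Returns the node holding the shard after the step.
     */
    static String reconcileStep(String replacedNode, String replacementTarget,
                                String current, String desired, boolean skipDeciderWhenReconciling) {
        if (current.equals(desired)) {
            return current; // already in place
        }
        boolean allowed = canAllocate(replacedNode, replacementTarget,
                                      current, desired, skipDeciderWhenReconciling);
        return allowed ? desired : current; // a refused move leaves the shard where it is
    }
}
```

With a replacement from `node-old` to `node-new` and a desired location of `node-other`, calling `reconcileStep("node-old", "node-new", "node-old", "node-other", false)` leaves the shard on `node-old` however many times it is repeated, while passing `true` moves it straight to `node-other`.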
Hi @DaveCTurner, I've created a changelog YAML for you.
Pinging @elastic/es-distributed (Team:Distributed)
      INDEX,
      Settings.builder()
-         .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
+         .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, between(1, 5))
This is the relevant test change: with more than 1 shard, there are enough other nodes in the cluster for the eventual desired balance to spread them out across multiple nodes even though the node replacement decider doesn't like that.
          .allMatch(s -> s.overallStatus() == SingleNodeShutdownMetadata.Status.COMPLETE)
      );
-     });
+     }, 120, TimeUnit.SECONDS);
This may be unnecessary but 30s seems like it might be a little short to relocate 5 shards if the CI worker is running egregiously slowly.
NIT: should we make it dependent on the shard count?
not really worth it IMO but I can if you insist :)
+ if (allocation.isReconciling()) {
+     return YES__RECONCILING;
These two lines are the relevant fix here; everything else is just tidying up the use of Decision constants.
This seems a fairly permissive condition. I wonder if there are any tricky situations in which we could permit moving a shard to the node that is shutting down? Maybe while the balance is still being computed for the node shutdown?
💚 Backport successful