Skip to content

Conversation

@DaveCTurner
Copy link
Contributor

@DaveCTurner DaveCTurner commented Apr 6, 2023

The node-replacement allocation decider requires shard movements to follow a specific route, from source to replacement target. However during the shutdown there may be other changes in the system that make the replacement target unsuitable for the final destination of the shard. Having simulated the move of the shard onto the replacement target we are free to simulate its movement elsewhere, but today this causes the reconciler to get stuck: it cannot move the shard to its desired location because of the ongoing replacement, and it will not move the shard onto the replacement target because that's not its desired location.

This commit weakens this decider during reconciliation which allows the reconcilier to skip the intermediate target node and move the shard straight to its desired location.

The node-replacement allocation decider requires shard movements to
follow a specific route, from source to replacement target. However
during the shutdown there may be other changes in the system that make
the replacement target unsuitable for the final destination of the
shard. Having simulated the move of the shard onto the replacement
target we are free to simulate its movement elsewhere, but today this
causes the reconciler to get stuck: it cannot move the shard to its
desired location because of the ongoing replacement, and it will not
move the shard onto the replacement target because that's not its
desired location.

This commit suppresses this decider during reconciliation which allows
the reconcilier to skip the intermediate target node and move the shard
straight to its desired location.
@DaveCTurner DaveCTurner added >bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.7.1 v8.8.0 labels Apr 6, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Apr 6, 2023
INDEX,
Settings.builder()
.put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
.put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, between(1, 5))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the relevant test change: with more than 1 shard, there are enough other nodes in the cluster for the eventual desired balance to spread them out across multiple nodes even though the node replacement decider doesn't like that.

.allMatch(s -> s.overallStatus() == SingleNodeShutdownMetadata.Status.COMPLETE)
);
});
}, 120, TimeUnit.SECONDS);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be unnecessary but 30s seems like it might be a little short to relocate 5 shards if the CI worker is running egregiously slowly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: should we make it dependent on the shard count?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really worth it IMO but I can if you insist :)

Comment on lines 36 to 37
if (allocation.isReconciling()) {
return YES__RECONCILING;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines are the relevant fix here, everything else is just tidying up the use of Decision constants.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fairly permitting condition. I wonder if there any tricky situations when we could permit moving shard to the node that is shutting down? Maybe while the balance is still being computed for the node shutdown?

@DaveCTurner DaveCTurner changed the title Ignore node-replacement decider during reconciliation Weaken node-replacement decider during reconciliation Apr 6, 2023
@DaveCTurner DaveCTurner added auto-backport-and-merge auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Apr 6, 2023
@DaveCTurner DaveCTurner removed the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Apr 6, 2023
@DaveCTurner DaveCTurner merged commit 6d006f4 into elastic:main Apr 6, 2023
@DaveCTurner DaveCTurner deleted the 2023-04-06-ignore-replace-decider-during-reconciliation branch April 6, 2023 11:40
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Apr 6, 2023
The node-replacement allocation decider requires shard movements to
follow a specific route, from source to replacement target. However
during the shutdown there may be other changes in the system that make
the replacement target unsuitable for the final destination of the
shard. Having simulated the move of the shard onto the replacement
target we are free to simulate its movement elsewhere, but today this
causes the reconciler to get stuck: it cannot move the shard to its
desired location because of the ongoing replacement, and it will not
move the shard onto the replacement target because that's not its
desired location.

This commit weakens this decider during reconciliation which allows
the reconcilier to skip the intermediate target node and move the shard
straight to its desired location.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.7

elasticsearchmachine pushed a commit that referenced this pull request Apr 6, 2023
The node-replacement allocation decider requires shard movements to
follow a specific route, from source to replacement target. However
during the shutdown there may be other changes in the system that make
the replacement target unsuitable for the final destination of the
shard. Having simulated the move of the shard onto the replacement
target we are free to simulate its movement elsewhere, but today this
causes the reconciler to get stuck: it cannot move the shard to its
desired location because of the ongoing replacement, and it will not
move the shard onto the replacement target because that's not its
desired location.

This commit weakens this decider during reconciliation which allows
the reconcilier to skip the intermediate target node and move the shard
straight to its desired location.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.7.1 v8.8.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants