Weaken node-replacement decider during reconciliation #95070

DaveCTurner · 2023-04-06T08:59:26Z

The node-replacement allocation decider requires shard movements to follow a specific route, from the source node to the replacement target. However, during the shutdown other changes in the system may make the replacement target unsuitable as the final destination of the shard. Having simulated the move of the shard onto the replacement target, we are free to simulate its movement elsewhere, but today this causes the reconciler to get stuck: it cannot move the shard to its desired location because of the ongoing replacement, and it will not move the shard onto the replacement target because that is not its desired location.

This commit weakens this decider during reconciliation, which allows the reconciler to skip the intermediate target node and move the shard straight to its desired location.
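To make the deadlock concrete, here is a toy model in Java, the language of the patch. The node names, the boolean flag and both methods are hypothetical simplifications, not the Elasticsearch allocation API: the decider only lets a shard leave the source node for the replacement target, and the reconciler only ever moves a shard straight to its desired node.

```java
/**
 * Toy model of the reconciler deadlock. All names here are hypothetical;
 * this is not the Elasticsearch allocation API.
 */
class ReconcilerDeadlockSketch {

    static final String SOURCE = "source"; // node being replaced (shutting down)
    static final String TARGET = "target"; // its replacement
    static final String OTHER = "other";   // where the desired balance finally wants the shard

    /** Replacement rule: a shard leaving SOURCE may only go to TARGET, unless relaxed while reconciling. */
    static boolean replacementDeciderAllows(String from, String to, boolean relaxedWhileReconciling) {
        if (relaxedWhileReconciling) {
            return true;
        }
        return !from.equals(SOURCE) || to.equals(TARGET);
    }

    /** One reconciliation step: the reconciler only moves a shard directly to its desired node. */
    static String reconcile(String current, String desired, boolean relaxedWhileReconciling) {
        if (current.equals(desired)) {
            return current;
        }
        return replacementDeciderAllows(current, desired, relaxedWhileReconciling) ? desired : current;
    }

    public static void main(String[] args) {
        // The desired balance simulated SOURCE -> TARGET -> OTHER, so the desired node is OTHER.
        System.out.println(reconcile(SOURCE, OTHER, false)); // prints "source": stuck
        System.out.println(reconcile(SOURCE, OTHER, true));  // prints "other"
    }
}
```

Without the relaxation neither component ever chooses the intermediate hop, so the shard never moves; with it, the reconciler goes directly to the desired node.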


elasticsearchmachine · 2023-04-06T08:59:50Z

Hi @DaveCTurner, I've created a changelog YAML for you.

elasticsearchmachine · 2023-04-06T08:59:50Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner · 2023-04-06T09:01:02Z

.../src/internalClusterTest/java/org/elasticsearch/xpack/shutdown/DesiredBalanceShutdownIT.java

            INDEX,
            Settings.builder()
-                .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
+                .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, between(1, 5))


This is the relevant test change: with more than one shard, there are enough other nodes in the cluster for the eventual desired balance to spread the shards across multiple nodes, even though the node-replacement decider would rather send them all to the replacement target.

DaveCTurner · 2023-04-06T09:01:47Z

.../src/internalClusterTest/java/org/elasticsearch/xpack/shutdown/DesiredBalanceShutdownIT.java

                    .allMatch(s -> s.overallStatus() == SingleNodeShutdownMetadata.Status.COMPLETE)
            );
-        });
+        }, 120, TimeUnit.SECONDS);


This may be unnecessary, but 30s seems a little short to relocate 5 shards if the CI worker is running egregiously slowly.

NIT: should we make it dependent on the shard count?

Not really worth it IMO, but I can if you insist :)
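The extra 120, TimeUnit.SECONDS arguments look like a timeout passed to an assertBusy-style retry helper (the method name is not visible in the diff). A minimal self-contained sketch of that pattern follows, assuming a plain Runnable check that signals failure by throwing AssertionError; it is similar in spirit to, but not a copy of, the test framework's helper. The timeoutSecondsFor method is a hypothetical illustration of the reviewer's shard-count suggestion, with arbitrary constants chosen so that 1 shard gets 30s and 5 shards get 120s.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical retry-until-timeout helper in the style of an assertBusy method. */
class AssertBusySketch {

    /** Re-runs the check until it stops throwing AssertionError or the timeout elapses. */
    static void assertBusy(Runnable check, long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // the check passed
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the last failure
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 1000); // exponential backoff, capped at 1s
        }
    }

    /** Hypothetical: scale the timeout with the shard count, never below 30 seconds. */
    static long timeoutSecondsFor(int shardCount) {
        return Math.max(30L, 24L * shardCount);
    }
}
```

A fixed generous timeout, as chosen in the PR, keeps the test simple; scaling by shard count only matters if the shard count range grows much larger.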

DaveCTurner · 2023-04-06T09:02:29Z

...a/org/elasticsearch/cluster/routing/allocation/decider/NodeReplacementAllocationDecider.java

+        if (allocation.isReconciling()) {
+            return YES__RECONCILING;


These two lines are the relevant fix here; everything else just tidies up the use of Decision constants.
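A simplified sketch of what a decider with this early return and shared Decision constants might look like. Only the name YES__RECONCILING comes from the diff; the Decision and Allocation records, the other constants and the canAllocate signature are hypothetical stand-ins, not the real NodeReplacementAllocationDecider.

```java
import java.util.Objects;

/** Hypothetical, heavily simplified node-replacement decider. */
class NodeReplacementDeciderSketch {

    record Decision(boolean isYes, String explanation) {}

    /** Hypothetical stand-in for the allocation context: only the fields this sketch reads. */
    record Allocation(boolean isReconciling, String replacedNodeId, String replacementTargetId) {}

    // Shared immutable decisions, rather than building a new Decision on every call.
    static final Decision YES__RECONCILING =
        new Decision(true, "node replacement rules are relaxed during reconciliation");
    static final Decision YES__PERMITTED =
        new Decision(true, "no node replacement rule forbids this move");
    static final Decision NO__MUST_GO_TO_REPLACEMENT =
        new Decision(false, "shards on a node being replaced may only move to its replacement");

    static Decision canAllocate(String fromNodeId, String toNodeId, Allocation allocation) {
        if (allocation.isReconciling()) {
            return YES__RECONCILING; // the fix: trust the desired balance while reconciling
        }
        if (Objects.equals(fromNodeId, allocation.replacedNodeId())
            && Objects.equals(toNodeId, allocation.replacementTargetId()) == false) {
            return NO__MUST_GO_TO_REPLACEMENT;
        }
        return YES__PERMITTED;
    }
}
```

Because the decisions are shared constants, callers and tests can compare them by identity, and no garbage is produced on the hot allocation path.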

This seems like a fairly permissive condition. I wonder if there are any tricky situations in which we could end up permitting a shard to move onto the node that is shutting down? Maybe while the balance is still being computed for the node shutdown?


elasticsearchmachine · 2023-04-06T11:41:52Z

💚 Backport successful

Status	Branch	Result
✅	8.7


DaveCTurner added >bug, :Distributed Coordination/Allocation, v8.7.1 and v8.8.0 labels Apr 6, 2023

DaveCTurner requested review from idegtiarenko and pxsalehi April 6, 2023 08:59

elasticsearchmachine added the Team:Distributed (Obsolete) label Apr 6, 2023

Update docs/changelog/95070.yaml

22b547f

DaveCTurner commented Apr 6, 2023

DaveCTurner added 4 commits April 6, 2023 10:51

Only permit reconciliations away from source, not onto target

27114b8

Also permit relocations onto the target, but not new allocations there

915d147

Comment

475753d

Comment

dff6a2b
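The follow-up commits narrow the relaxation in response to the review concern about shards moving onto the node that is shutting down. The sketch below is one reading of those commit titles (permit reconciling away from the source; also permit relocations onto the target, but not new allocations there), not the merged implementation. All names are hypothetical, and the assumption that the source node never receives shards is labeled in the code.

```java
/**
 * One reading of the narrowed reconciliation rule. Hypothetical names throughout;
 * not the merged Elasticsearch code.
 */
class ReconcilingReplacementRuleSketch {

    /**
     * @param fromNodeId current node of the shard, or null if unassigned (a new allocation)
     * @param toNodeId   node the reconciler wants to place the shard on
     */
    static boolean allowedWhileReconciling(String fromNodeId, String toNodeId,
                                           String sourceNodeId, String targetNodeId) {
        if (toNodeId.equals(sourceNodeId)) {
            return false; // assumption: never place shards onto the node that is shutting down
        }
        if (sourceNodeId.equals(fromNodeId)) {
            return true; // moving away from the source: any desired node is acceptable
        }
        if (toNodeId.equals(targetNodeId)) {
            return fromNodeId != null; // relocations onto the target are fine, new allocations are not
        }
        return true; // move is unrelated to the replacement
    }
}
```

Keying the decision on whether the shard already has a current node is what separates a relocation from a new allocation in this sketch.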

idegtiarenko mentioned this pull request Apr 6, 2023

Follow-up work for desired balance allocator #91386

Open

33 tasks

idegtiarenko approved these changes Apr 6, 2023

DaveCTurner changed the title from "Ignore node-replacement decider during reconciliation" to "Weaken node-replacement decider during reconciliation" Apr 6, 2023

DaveCTurner added auto-backport-and-merge and auto-merge-without-approval labels Apr 6, 2023

Update changelog

83d3d46

DaveCTurner removed the auto-merge-without-approval label Apr 6, 2023

DaveCTurner merged commit 6d006f4 into elastic:main Apr 6, 2023

DaveCTurner deleted the 2023-04-06-ignore-replace-decider-during-reconciliation branch April 6, 2023 11:40

DaveCTurner mentioned this pull request Apr 6, 2023

[8.7] Weaken node-replacement decider during reconciliation (#95070) #95075

Merged

3 participants

Weaken node-replacement decider during reconciliation #95070

Weaken node-replacement decider during reconciliation #95070

Uh oh!

Conversation

DaveCTurner commented Apr 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticsearchmachine commented Apr 6, 2023

Uh oh!

elasticsearchmachine commented Apr 6, 2023

Uh oh!

DaveCTurner Apr 6, 2023

Choose a reason for hiding this comment

Uh oh!

DaveCTurner Apr 6, 2023

Choose a reason for hiding this comment

Uh oh!

idegtiarenko Apr 6, 2023

Choose a reason for hiding this comment

Uh oh!

DaveCTurner Apr 6, 2023

Choose a reason for hiding this comment

Uh oh!

DaveCTurner Apr 6, 2023

Choose a reason for hiding this comment

Uh oh!

idegtiarenko Apr 6, 2023

Choose a reason for hiding this comment

Uh oh!

elasticsearchmachine commented Apr 6, 2023

💚 Backport successful

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

DaveCTurner commented Apr 6, 2023 •

edited

Loading