Introduce desired-balance allocator #91343
Today when updating the routing table (i.e. within `AllocationService#reroute()`) Elasticsearch computes the desired balance of shards and then identifies some shard movements that work towards that goal. At the end of the computation it discards the computed desired allocation and recomputes it the next time round. It's kind of inefficient to recompute the desired allocation each time, and it makes it hard to predict how long it will take until the goal is reached. The computation also happens on the critical path for cluster state updates. With this commit we introduce a new allocator which keeps hold of the desired balance between iterations. It also computes the desired balance asynchronously, allowing other cluster state updates to happen while the computation is ongoing. Relates elastic#88647, elastic#83777, and many more.
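As a rough illustration of the idea described above (not the code in this PR), the sketch below keeps the last computed target between iterations and recomputes it off the calling thread. The class name `DesiredBalanceSketch`, the shard-to-node `Map` representation, and the `computer` function are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical sketch; names and data model are assumptions, not Elasticsearch classes.
class DesiredBalanceSketch {
    // Latest computed target (shard id -> node id), retained between reroutes.
    private final AtomicReference<Map<String, String>> desired = new AtomicReference<>(Map.of());
    private final Executor executor;
    private final Function<Map<String, String>, Map<String, String>> computer;

    DesiredBalanceSketch(Executor executor, Function<Map<String, String>, Map<String, String>> computer) {
        this.executor = executor;
        this.computer = computer;
    }

    // Runs the (possibly slow) computation on the executor and publishes the result when done.
    CompletableFuture<Void> computeAsync(Map<String, String> currentAssignments) {
        Map<String, String> snapshot = Map.copyOf(currentAssignments);
        return CompletableFuture.supplyAsync(() -> computer.apply(snapshot), executor)
            .thenAccept(result -> desired.set(Map.copyOf(result)));
    }

    // Cheap step for the reroute path: list the moves that bring the current
    // assignments closer to whatever target was published most recently.
    List<String> reconcile(Map<String, String> currentAssignments) {
        Map<String, String> target = desired.get();
        List<String> moves = new ArrayList<>();
        // TreeMap gives a deterministic iteration order for the example.
        for (Map.Entry<String, String> e : new TreeMap<>(currentAssignments).entrySet()) {
            String wanted = target.get(e.getKey());
            if (wanted != null && !wanted.equals(e.getValue())) {
                moves.add(e.getKey() + ": " + e.getValue() + " -> " + wanted);
            }
        }
        return moves;
    }
}
```

In this sketch `reconcile` never waits for a computation: until a new target is published it simply works towards the previous one, which is the property the description relies on to keep the computation off the critical path.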
Hi @DaveCTurner, I've created a changelog YAML for you.
Pinging @elastic/es-distributed (Team:Distributed)
server/src/internalClusterTest/java/org/elasticsearch/index/store/CorruptedFileIT.java
.../src/internalClusterTest/java/org/elasticsearch/cluster/coordination/RareClusterStateIT.java
server/src/main/java/org/elasticsearch/cluster/ClusterModule.java
henningandersen
left a comment
Focused on the primary changes in reconciler and computer, left a number of comments, most of which can be deferred to follow-ups.
...er/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/AllocationDecider.java
.../main/java/org/elasticsearch/cluster/routing/allocation/decider/ResizeAllocationDecider.java
...rc/main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceInput.java
server/src/main/java/org/elasticsearch/cluster/ClusterInfoSimulator.java
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java
if (o1.primary() ^ o2.primary()) {
    return o1.primary() ? -1 : 1;
}
if (o1.getIndexName().compareTo(o2.getIndexName()) == 0) {
I think this is more specific:
- if (o1.getIndexName().compareTo(o2.getIndexName()) == 0) {
+ if (o1.getIndex().equals(o2.getIndex())) {
This is copied from BalancedShardsAllocator so we'd want to change it in both places. Not doing that here, but tracking this in #91386.
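For readers unfamiliar with the ordering under discussion, here is a minimal, self-contained sketch of a "primaries first" comparator in the same shape as the fragment above. `ShardRef` is a hypothetical stand-in for a shard routing entry, and the tie-breaks after the primary check (index name, then shard id) are assumptions, since the quoted diff is cut off at that point.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a shard routing entry; not an Elasticsearch class.
final class ShardRef {
    final String indexName;
    final int shardId;
    final boolean primary;

    ShardRef(String indexName, int shardId, boolean primary) {
        this.indexName = indexName;
        this.shardId = shardId;
        this.primary = primary;
    }

    @Override
    public String toString() {
        return indexName + "[" + shardId + "]" + (primary ? "p" : "r");
    }
}

final class PrimaryFirstOrder {
    // Primaries sort before replicas; the remaining tie-breaks are assumptions.
    static final Comparator<ShardRef> COMPARATOR = (o1, o2) -> {
        if (o1.primary ^ o2.primary) {
            // Exactly one of the two is a primary: it goes first.
            return o1.primary ? -1 : 1;
        }
        int byIndex = o1.indexName.compareTo(o2.indexName);
        if (byIndex != 0) {
            return byIndex;
        }
        return Integer.compare(o1.shardId, o2.shardId);
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<ShardRef> sorted(List<ShardRef> shards) {
        List<ShardRef> copy = new ArrayList<>(shards);
        copy.sort(COMPARATOR);
        return copy;
    }
}
```

The sketch compares index names only. The reviewer's suggestion amounts to comparing a fuller index identity instead, which this simplified `ShardRef` does not model.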
...in/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceReconciler.java
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/NodeAllocationOrdering.java
.../main/java/org/elasticsearch/cluster/routing/allocation/allocator/PendingListenersQueue.java
}

public void completeAllAsNotMaster() {
    completedIndex = -1;
This looks unsafe to me: if advance and completeAllAsNotMaster run on different threads, we risk advance setting completedIndex to a later index after we set it to -1 here.
I am not exactly sure why we reset the indexGenerator in DesiredBalanceShardsAllocator. Could it not continue where it left off if the node becomes master again? That would avoid the reset here, which I think would simplify things.
Tracked in #91386. @idegtiarenko could you take a look at this?
It is important to complete the listeners so that requests do not get stuck if the node is no longer master.
I also think it is worth resetting the desired balance to the empty/initial state, as there could be a lot of changes to the routing table by the time the node is elected master again. I guess it is fine not to reset the index here (the one in the desired balance allocator should not be reset either).
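One way to reconcile the points in this thread is sketched below, purely as an illustration and not as the code that was merged or later changed under #91386. Every state change happens under a single lock, pending listeners are failed when the node stops being master, and the completed index is never reset. The class name `PendingListenersSketch`, the `Consumer<Exception>` listener shape (null meaning success), and the assumption that listeners are added in increasing index order are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch; not the PendingListenersQueue from this PR.
class PendingListenersSketch {
    private static final class Pending {
        final long index;
        final Consumer<Exception> listener; // receives null on success

        Pending(long index, Consumer<Exception> listener) {
            this.index = index;
            this.listener = listener;
        }
    }

    // Assumes callers add listeners in increasing index order.
    private final Queue<Pending> pending = new ArrayDeque<>();
    private long completedIndex = -1;

    void add(long index, Consumer<Exception> listener) {
        boolean alreadyDone;
        synchronized (this) {
            alreadyDone = index <= completedIndex;
            if (!alreadyDone) {
                pending.add(new Pending(index, listener));
            }
        }
        if (alreadyDone) {
            listener.accept(null); // invoked outside the lock
        }
    }

    // Marks everything up to and including index as done.
    void complete(long index) {
        List<Pending> ready = new ArrayList<>();
        synchronized (this) {
            // Only ever moves forward, so a concurrent caller cannot rewind it.
            completedIndex = Math.max(completedIndex, index);
            while (!pending.isEmpty() && pending.peek().index <= completedIndex) {
                ready.add(pending.poll());
            }
        }
        for (Pending p : ready) {
            p.listener.accept(null);
        }
    }

    // Fails every waiting listener so no request is left stuck.
    void completeAllAsNotMaster() {
        List<Pending> drained;
        synchronized (this) {
            drained = new ArrayList<>(pending);
            pending.clear();
            // completedIndex is deliberately left unchanged.
        }
        Exception notMaster = new IllegalStateException("no longer master");
        for (Pending p : drained) {
            p.listener.accept(notMaster);
        }
    }
}
```

Because the index only moves forward and is updated under the same lock that drains the queue, a concurrent `complete` cannot interleave with the drain in `completeAllAsNotMaster` in the way the first comment describes.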
henningandersen
left a comment
A few more comments, all optional with respect to merging this (though I would like to see them addressed in follow-ups, not necessarily immediately).
.../main/java/org/elasticsearch/cluster/routing/allocation/allocator/ContinuousComputation.java
...va/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java
server/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalance.java
...in/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceReconciler.java
...in/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceReconciler.java
fcofdez
left a comment
I read all the production code and this makes sense to me. This was mostly to familiarize myself with the changes as I didn't have enough time to review it thoroughly.
henningandersen
left a comment
LGTM.
@elasticmachine please run elasticsearch-ci/bwc