Skip to content

Conversation

@DaveCTurner
Copy link
Contributor

Today when updating the routing table (i.e. within AllocationService#reroute()) Elasticsearch computes the desired balance of shards and then identifies some shard movements that work towards that goal. At the end of the computation it discards the computed desired allocation and recomputes it the next time round. It's kind of inefficient to recompute the desired allocation each time, and it makes it hard to predict how long it will take until the goal is reached. The computation also happens on the critical path for cluster state updates.

With this commit we introduce a new allocator which keeps hold of the desired balance between iterations. It also computes the desired balance asynchronously, allowing other cluster state updates to happen while the computation is ongoing.

Relates #88647, #83777, and many more.

Today when updating the routing table (i.e. within
`AllocationService#reroute()`) Elasticsearch computes the desired
balance of shards and then identifies some shard movements that work
towards that goal. At the end of the computation it discards the
computed desired allocation and recomputes it the next time round. It's
kind of inefficient to recompute the desired allocation each time, and
it makes it hard to predict how long it will take until the goal is
reached. The computation also happens on the critical path for cluster
state updates.

With this commit we introduce a new allocator which keeps hold of the
desired balance between iterations. It also computes the desired balance
asynchronously, allowing other cluster state updates to happen while the
computation is ongoing.

Relates elastic#88647, elastic#83777, and many more.
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.6.0 labels Nov 7, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Nov 7, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner mentioned this pull request Nov 7, 2022
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Focused on the primary changes in reconciler and computer, left a number of comments, most of which can be deferred to follow-ups.

if (o1.primary() ^ o2.primary()) {
return o1.primary() ? -1 : 1;
}
if (o1.getIndexName().compareTo(o2.getIndexName()) == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more specific:

Suggested change
if (o1.getIndexName().compareTo(o2.getIndexName()) == 0) {
if (o1.getIndex().equals(o2.getIndex()) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from BalancedShardsAllocator so we'd want to change it in both places. Not doing that here, but tracking this in #91386.

}

public void completeAllAsNotMaster() {
completedIndex = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unsafe to me, as in if advance and completeAllAsNotMaster runs on different threads, we risk advance setting completedIndex to an index after we set it to -1 here?

I am not exactly sure why we reset the indexGenerator in DesiredBalanceShardsAllocator, could it not continue where it left in case the node becomes master again? That would avoid the reset here, simplifying I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked in #91386. @idegtiarenko could you take a look at this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important co complete the listeners so that we do not have stuck requests if the node is no longer master.
I also think it is worth resetting the desired balance to empty/initial case as there could be a lot of changes to the routing table by the time the node is elected as a master again.I guess it is fine not to reset the index here (the one in desired balance allocator should not be resetted as well)

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments, all optional related to the merge of this (but would like to then see addressed in follow-ups, though not necessarily immediately).

Copy link
Contributor

@fcofdez fcofdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read all the production code and this makes sense to me. This was mostly to familiarize myself with the changes as I didn't have enough time to review it thoroughly.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@DaveCTurner
Copy link
Contributor Author

@elasticmachine please run elasticsearch-ci/bwc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement release highlight Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.6.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants