@DaveCTurner
Contributor

Today there is a node-level `canAllocate` override which the balancer
uses to ignore certain nodes to which it is certain no more shards can
be allocated. In fact this override only ignores nodes which have hit
the rarely-used `cluster.routing.allocation.total_shards_per_node`
limit, so this optimization doesn't have a meaningful impact on real
clusters.

This commit removes this unnecessary fast path from the balancer, and
also removes all the machinery needed to support it.
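
To illustrate why such a fast path is redundant, here is a minimal, self-contained sketch. It is not the Elasticsearch code: the class and method names are hypothetical, and it assumes a simplified model in which the only node-level rule is a per-node shard limit, with a non-positive limit meaning "no limit". Under those assumptions the node-level skip and the ordinary per-shard check always select the same nodes, so the skip only saves work when some node has actually reached the limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the real balancer: compares candidate selection
// with and without a node-level "this node is full" fast path.
class NodeLevelFastPathSketch {

    // Per-shard check (assumed model): one more shard fits if there is no
    // limit (limit <= 0) or the node is still below the limit.
    static boolean canAllocateShard(int shardsOnNode, int totalShardsPerNode) {
        return totalShardsPerNode <= 0 || shardsOnNode < totalShardsPerNode;
    }

    // Node-level check (assumed model): the node can be ignored entirely
    // once it has reached the limit.
    static boolean nodeIsFull(int shardsOnNode, int totalShardsPerNode) {
        return totalShardsPerNode > 0 && shardsOnNode >= totalShardsPerNode;
    }

    // Candidate selection with the fast path: skip full nodes up front,
    // then run the per-shard check on the remaining nodes.
    static List<String> candidatesWithFastPath(Map<String, Integer> shardsPerNode, int limit) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : shardsPerNode.entrySet()) {
            if (nodeIsFull(entry.getValue(), limit)) {
                continue; // node-level skip
            }
            if (canAllocateShard(entry.getValue(), limit)) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }

    // Candidate selection without the fast path: only the per-shard check.
    static List<String> candidatesWithoutFastPath(Map<String, Integer> shardsPerNode, int limit) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : shardsPerNode.entrySet()) {
            if (canAllocateShard(entry.getValue(), limit)) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, Integer> shardsPerNode = new LinkedHashMap<>();
        shardsPerNode.put("node-a", 2);
        shardsPerNode.put("node-b", 5);
        shardsPerNode.put("node-c", 0);

        // Both variants select the same nodes for any limit.
        System.out.println(candidatesWithFastPath(shardsPerNode, 5));
        System.out.println(candidatesWithoutFastPath(shardsPerNode, 5));
    }
}
```

In this model, a cluster that never sets the limit never triggers the node-level skip, which matches the description's point that the optimization has no meaningful effect on typical clusters.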
@DaveCTurner added the `:Distributed Coordination/Allocation` (All issues relating to the decision making around placing a shard, both master logic & on the nodes), `>refactoring`, `team-discuss`, `v8.0.0`, and `v7.9.0` labels on Jul 13, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Allocation)

@elasticmachine added the `Team:Distributed (Obsolete)` label (Meta label for distributed team, obsolete; replaced by Distributed Indexing/Coordination) on Jul 13, 2020
@DaveCTurner
Contributor Author

@zuketo do we have any data that can show how rarely `cluster.routing.allocation.total_shards_per_node` is actually used? I couldn't see an obvious way to determine that from any telemetry.

@zuketo

zuketo commented Jul 14, 2020

I couldn't find any good data sources for this (other than adding to telemetry). This setting is also not whitelisted by cloud, so no data points there. Could we deprecate first and then look at removal?

@DaveCTurner
Contributor Author

Thanks for confirming, Jason. To be clear, we're not talking about removing the setting itself, only the 100 lines of code that treat this setting as a special case in the shard allocator. Let's see what the team discussion brings.

@DaveCTurner
Contributor Author

Absent a better source of data, I looked through as many user interactions as I could find and encountered only 7 mentions of this setting in the last 90 days, and I don't think any of them would have benefitted from this optimisation. We also discussed this today and agreed to proceed.

Contributor

@ywelsch ywelsch left a comment

LGTM

@DaveCTurner DaveCTurner merged commit c1274a4 into elastic:master Jul 23, 2020
@DaveCTurner DaveCTurner deleted the 2020-07-13-remove-node-level-canAllocate branch July 23, 2020 07:48
@DaveCTurner DaveCTurner restored the 2020-07-13-remove-node-level-canAllocate branch July 23, 2020 07:48
@DaveCTurner DaveCTurner deleted the 2020-07-13-remove-node-level-canAllocate branch July 23, 2020 07:48
DaveCTurner added a commit that referenced this pull request Jul 23, 2020

Labels

:Distributed Coordination/Allocation (All issues relating to the decision making around placing a shard, both master logic & on the nodes)
>refactoring
Team:Distributed (Obsolete) (Meta label for distributed team, obsolete; replaced by Distributed Indexing/Coordination)
v7.9.0
v8.0.0-alpha1

5 participants