Increase default value for cluster.routing.allocation.cluster_concurrent_rebalance #97750

@idegtiarenko

Description

The cluster.routing.allocation.cluster_concurrent_rebalance setting limits the number of shards that can be rebalanced simultaneously. The default value is 2, which is reasonable for a small number of shards but becomes a bottleneck for bigger clusters (10+ nodes).
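
Until the default changes, the limit can be raised per cluster through the cluster settings API (`PUT _cluster/settings`). A minimal sketch of the request body, assuming a persistent override; the helper name and the value 10 are illustrative only, not a recommendation:

```python
import json

# Setting discussed in this issue; it can be updated dynamically.
SETTING = "cluster.routing.allocation.cluster_concurrent_rebalance"


def rebalance_override_body(limit: int) -> str:
    """Build the JSON body for PUT _cluster/settings (hypothetical helper)."""
    return json.dumps({"persistent": {SETTING: limit}})


# Body to send with PUT _cluster/settings; 10 is an arbitrary example value.
print(rebalance_override_body(10))
```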

Since the new desired balance shard allocator is not affected by #87279 (effectively resolved by #93977), I believe we should change the default to allow big clusters to rebalance more quickly.

The new default could be set to:

  • 10 (or any other arbitrary higher number). This will not resolve the issue completely but will move the bottleneck a little further out.
  • Make it dependent on the cluster size (for example, allow 1 concurrent rebalance per every 2 nodes in the cluster, or introduce a new setting such as cluster.routing.allocation.node_concurrent_recoveries_per_node). This approach allows the limit to scale with the cluster size.
  • -1 (or unlimited). The bottleneck would then be defined by the number of incoming/outgoing recoveries a node can sustain: cluster.routing.allocation.node_concurrent_incoming_recoveries / cluster.routing.allocation.node_concurrent_outgoing_recoveries. This is the most aggressive option, and it may delay necessary shard movements (such as hot->warm tier migration) because of already ongoing rebalances.
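
To compare the three options, here is a rough sketch of how many rebalances each would allow at once. It is not the allocator's actual logic: the function names are hypothetical, the per-node recovery limits are assumed to be 2, and the recovery bound is only a coarse upper limit (each rebalance needs one outgoing slot on the source node and one incoming slot on the target node).

```python
UNLIMITED = -1  # option 3: no cluster-wide limit


def per_node_limit(node_count: int, nodes_per_rebalance: int = 2) -> int:
    """Option 2: one concurrent rebalance per `nodes_per_rebalance` nodes (at least 1)."""
    return max(1, node_count // nodes_per_rebalance)


def recovery_bound(node_count: int, incoming: int = 2, outgoing: int = 2) -> int:
    """Coarse upper bound from per-node incoming/outgoing recovery slots (assumed 2 each)."""
    return node_count * min(incoming, outgoing)


def allowed_rebalances(cluster_limit: int, node_count: int,
                       incoming: int = 2, outgoing: int = 2) -> int:
    """Concurrent rebalances permitted by the cluster limit and the node limits together."""
    bound = recovery_bound(node_count, incoming, outgoing)
    if cluster_limit == UNLIMITED:
        return bound
    return min(cluster_limit, bound)


nodes = 20
print("current default (2):", allowed_rebalances(2, nodes))
print("option 1, fixed 10: ", allowed_rebalances(10, nodes))
print("option 2, per node: ", allowed_rebalances(per_node_limit(nodes), nodes))
print("option 3, unlimited:", allowed_rebalances(UNLIMITED, nodes))
```

Under these assumptions, a 20-node cluster goes from 2 concurrent rebalances today to 10 with either of the first two options, and to at most 40 with no cluster-wide limit. Only the second and third options keep growing as nodes are added.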

Metadata

Assignees

No one assigned

    Labels

      • :Distributed Coordination/Allocation: All issues relating to the decision making around placing a shard (both master logic & on the nodes)
      • >enhancement
      • Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
