Disk-based shard allocation does not take into account freshly-relocating shards #45177

@DaveCTurner

Disk-based shard allocation may decide to allocate a shard to a node as long as that node has enough free space for that shard on its emptiest data path, accounting for the sizes of shards that are already relocating to that data path on that node. However, it does not currently account for shards that have been chosen for relocation to a node but whose assignment to a data path has not yet been observed by the ClusterInfoService. In particular, it ignores relocations triggered earlier in the same reroute, because these relocations have not yet been published to the cluster and so their effects cannot have been observed.
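The gap can be illustrated with a minimal sketch. This is not Elasticsearch's actual code: `NodeDiskView`, `can_allocate`, and all field and parameter names are hypothetical, and the model collapses a node to a single data path with no watermarks. It only shows how a relocation that has been chosen but not yet observed drops out of the free-space calculation.

```python
from dataclasses import dataclass


@dataclass
class NodeDiskView:
    """What the allocator believes about a node (hypothetical, simplified model)."""
    observed_free_bytes: int        # free space on the emptiest data path, as last sampled
    observed_relocating_bytes: int  # incoming relocations already attributed to that path


def can_allocate(view: NodeDiskView, shard_bytes: int,
                 unobserved_incoming_bytes: int = 0) -> bool:
    """Return True if the shard appears to fit on the node.

    unobserved_incoming_bytes models relocations already chosen for this node
    whose data path assignment has not been observed yet. The behaviour
    described in this issue corresponds to leaving it at 0.
    """
    usable = (view.observed_free_bytes
              - view.observed_relocating_bytes
              - unobserved_incoming_bytes)
    return shard_bytes <= usable


view = NodeDiskView(observed_free_bytes=100, observed_relocating_bytes=20)

# Suppose an 80-unit relocation to this node was chosen earlier in the same reroute.
print(can_allocate(view, shard_bytes=50))                                # True
print(can_allocate(view, shard_bytes=50, unobserved_incoming_bytes=80))  # False
```

In the first call the earlier relocation is invisible, so the 50-unit shard looks like it fits; once the pending 80 units are counted, it does not.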

This is particularly problematic if cluster.routing.allocation.node_concurrent_recoveries is set to a high value (e.g. 10): if the shards are large enough then this may cause the target node to breach one or more watermarks, possibly requiring further relocations away from the node to bring it back under them.
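A rough simulation shows how the concurrency limit amplifies the effect. Everything here is an illustrative assumption: `simulate_reroute` is a hypothetical helper, the disk sizes are made up, the 90% threshold is an assumed high watermark, and the check is a simplification of what the real decider does.

```python
def simulate_reroute(total, free, shard_sizes, concurrent_recoveries,
                     high_watermark=0.90, track_pending=False):
    """Return (number of relocations accepted, disk usage once they all complete).

    With track_pending=False every decision uses the same stale free-space
    figure, which models the behaviour described in this issue.
    """
    accepted = []
    pending = 0
    for size in shard_sizes:
        if len(accepted) >= concurrent_recoveries:
            break  # per-node limit on concurrent incoming recoveries reached
        believed_free = free - (pending if track_pending else 0)
        used_after = (total - (believed_free - size)) / total
        if used_after <= high_watermark:
            accepted.append(size)
            pending += size
    final_used = (total - free + sum(accepted)) / total
    return len(accepted), final_used


# A 1000 GB disk with 560 GB free and ten 50 GB shards waiting to move in.
shards = [50] * 10
print(simulate_reroute(1000, 560, shards, concurrent_recoveries=10))   # (10, 0.94)
print(simulate_reroute(1000, 560, shards, concurrent_recoveries=2))    # (2, 0.54)
print(simulate_reroute(1000, 560, shards, concurrent_recoveries=10,
                       track_pending=True))                            # (9, 0.89)
```

With a limit of 10 and no tracking, every shard is judged against the same 560 GB and the node ends at 94% usage, above the assumed watermark. A limit of 2 caps the damage at two shards per round, and counting pending relocations stops the tenth shard before the watermark is crossed.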

Metadata

Labels

:Distributed Coordination/Allocation (all issues relating to the decision making around placing a shard, both master logic and on the nodes), >bug
