Open
Labels
- :Distributed Coordination/Allocation: All issues relating to the decision making around placing a shard (both master logic & on the nodes)
- Meta
- Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
- Team:Distributed Coordination: Meta label for Distributed Coordination team
Description
Allocator behavior
- Exclude from the desired balance any shards that can neither stay on the node they currently reside on nor move anywhere else. This does not hurt anything at the moment, but it could be very surprising to see a node id in a desired set if any of the deciders return NO for it.
- Minimize shard movements when balancing the cluster during the compute phase.
- Ensure ClusterInfoSimulator does not diverge much from the real ClusterInfo after shards are relocated, as accumulated error could result in poor assignments during computations.
- Address apparent preference to concentrate large shards on some nodes and small shards on others.
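One way to make the ClusterInfoSimulator item measurable is to compare simulated and observed per-node disk usage after relocations complete. The following is a minimal sketch under stated assumptions: the class SimulationDrift, the method maxRelativeError, and the plain maps of node id to bytes are hypothetical and are not part of the Elasticsearch code base.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of Elasticsearch. */
final class SimulationDrift {
    private SimulationDrift() {}

    /**
     * Returns the largest relative difference between simulated and actual
     * per-node disk usage, considering only nodes present in both maps.
     */
    static double maxRelativeError(Map<String, Long> simulatedBytes, Map<String, Long> actualBytes) {
        double worst = 0.0;
        for (Map.Entry<String, Long> entry : actualBytes.entrySet()) {
            Long simulated = simulatedBytes.get(entry.getKey());
            long actual = entry.getValue();
            if (simulated == null || actual <= 0) {
                continue; // no baseline to compare against for this node
            }
            double error = Math.abs(simulated - actual) / (double) actual;
            worst = Math.max(worst, error);
        }
        return worst;
    }
}
```

Tracking this value across successive computations would show whether the error accumulates or stays bounded.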
Stats
- Compute the fraction of shards allocated on fallback nodes. A high value indicates that the cluster is ignoring assignments, which will result in more future shard movements.
- Measure the average time between routing table changes, such as index additions, deletions, and settings changes, that require a shard movement. Measure the average time to move a shard.
- Convert DesiredBalanceShardsAllocator metrics to proper APM metrics so that they can be observed over time.
- Desired-balance warn threshold logging should accumulate across restarts #100850
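The fallback-node fraction from the first item could be computed roughly as follows. This is a sketch, assuming a shard copy counts as being on a fallback node when its current node is not in its desired set; the class UndesiredAllocations, the string shard keys, and the map-based inputs are hypothetical stand-ins rather than the allocator's real data structures.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Elasticsearch. */
final class UndesiredAllocations {
    private UndesiredAllocations() {}

    /**
     * Fraction of assigned shard copies whose current node is not in the desired set.
     * Shard copies without a desired entry are counted as undesired.
     */
    static double fraction(Map<String, String> currentNodeByShard, Map<String, Set<String>> desiredNodesByShard) {
        if (currentNodeByShard.isEmpty()) {
            return 0.0; // avoid dividing by zero when nothing is assigned
        }
        int undesired = 0;
        for (Map.Entry<String, String> entry : currentNodeByShard.entrySet()) {
            Set<String> desired = desiredNodesByShard.getOrDefault(entry.getKey(), Set.of());
            if (!desired.contains(entry.getValue())) {
                undesired++;
            }
        }
        return (double) undesired / currentNodeByShard.size();
    }
}
```

A value close to zero means the current allocation matches the computed assignments; a persistently high value suggests the assignments are not being followed.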
API improvements
- Add forecasts (ingest and disk) to /_cat/allocation and node stats api (Add index forecasts to /_cat/allocation output #97561)
- Expose desired nodes in /_cluster/allocation/explain api
- Report both node id and node name in /_internal/desired_balance for current and desired nodes.
- Expose balancing metrics over node stats api (Expose allocation balance weights via endpoint #92097)
Other
- Introduce desired-balance allocator #91343 (comment): extract common code from ResizeAllocationDecider#canAllocate and ResizeAllocationDecider#getForcedInitialShardAllocationToNodes.
- Introduce desired-balance allocator #91343 (comment): confirm whether it's enough to rely on the invariants of RoutingNodes to protect against assigning multiple copies of a shard to a node.
- Introduce desired-balance allocator #91343 (comment): confirm whether the new comment is sufficient
- Introduce desired-balance allocator #91343 (comment): is findLatest doing the right thing?
- Introduce desired-balance allocator #91343 (comment): is PendingListenersQueue#completeAllAsNotMaster() safe? (Assert PendingListenersQueue completed by master #91428)
- Introduce desired-balance allocator #91343 (comment): ContinuousComputation has lacklustre rejection handling (Tidy ContinuousComputation rejection handling #91442)
- Introduce desired-balance allocator #91343 (comment): make explicit the assumption that onNewInput is called in order with increasing indices (Ensure desired balance computations run in order #91443)
- Introduce desired-balance allocator #91343 (comment): do we want to bail out on all empty balances or just INITIAL?
- Introduce desired-balance allocator #91343 (comment): should we also use canAllocate(shard, allocation) to short-circuit cases where a shard cannot be assigned anywhere?
- Possible improvements to code copied from existing implementation:
  - Introduce desired-balance allocator #91343 (comment): should failAllocationOfNewPrimaries check the recovery source?
  - Introduce desired-balance allocator #91343 (comment): use != instead of ^.
  - Introduce desired-balance allocator #91343 (comment): .compareTo() == 0 vs .equals()
  - Introduce desired-balance allocator #91343 (comment): comparator efficiency
  - Introduce desired-balance allocator #91343 (comment): should
- Update wait_for_no_initializing_shards and wait_for_no_relocating_shards to not exit immediately if there is an ongoing desired balance computation, as it might trigger shards initializing or relocating.
- run computation with limited cluster_concurrent_rebalance to avoid unnecessary shard movements during the desired balance computation (Simulate shard moves using cluster_concurrent_rebalance=2 #93977)
- clean up desired balance once a new master is elected (Reset desired balance on no longer master #95450)
- be able to manually reset or recompute desired balance from scratch (Reset desired balance #94525)
- automatically detect (and log) if the desired balance has started to deviate too much from the current state (by a configured fraction of shards) (Warn if allocation diverged from the desired allocation #95458) else from assigned nodes in desired balance
- node shutdown may get stuck if the desired balance computation computes additional moves (B -> C) after a node replacement move (A -> B), as NodeReplacementAllocationDecider would not permit a direct move (A -> C) (Weaken node-replacement decider during reconciliation #95070)
- if the desired balance has diverged a lot from the current balance, prioritize shard movements that would improve the balance (away from fuller nodes to emptier nodes) (Balance priorities during reconciliation #95454)
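The two small review points about != versus ^ and .compareTo() == 0 versus .equals() can be illustrated with plain JDK types. The sketch below is not taken from the allocator code: the class ComparisonPitfalls is hypothetical, and BigDecimal is used only because it is a well-known JDK type whose natural ordering is inconsistent with equals. The source does not say which types the review comments refer to.

```java
import java.math.BigDecimal;

/** Hypothetical illustration; not part of Elasticsearch. */
final class ComparisonPitfalls {
    private ComparisonPitfalls() {}

    /** For booleans, a != b and a ^ b always agree; != states the intent more directly. */
    static boolean differ(boolean a, boolean b) {
        return a != b;
    }

    /** True when compareTo reports equality but equals does not. */
    static boolean orderingInconsistentWithEquals(BigDecimal x, BigDecimal y) {
        return x.compareTo(y) == 0 && x.equals(y) == false;
    }
}
```

For example, new BigDecimal("1.0") and new BigDecimal("1.00") compare as equal but are not equal according to equals, because equals also compares scale. Whether a similar gap exists for the types in the copied code is what the review comment asks to confirm.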