Closed
Labels
:Distributed Coordination/Allocation: All issues relating to the decision making around placing a shard (both master logic & on the nodes)
>enhancement
Supportability: Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better.
Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Team:Distributed Coordination: Meta label for Distributed Coordination team
Description
Today we emit periodic INFO logs about an ongoing desired balance computation which has not converged after some amount of time or some number of iterations:
Lines 321 to 329 in 18f960c
    logger.log(
        reportByIterationCount || reportByTime ? Level.INFO : i % 100 == 0 ? Level.DEBUG : Level.TRACE,
        () -> Strings.format(
            "Desired balance computation for [%d] is still not converged after [%s] and [%d] iterations",
            desiredBalanceInput.index(),
            TimeValue.timeValueMillis(currentTime - computationStartedTime).toString(),
            iterations
        )
    );
However, these numbers reset whenever a new cluster state is received, so a steady stream of cluster states can prevent the computation from converging without any of these log messages ever being emitted. IMO we should let these numbers accumulate until the computation fully converges, and also report the number of restarts since the last convergence in the log message.
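To make the proposal concrete, here is a minimal sketch of counters that survive restarts and are cleared only on convergence. `ConvergenceProgressTracker`, its method names, and the two reporting thresholds are hypothetical and chosen for illustration; this is not existing Elasticsearch code, and the real change would need to fit into the existing computation loop.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not an Elasticsearch class): accumulates elapsed time,
 * iterations and restarts until the computation converges.
 */
final class ConvergenceProgressTracker {
    private final long reportIntervalMillis;
    private final long reportIntervalIterations;

    private long firstStartMillis = -1; // -1: no unconverged computation is being tracked
    private long lastReportMillis;
    private long iterationsSinceReport;
    private long totalIterations;
    private long restarts;

    ConvergenceProgressTracker(long reportIntervalMillis, long reportIntervalIterations) {
        this.reportIntervalMillis = reportIntervalMillis;
        this.reportIntervalIterations = reportIntervalIterations;
    }

    /** Call whenever a computation starts, including restarts triggered by a new cluster state. */
    void onComputationStarted(long nowMillis) {
        if (firstStartMillis < 0) {
            firstStartMillis = nowMillis;
            lastReportMillis = nowMillis;
        } else {
            restarts++; // previous computation did not converge: keep the counters
        }
    }

    /** Call once per iteration; returns a message when a report is due, otherwise null. */
    String onIteration(long nowMillis) {
        totalIterations++;
        iterationsSinceReport++;
        boolean reportByIterationCount = iterationsSinceReport >= reportIntervalIterations;
        boolean reportByTime = nowMillis - lastReportMillis >= reportIntervalMillis;
        if (reportByIterationCount == false && reportByTime == false) {
            return null;
        }
        iterationsSinceReport = 0;
        lastReportMillis = nowMillis;
        return String.format(
            Locale.ROOT,
            "Desired balance computation is still not converged after [%dms], [%d] iterations and [%d] restarts",
            nowMillis - firstStartMillis,
            totalIterations,
            restarts
        );
    }

    /** Call only when the computation fully converges; this is the only place counters reset. */
    void onConverged() {
        firstStartMillis = -1;
        iterationsSinceReport = 0;
        totalIterations = 0;
        restarts = 0;
    }
}
```

With this shape, a computation that is restarted by every incoming cluster state still crosses the reporting thresholds eventually, and the restart count in the message shows why it has not converged.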