Desired-balance warn threshold logging should accumulate across restarts #100850

@DaveCTurner

Today we emit periodic INFO logs about an ongoing desired balance computation which has not converged after some amount of time or some number of iterations:

logger.log(
    reportByIterationCount || reportByTime ? Level.INFO : i % 100 == 0 ? Level.DEBUG : Level.TRACE,
    () -> Strings.format(
        "Desired balance computation for [%d] is still not converged after [%s] and [%d] iterations",
        desiredBalanceInput.index(),
        TimeValue.timeValueMillis(currentTime - computationStartedTime).toString(),
        iterations
    )
);

However, these numbers reset whenever a new cluster state is received, so a steady stream of cluster states can prevent the computation from converging without any of these INFO messages ever being logged. IMO we should let the elapsed time and iteration count accumulate until the computation fully converges, and also report the number of restarts since the last convergence in the log message.
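To make the proposal concrete, here is a rough sketch of the bookkeeping it implies. It is not the actual Elasticsearch code: the class `ConvergenceProgressTracker`, its methods, and the message wording are all hypothetical, and it assumes the caller supplies the current time in milliseconds and knows when a computation starts, iterates, and converges.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Elasticsearch. */
final class ConvergenceProgressTracker {
    private final long iterationThreshold;   // report every N accumulated iterations
    private final long timeThresholdMillis;  // report every T accumulated milliseconds

    private long firstStartMillis = -1;      // -1: no unconverged computation is being tracked
    private long lastReportMillis;
    private long lastReportIterations;
    private long totalIterations;
    private long restarts;

    ConvergenceProgressTracker(long iterationThreshold, long timeThresholdMillis) {
        this.iterationThreshold = iterationThreshold;
        this.timeThresholdMillis = timeThresholdMillis;
    }

    /** Call when a computation starts, including when a new cluster state restarts it. */
    void onComputationStarted(long nowMillis) {
        if (firstStartMillis < 0) {
            firstStartMillis = nowMillis;
            lastReportMillis = nowMillis;
        } else {
            restarts++; // the previous attempt never converged, so keep the totals
        }
    }

    /** Call once per iteration; returns a message when a threshold is crossed, else null. */
    String onIteration(long nowMillis) {
        totalIterations++;
        boolean reportByIterationCount = totalIterations - lastReportIterations >= iterationThreshold;
        boolean reportByTime = nowMillis - lastReportMillis >= timeThresholdMillis;
        if (reportByIterationCount == false && reportByTime == false) {
            return null;
        }
        lastReportIterations = totalIterations;
        lastReportMillis = nowMillis;
        return String.format(
            Locale.ROOT,
            "Desired balance computation is still not converged after [%dms], [%d] iterations and [%d] restarts",
            nowMillis - firstStartMillis,
            totalIterations,
            restarts
        );
    }

    /** Call only when the computation fully converges; this is the only place totals reset. */
    void onConverged() {
        firstStartMillis = -1;
        lastReportIterations = 0;
        totalIterations = 0;
        restarts = 0;
    }

    long restarts() {
        return restarts;
    }
}
```

The key point is that `onComputationStarted` only increments the restart counter when an earlier attempt is still unconverged, so a stream of cluster states can no longer hide a long-running computation from the thresholds.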

Metadata

Labels

:Distributed Coordination/Allocation - All issues relating to the decision making around placing a shard (both master logic & on the nodes)
>enhancement
Supportability - Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better.
Team:Distributed (Obsolete) - Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Team:Distributed Coordination - Meta label for Distributed Coordination team
