Description
Today DataTiersUsageTransportAction executes an internal nodes stats action with all the trimmings:
Lines 77 to 81 in c006d10:

    client.admin()
        .cluster()
        .prepareNodesStats()
        .all()
        .setIndices(CommonStatsFlags.ALL)
In a large cluster this implementation may need hundreds of MiB of heap on the coordinating node to hold every statistic about every shard on every node (several kiB per shard), even though we use almost none of them. Worse, the coordinating node is always the elected master, because that's how XPackUsageFeatureTransportAction derivatives work. It also burns a bunch of CPU and network bandwidth just transporting these stats around the cluster. AFAICT we could push this computation out to the individual nodes with a dedicated TransportNodesAction that computes the tiny TierSpecificStats on each node, in a form that the coordinating node can combine.
It also does not propagate cancellation down to the nodes stats task (addressed in #100253).
It also captures the cluster state when it's initiated and retains it until completion, which can account for another 100 MiB or more of heap usage.
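To make the proposed shape concrete, here is a minimal plain-Java sketch of per-node summarisation followed by a merge on the coordinating node. It deliberately uses no Elasticsearch classes: `ShardInfo`, `TierSummary`, `summarizeNode` and `combine` are hypothetical names invented for illustration, and the three counters are an assumption rather than the actual fields of TierSpecificStats. Only additive counters are shown; any statistic that is not simply additive (a median, say) would need a different mergeable representation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TierStatsSketch {

    /** Hypothetical per-shard input that a data node already has locally. */
    record ShardInfo(String tier, long docCount, long sizeBytes) {}

    /** Hypothetical additive per-tier summary: a few longs instead of full shard stats. */
    record TierSummary(long shardCount, long docCount, long sizeBytes) {
        TierSummary plus(TierSummary other) {
            return new TierSummary(
                shardCount + other.shardCount,
                docCount + other.docCount,
                sizeBytes + other.sizeBytes
            );
        }
    }

    /** Would run on each node: reduce the local shards to one small summary per tier. */
    static Map<String, TierSummary> summarizeNode(List<ShardInfo> localShards) {
        Map<String, TierSummary> byTier = new HashMap<>();
        for (ShardInfo shard : localShards) {
            TierSummary single = new TierSummary(1, shard.docCount(), shard.sizeBytes());
            byTier.merge(shard.tier(), single, TierSummary::plus);
        }
        return byTier;
    }

    /** Would run on the coordinating node: fold the per-node summaries together. */
    static Map<String, TierSummary> combine(List<Map<String, TierSummary>> nodeResponses) {
        Map<String, TierSummary> total = new HashMap<>();
        for (Map<String, TierSummary> response : nodeResponses) {
            response.forEach((tier, summary) -> total.merge(tier, summary, TierSummary::plus));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two made-up nodes; tier names and sizes are illustrative only.
        Map<String, TierSummary> nodeA = summarizeNode(List.of(
            new ShardInfo("data_hot", 10, 100),
            new ShardInfo("data_hot", 5, 50)));
        Map<String, TierSummary> nodeB = summarizeNode(List.of(
            new ShardInfo("data_hot", 1, 10),
            new ShardInfo("data_warm", 7, 70)));
        System.out.println(combine(List.of(nodeA, nodeB)));
    }
}
```

With this shape each node response carries one small record per tier, so the coordinating node's memory use scales with nodes times tiers rather than with the total number of shards.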
Relates #77466.