Improve metric query performance #95776

@martijnvg

A number of performance issues have been found in production for metric solutions. They need to be addressed in order to achieve competitive query latency in the metrics space. This is part of the TSDB effort, whose aim is to make Elasticsearch better at storing and querying metric data. The tasks mentioned here are improvements that significantly reduce query time, either for many metric query workloads or for specific ones.

Our current observations indicate that the poor performance is caused by the default refresh behaviour. By default, shards go search-idle after 30 seconds of search inactivity. When a search-idle shard is queried, a refresh is performed as part of the search before search execution continues. This adds a significant amount of latency to the query, especially because the refresh is not triggered immediately; the search instead waits until the scheduled refresh kicks in, which often means that nothing happens for 1 second.
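To make the settings involved concrete, here is a minimal Python sketch that builds the body of an index settings update (`PUT /<index>/_settings`). `index.refresh_interval` and `index.search.idle.after` are existing Elasticsearch index settings. The helper name `build_refresh_settings` and the example values are assumptions for illustration only, and this is not the fix proposed in this issue.

```python
import json


def build_refresh_settings(refresh_interval=None, search_idle_after=None):
    """Build a JSON body for PUT /<index>/_settings (hypothetical helper).

    Only the settings that are passed explicitly are included in the body.
    """
    index_settings = {}
    if refresh_interval is not None:
        # Per the index settings documentation, search-idle behaviour only
        # applies while refresh_interval is left at its default (not set).
        index_settings["refresh_interval"] = refresh_interval
    if search_idle_after is not None:
        # How long a shard may go without searches before it is search-idle.
        index_settings["search.idle.after"] = search_idle_after
    return {"index": index_settings}


# Example values are assumptions, not recommendations.
body = build_refresh_settings(refresh_interval="5s")
print(json.dumps(body))
```

Whether either setting is appropriate depends on the workload: an explicit refresh interval keeps background refreshes running on shards that would otherwise be idle, which costs indexing throughput.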

Additionally, we observed that any search with a percentile aggregation is slow. Under the hood, the percentile aggregation uses the AVL t-digest implementation to compute the percentiles, and this shows up as a significant hotspot when profiling.
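For reference, the kind of request affected looks roughly like the following. This is a sketch in Python that only builds the search body. The `percentiles` aggregation and its `tdigest.compression` option are part of the Elasticsearch aggregation DSL, while the helper name, the aggregation name `pct`, and the field name are hypothetical.

```python
import json


def build_percentiles_query(field, percents, compression=None):
    """Build a search body with a single percentiles aggregation.

    Hypothetical helper: the aggregation name "pct" is arbitrary.
    """
    agg = {"field": field, "percents": list(percents)}
    if compression is not None:
        # Optional t-digest tuning: lower values mean fewer centroids,
        # trading accuracy for less memory and less work.
        agg["tdigest"] = {"compression": compression}
    # size=0 skips returning hits, so only the aggregation is computed.
    return {"size": 0, "aggs": {"pct": {"percentiles": agg}}}


# "system.cpu.total.pct" is an assumed field name, used only for illustration.
query = build_percentiles_query("system.cpu.total.pct", [50, 95, 99])
print(json.dumps(query, indent=2))
```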

Labels

:StorageEngine/TSDB, Meta, Team:Analytics
