Closed
Labels
:Analytics/Aggregations (Aggregations), :Delivery/Build (Build or test infrastructure), >refactoring, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), Team:Delivery (Meta label for Delivery team)
Description
We plan to fork the tdigest library.
There are two main reasons behind this choice:
- We would like to control semantic versioning and backward compatibility according to our own definition. Right now, for instance, TDigest does not follow semantic versioning the way we use it when the library code changes, which makes upgrading challenging because it exposes us to backward compatibility issues.
- We would like to change the library to use Elasticsearch-specific libraries, tools, and frameworks such as BigArrays. Right now, when running some aggregations (percentiles, boxplot, ...), we experience OOMs due to large memory usage. Using BigArrays, for instance, would allow us to handle these cases with circuit breakers instead of running out of memory.
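To illustrate the second point, here is a minimal sketch of the general idea: every array allocation is reserved against a memory budget first, so an oversized request fails with a catchable exception instead of an OOM. All names here (`MemoryBudget`, `TrackedDoubleArray`, `TrackedArrayDemo`) are hypothetical and exist only for this example. This is not the Elasticsearch BigArrays or circuit breaker API, and the accounting is deliberately simplified.

```java
import java.util.Arrays;

/** Hypothetical memory budget, standing in for a circuit breaker. */
final class MemoryBudget {
    private final long limitBytes;
    private long usedBytes;

    MemoryBudget(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserve bytes, or fail fast before the allocation happens. */
    void reserve(long bytes) {
        if (usedBytes + bytes > limitBytes) {
            throw new IllegalStateException(
                "budget exceeded: would use " + (usedBytes + bytes) + " of " + limitBytes + " bytes");
        }
        usedBytes += bytes;
    }

    void release(long bytes) {
        usedBytes -= bytes;
    }

    long usedBytes() {
        return usedBytes;
    }
}

/** Hypothetical growable double array that accounts for its storage against the budget. */
final class TrackedDoubleArray implements AutoCloseable {
    private final MemoryBudget budget;
    private double[] values;
    private int size;

    TrackedDoubleArray(MemoryBudget budget, int initialCapacity) {
        this.budget = budget;
        budget.reserve((long) initialCapacity * Double.BYTES);
        this.values = new double[initialCapacity];
    }

    void add(double v) {
        if (size == values.length) {
            int newCapacity = Math.max(1, values.length * 2);
            // Reserve the additional bytes before allocating. Simplification:
            // the temporary overlap of old and new arrays during the copy is not counted.
            budget.reserve((long) (newCapacity - values.length) * Double.BYTES);
            values = Arrays.copyOf(values, newCapacity);
        }
        values[size++] = v;
    }

    int size() {
        return size;
    }

    @Override
    public void close() {
        // Return the reserved bytes to the budget when the array is no longer needed.
        budget.release((long) values.length * Double.BYTES);
        values = new double[0];
    }
}

final class TrackedArrayDemo {
    public static void main(String[] args) {
        MemoryBudget budget = new MemoryBudget(1024); // room for 128 doubles
        try (TrackedDoubleArray centroids = new TrackedDoubleArray(budget, 16)) {
            for (int i = 0; i < 1_000; i++) {
                centroids.add(i);
            }
        } catch (IllegalStateException e) {
            // The request is rejected, but the process keeps running.
            System.out.println("request rejected: " + e.getMessage());
        }
    }
}
```

The point of the design is that the failure is raised at reservation time and scoped to one request, whereas a plain `new double[n]` inside a third-party library gives the caller no chance to intervene.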
The immediate goal of this issue is to fork the library and then, at a later stage, enhance the forked library to make use of the BigArrays infrastructure.
Currently t-digest version 3.2 is used. The latest version is 3.3. We have been locked to version 3.2 because of at least one breaking change (how p50 is computed). The plan is to fork from the latest commit and change the forked library so that the results it produces are similar to those of version 3.2.
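This issue does not spell out how the p50 computation changed between 3.2 and 3.3. Purely as a generic illustration of why such a change is breaking, and not as a description of the t-digest code, the sketch below shows two common median conventions that disagree on the same even-sized input. The class and method names are made up for this example.

```java
import java.util.Arrays;

/** Hypothetical example: two median conventions applied to exact (non-sketched) data. */
final class MedianConventions {

    /** Lower median: the element at index (n - 1) / 2 of the sorted data. Requires n > 0. */
    static double lowerMedian(double[] data) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        return sorted[(sorted.length - 1) / 2];
    }

    /** Interpolated median: mean of the two middle elements when n is even. Requires n > 0. */
    static double interpolatedMedian(double[] data) {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // For odd n both indices are the same, so this returns the middle element.
        return (sorted[(n - 1) / 2] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        System.out.println(lowerMedian(data));        // 2.0
        System.out.println(interpolatedMedian(data)); // 2.5
    }
}
```

A user who stored or asserted on the first result would see different numbers after an upgrade to a library using the second convention, which is the kind of drift the fork is meant to keep under our control.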