The precision_threshold parameter of the cardinality aggregation has an impact not only on accuracy but also on memory usage. This is why, by default, we decide how much memory a cardinality aggregation may use based on how deep it sits in the aggregation tree. For instance, a top-level cardinality aggregation would use 16KB of memory, a cardinality aggregation under a terms aggregation would use 512 bytes per bucket, and a cardinality aggregation under two (or more) levels of terms aggregations would use 16 bytes per bucket.
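To make the trade-off concrete, here is a minimal sketch using the Python client (the index and field names are hypothetical, and `precision_threshold: 3000` is just an example value): setting the threshold explicitly takes the place of the depth-based default described above, trading more memory for better accuracy.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical index/fields: count distinct users per category.
# Because `distinct_users` sits one level under a terms agg, it would
# get the smaller per-bucket default unless precision_threshold is set.
resp = es.search(
    index="my-index",
    body={
        "size": 0,
        "aggs": {
            "by_category": {
                "terms": {"field": "category"},
                "aggs": {
                    "distinct_users": {
                        "cardinality": {
                            "field": "user_id",
                            # higher threshold: more accuracy, more memory
                            "precision_threshold": 3000,
                        }
                    }
                },
            }
        },
    },
)
print(resp["aggregations"]["by_category"]["buckets"])
```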
Unfortunately, it's not easy to get precise counts with only 16 bytes of memory, which can make the out-of-the-box experience a bit disappointing. I think we have several (non-exclusive) options here:
- increase default memory usage, but I'm nervous about making it even easier to trigger circuit-breaking errors or, worse, out-of-memory errors. Maybe #9825 ("Define good heuristics to use collect_mode: breadth_first") could help here: we could decide to always run terms aggs in breadth-first mode if there is a cardinality agg under them, so that the cardinality aggregation would be computed on fewer buckets (see the sketch after this list)
- better document these defaults
- move parts of the aggs computation to disk so that we can increase our defaults more safely
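As a rough sketch of the breadth-first idea: the mode can already be forced manually on the parent terms agg today, whereas the proposal would pick it automatically when a cardinality agg is present (field names are again hypothetical).

```python
# Sketch: explicitly forcing breadth-first collection on the parent terms
# agg, so sub-aggs like cardinality only run on the pruned top buckets.
request_body = {
    "size": 0,
    "aggs": {
        "by_category": {
            "terms": {
                "field": "category",
                "collect_mode": "breadth_first",  # prune buckets first
            },
            "aggs": {
                "distinct_users": {"cardinality": {"field": "user_id"}}
            },
        }
    },
}
```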