Commit bde99ba

Use a static default precision for the cardinality aggregation. #19215
Today the default precision for the cardinality aggregation depends on how many parent bucket aggregations it has. The reasoning was that the more parent bucket aggregations there are, the more buckets the cardinality has to be computed on, and this number can be huge depending on what the parent aggregations actually are. However, now that terms aggregations run in breadth-first mode by default when they have sub-aggregations, it is less likely that the cardinality aggregation has to run on an enormous number of buckets. So we can use a static default, which will be less confusing to users.
1 parent 9ededa4 commit bde99ba
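
To make the behavioural change concrete, here is a minimal, self-contained sketch of the two default rules. It is not the Elasticsearch code: the class and method names are hypothetical, the parent chain is reduced to a count of bucket-creating parents, and the constant values 14 and 4 are assumptions standing in for `HyperLogLogPlusPlus.DEFAULT_PRECISION` and `HyperLogLogPlusPlus.MIN_PRECISION`, whose values are not shown in this commit.

```java
// Hypothetical illustration only; not part of the commit.
final class DefaultPrecisionSketch {

    // Assumed stand-ins for HyperLogLogPlusPlus.DEFAULT_PRECISION / MIN_PRECISION.
    static final int DEFAULT_PRECISION = 14;
    static final int MIN_PRECISION = 4;

    /**
     * Old rule: subtract 5 for every parent that creates multiple buckets,
     * never going below the minimum precision.
     */
    static int oldDefaultPrecision(int multiBucketParents) {
        int precision = DEFAULT_PRECISION;
        for (int i = 0; i < multiBucketParents; i++) {
            precision -= 5;
        }
        return Math.max(precision, MIN_PRECISION);
    }

    /** New rule: the default no longer depends on the parent aggregations. */
    static int newDefaultPrecision() {
        return DEFAULT_PRECISION;
    }

    /** A HyperLogLog counter with precision p uses 2^p registers. */
    static long registers(int precision) {
        return 1L << precision;
    }

    public static void main(String[] args) {
        for (int parents = 0; parents <= 3; parents++) {
            int oldP = oldDefaultPrecision(parents);
            int newP = newDefaultPrecision();
            System.out.println(parents + " multi-bucket parent(s): old precision " + oldP
                    + " (" + registers(oldP) + " registers), new precision " + newP
                    + " (" + registers(newP) + " registers)");
        }
    }
}
```

Under these assumed constants, each bucket-creating parent used to shrink a counter by a factor of 2^5 = 32, which matches the comment in the removed code; with the static default, every counter gets the same size unless `precision_threshold` is set explicitly.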

2 files changed: 7 additions and 27 deletions

core/src/main/java/org/elasticsearch/search/aggregations/metrics/cardinality/CardinalityAggregatorFactory.java

Lines changed: 6 additions & 24 deletions
@@ -23,7 +23,6 @@
 import org.elasticsearch.search.aggregations.AggregatorFactories;
 import org.elasticsearch.search.aggregations.AggregatorFactory;
 import org.elasticsearch.search.aggregations.InternalAggregation.Type;
-import org.elasticsearch.search.aggregations.bucket.SingleBucketAggregator;
 import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
 import org.elasticsearch.search.aggregations.support.AggregationContext;
 import org.elasticsearch.search.aggregations.support.ValuesSource;
@@ -48,36 +47,19 @@ public CardinalityAggregatorFactory(String name, Type type, ValuesSourceConfig<V
     @Override
     protected Aggregator createUnmapped(Aggregator parent, List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData)
             throws IOException {
-        return new CardinalityAggregator(name, null, precision(parent), context, parent, pipelineAggregators, metaData);
+        return new CardinalityAggregator(name, null, precision(), context, parent, pipelineAggregators, metaData);
     }
 
     @Override
     protected Aggregator doCreateInternal(ValuesSource valuesSource, Aggregator parent, boolean collectsFromSingleBucket,
             List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
-        return new CardinalityAggregator(name, valuesSource, precision(parent), context, parent, pipelineAggregators,
+        return new CardinalityAggregator(name, valuesSource, precision(), context, parent, pipelineAggregators,
                 metaData);
     }
 
-    private int precision(Aggregator parent) {
-        return precisionThreshold == null ? defaultPrecision(parent) : HyperLogLogPlusPlus.precisionFromThreshold(precisionThreshold);
-    }
-
-    /*
-     * If one of the parent aggregators is a MULTI_BUCKET one, we might want to lower the precision
-     * because otherwise it might be memory-intensive. On the other hand, for top-level aggregators
-     * we try to focus on accuracy.
-     */
-    private static int defaultPrecision(Aggregator parent) {
-        int precision = HyperLogLogPlusPlus.DEFAULT_PRECISION;
-        while (parent != null) {
-            if (parent instanceof SingleBucketAggregator == false) {
-                // if the parent creates buckets, we subtract 5 to the precision,
-                // which will effectively divide the memory usage of each counter by 32
-                precision -= 5;
-            }
-            parent = parent.parent();
-        }
-
-        return Math.max(precision, HyperLogLogPlusPlus.MIN_PRECISION);
+    private int precision() {
+        return precisionThreshold == null
+                ? HyperLogLogPlusPlus.DEFAULT_PRECISION
+                : HyperLogLogPlusPlus.precisionFromThreshold(precisionThreshold);
     }
 }

docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc

Lines changed: 1 addition & 3 deletions
@@ -45,9 +45,7 @@ experimental[The `precision_threshold` option is specific to the current interna
 defines a unique count below which counts are expected to be close to
 accurate. Above this value, counts might become a bit more fuzzy. The maximum
 supported value is 40000, thresholds above this number will have the same
-effect as a threshold of 40000.
-Default value depends on the number of parent aggregations that multiple
-create buckets (such as terms or histograms).
+effect as a threshold of 40000. The default values is +3000+.
 
 ==== Counts are approximate
 
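
The documentation change ties a threshold of 3000 to the static default and says that thresholds above 40000 behave like 40000. The sketch below shows one way such a threshold-to-precision mapping could produce both properties. It is an illustration under stated assumptions, not a copy of `HyperLogLogPlusPlus.precisionFromThreshold`: the class name, the 0.75 load factor, the 4 bytes per entry, and the precision bounds 4 and 18 are all assumed here, since the commit shows none of them.

```java
// Hypothetical illustration only; every constant below is an assumption.
final class ThresholdToPrecisionSketch {

    static final int MIN_PRECISION = 4;          // assumed lower bound
    static final int MAX_PRECISION = 18;         // assumed upper bound
    static final double MAX_LOAD_FACTOR = 0.75;  // assumed hash-table load factor
    static final int BYTES_PER_ENTRY = 4;        // assumed size of one stored hash

    /** Number of bits needed to represent a non-negative value (at least 1). */
    static int bitsRequired(long value) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(value));
    }

    /**
     * Picks a precision whose register array is about as large as a hash table
     * that could hold `threshold` exact values, clamped to the allowed range.
     */
    static int precisionFromThreshold(long threshold) {
        long hashTableEntries = (long) Math.ceil(threshold / MAX_LOAD_FACTOR);
        int precision = bitsRequired(hashTableEntries * BYTES_PER_ENTRY);
        precision = Math.max(precision, MIN_PRECISION);
        return Math.min(precision, MAX_PRECISION);
    }

    public static void main(String[] args) {
        long[] thresholds = {0, 100, 3000, 40000, 100000};
        for (long threshold : thresholds) {
            System.out.println("threshold " + threshold
                    + " -> precision " + precisionFromThreshold(threshold));
        }
    }
}
```

With these assumed constants, a threshold of 3000 maps to precision 14, and both 40000 and 100000 map to the upper bound of 18, which is consistent with the documented cap. Whether the real constants match is something to check against the `HyperLogLogPlusPlus` source.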