Skip to content

Commit fee554b

Browse files
authored
[ML] ensure the fractions and bucket doc count have the same lengths in bucket_count_ks_test (#74624)
Originally, there was the thought that the fractions provided to the bucket_count_ks_test would need some special handling to ensure appropriate perturbation of the values. This way the KS-test would provide good information. But, this requirement was removed as the calculation was improved. Consequently, remnants of this requirement (like requiring the caller to provide appropriately perturbed fractions) are no longer necessary. This commit allows the fractions parameter provided in the bucket_count_ks_test to simply be the expected fraction of documents per bucket. Previously the caller would have to append 0 to the fractions parameter (undocumented, and thus troublesome).
1 parent a518b8f commit fee554b

File tree

2 files changed

+43
-1
lines changed

2 files changed

+43
-1
lines changed

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/aggs/kstest/BucketCountKSTestAggregator.java

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -255,7 +255,9 @@ public InternalAggregation doReduce(Aggregations aggregations, InternalAggregati
255255
.limit(bucketsValue.getDocCounts().length - 1)
256256
.mapToDouble(Double::valueOf)
257257
).toArray()
258-
: this.fractions;
258+
// We prepend zero to the fractions as we prepend 0 to the doc counts and we want them to be the same length when
259+
// we create the monotonically increasing values for distribution comparison.
260+
: DoubleStream.concat(DoubleStream.of(0.0), Arrays.stream(this.fractions)).toArray();
259261
return new InternalKSTestAggregation(name(), metadata(), ksTest(fractions, bucketsValue, alternatives, samplingMethod));
260262
}
261263
}

x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/ml/bucket_count_ks_test_agg.yml

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -120,6 +120,46 @@ setup:
120120
- is_false: aggregations.good.buckets.2.bucket_ks.two_sided
121121

122122
---
123+
"Test bucket count ks test bucket agg simple with fractions":
124+
125+
- do:
126+
search:
127+
index: store
128+
body: >
129+
{
130+
"size": 0,
131+
"aggs": {
132+
"good": {
133+
"terms": {
134+
"field": "product",
135+
"size": 10
136+
},
137+
"aggs": {
138+
"ranged_cost": {
139+
"range": {
140+
"field": "cost",
141+
"ranges": [
142+
{"from": 200},
143+
{"from": 200, "to": 250},
144+
{"from": 250, "to": 300},
145+
{"from": 300}
146+
]
147+
}
148+
},
149+
"bucket_ks": {
150+
"bucket_count_ks_test": {
151+
"buckets_path": "ranged_cost>_count",
152+
"fractions": [0.25, 0.25, 0.25, 0.25]
153+
}
154+
}
155+
}
156+
}
157+
}
158+
}
159+
- is_true: aggregations.good.buckets.0.bucket_ks.less
160+
- is_true: aggregations.good.buckets.1.bucket_ks.greater
161+
- is_true: aggregations.good.buckets.2.bucket_ks.two_sided
162+
---
123163
"Test bucket count ks test with missing buckets_path":
124164

125165
- do:

0 commit comments

Comments
 (0)