Description
I would like to discuss adding a multi-value metrics aggregation that applies unpaired and paired two-sample t-tests to two samples selected by filters, by fields, or by a combination of both.
An unpaired t-test might look like this:
GET logs/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "t_test": {
        "filters": [
          { "match": { "group": "A" } },
          { "match": { "group": "B" } }
        ],
        "field": "value"
      }
    }
  }
}
The paired t-test might look something like this:
GET logs/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "t_test": {
        "fields": ["before", "after"]
      }
    }
  }
}
We can also add support for scripts.
The type of the test can be specified by the user, with defaults based on the presence or absence of filters. We can support a type parameter that can be set to paired (the default when filters are absent, and supported only in that case), homoscedastic (equal variance), or heteroscedastic (unequal variance, the default when filters are present).
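To make the three proposed type values concrete, here is a minimal Python sketch of the textbook statistic behind each one. It is not the proposed implementation: the function names are hypothetical, the sign convention (after minus before, first sample minus second) is an assumption, and the sketch stops at the t statistic and degrees of freedom, since the p value additionally needs the Student's t distribution CDF.

```python
import math
from statistics import mean, variance


def paired_t(before, after):
    """Paired t-test: a one-sample test on the per-document differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / math.sqrt(variance(diffs) / n)
    return t, n - 1


def homoscedastic_t(a, b):
    """Student's two-sample t-test with a pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def heteroscedastic_t(a, b):
    """Welch's t-test (unequal variances) with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

The paired variant needs both values from the same document, which is consistent with it being available only when the samples come from two fields rather than from two filters.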
The output will be a typical metrics aggregation with t and p values.
Alternatively, we can implement this as a pipeline aggregation. That would simplify the implementation, but it might make usage a bit more difficult and could complicate Kibana adoption. We can also consider implementing it as both a pipeline and a metric aggregation, similar to stats.
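One reason the comparison to stats seems natural: an unpaired t statistic depends only on each sample's count, mean, and variance, and these can be derived from running totals that are cheap to merge. The Python sketch below illustrates that idea only. The SampleState class and its methods are hypothetical, and nothing here reflects how Elasticsearch actually stores or reduces aggregation state.

```python
import math
from dataclasses import dataclass


@dataclass
class SampleState:
    """Hypothetical partial state for one sample: count, sum, and sum of squares."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, x):
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other):
        # Combining two partial states is plain addition of the running totals.
        return SampleState(self.count + other.count,
                           self.total + other.total,
                           self.total_sq + other.total_sq)

    def mean(self):
        return self.total / self.count

    def variance(self):
        # Unbiased sample variance. This naive formula can lose precision
        # when the values are large relative to their spread.
        return (self.total_sq - self.total ** 2 / self.count) / (self.count - 1)


def welch_t(a, b):
    """Welch's t statistic computed from two merged sample states."""
    va, vb = a.variance() / a.count, b.variance() / b.count
    return (a.mean() - b.mean()) / math.sqrt(va + vb)
```

For a paired test, the same state could be accumulated over per-document differences instead of raw values.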
cc: @jtibshirani, @polyfractal