Add Student's t-test aggregation support #53692

@imotov

I would like to discuss adding a multi-value metrics aggregation that applies unpaired and paired two-sample t-tests to two samples selected by filters, by fields, or by a combination of both.

An unpaired t-test might look like this:

GET logs/_search
{
  "size": 0,
  "aggs" : {
    "test" : {
      "t_test" : {
        "filters" : [
          { "match" : { "group" : "A" }},
          { "match" : { "group" : "B" }}
        ],
        "field": "value"
      }
    }
  }
}

The paired t-test might look something like this:

GET logs/_search
{
  "size": 0,
  "aggs" : {
    "test" : {
      "t_test" : {
        "fields" : ["before", "after"]
      }
    }
  }
}

We can also add support for scripts.

The type of the test can be specified by the user, with defaults based on the presence or absence of filters. We can support a type parameter that can be set to paired (supported only when filters are not present, where it is the default), homoscedastic (equal variance), or heteroscedastic (unequal variance, the default if filters are present).
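To make the three proposed types concrete, here is a minimal Python sketch of the t statistic and degrees of freedom each one would compute, using the textbook formulas (paired differences, pooled variance, and Welch's approximation). The function names and the sign convention (after minus before, first sample minus second) are assumptions made for illustration; nothing here reflects an actual Elasticsearch implementation.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1)


def paired_t(before, after):
    """Paired test: a one-sample t-test on the per-document differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # assumed sign: after - before
    n = len(diffs)
    t = mean(diffs) / math.sqrt(variance(diffs) / n)
    return t, n - 1


def homoscedastic_t(a, b):
    """Unpaired test assuming equal variances (pooled variance)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def heteroscedastic_t(a, b):
    """Unpaired test with unequal variances (Welch's t-test)."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    group_a = [1, 2, 3, 4, 5]
    group_b = [2, 4, 6, 8, 10]
    print(paired_t(group_a, group_b))
    print(homoscedastic_t(group_a, group_b))
    print(heteroscedastic_t(group_a, group_b))
```

All three need only counts, means, and variances (of the differences in the paired case), so they can be computed from running sums rather than from the raw values.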

The output will be a typical metrics aggregation with t and p values.
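The proposal does not say how the p value would be derived from t. As a rough sketch, assuming a two-sided test, it can be obtained by integrating the Student's t density from 0 to |t|. The helper names below (`t_pdf`, `two_sided_p`) are hypothetical, and Simpson's rule is used only to keep the example dependency-free; a real implementation would more likely call a statistics library's t-distribution CDF.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p value: 1 - 2 * integral of the density over [0, |t|].

    Uses composite Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


if __name__ == "__main__":
    # With df = 1 the distribution is Cauchy, so t = 1 gives p = 0.5.
    print(two_sided_p(1.0, 1))
```

The fixed step count is adequate for moderate values of |t|; very large statistics would need an adaptive scheme or a closed-form tail evaluation.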

Alternatively, we can implement this as a pipeline aggregation. That would simplify the implementation, but it might make usage a bit more difficult and could complicate Kibana adoption. We can also consider implementing it as both a pipeline and a metrics aggregation, similar to stats.

cc: @jtibshirani, @polyfractal
