Aggregation to calculate the moving average on a histogram aggregation #10002

@polyfractal

Description

This aggregation will calculate the moving average of sibling metrics in histogram-style data (histogram, date_histogram). Moving averages are useful when time series data is locally stationary and has a mean that changes slowly over time.

Seasonal data may need a different analysis, as may data that is bimodal, "bursty", or contains frequent extreme values (which are not necessarily outliers).

The movavg aggregation supports several configurable options:

Window Size

The user specifies the window size they wish to calculate a moving average for. E.g. a user may want a 30-day sliding window over a histogram of 90 days total.

Currently, if there is not enough data to "fill" the window, the moving average is calculated with whatever is available. For example, if a user selects a 30-day window, days 1-29 will calculate the moving average with whatever data exists so far (between 1 and 29 days).

We could investigate adding more "edge policies", which would determine how to handle gaps at the edges of the moving average.
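
For illustration, here is a rough Python sketch of the current partial-window behavior (an assumption about the semantics, not the actual Java implementation):

def simple_moving_average(values, window):
    results = []
    for i, _ in enumerate(values):
        # Take at most `window` values ending at position i; early positions
        # simply use fewer points, matching the current edge behavior.
        start = max(0, i - window + 1)
        current = values[start:i + 1]
        results.append(sum(current) / len(current))
    return results

# With a 3-bucket window over the sums 1, 2, 4, 5, 8, 9 used in the sample
# request below, the first two points average only 1 and 2 values respectively:
# [1.0, 1.5, 2.333..., 3.666..., 5.666..., 7.333...]
print(simple_moving_average([1, 2, 4, 5, 8, 9], window=3))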

Weighting Type

Currently, the agg supports four types of weighting:

  • simple: A simple (arithmetic) average. Default.
  • linear: A linearly weighted average, such that data becomes linearly less important as it gets "older" in the window
  • single_exp: Single exponentially weighted average (aka EWMA or Brown's Simple Exponential Smoothing), such that data becomes exponentially less important as it gets "older".
  • double_exp: Double exponentially weighted average (aka Holt-Winters). Uses two exponential terms: the data is first smoothed exponentially like single_exp, then a second corrective smoothing is applied to account for a trend.

Todo: Expose alpha and beta

Alpha and beta are parameters that control the behavior of single_exp and double_exp; a rough sketch of all four weightings follows the list below.

  • Alpha: controls how far the single exponential smoothing term lags behind "turning points" in the mean (roughly 1/alpha periods). Alpha = 1 means the smoothing term has no memory (a period of 1) and emulates a random walk; Alpha = 0 means the smoothing term has infinite memory and simply reports the mean of the data.
  • Beta: Only used in double_exp. Analogous to alpha, but applied to the trend smoothing rather than the data smoothing.
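
A rough Python sketch of the four weightings, purely for illustration (function and parameter names here are my own, not the aggregation's API, and the actual implementation may differ):

def simple(window):
    # Arithmetic mean of the window (the default weighting).
    return sum(window) / len(window)

def linear(window):
    # Weights 1, 2, ..., n (oldest to newest), so older data counts linearly less.
    weights = range(1, len(window) + 1)
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)

def single_exp(window, alpha):
    # EWMA-style smoothing: older data decays exponentially at rate (1 - alpha).
    s = window[0]
    for v in window[1:]:
        s = alpha * v + (1 - alpha) * s
    return s

def double_exp(window, alpha, beta):
    # Double exponential smoothing: a level term plus a trend correction term.
    level = window[0]
    trend = window[1] - window[0] if len(window) > 1 else 0.0
    for v in window[1:]:
        last_level = level
        level = alpha * v + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
    return level

window = [1.0, 2.0, 4.0, 5.0, 8.0]
print(simple(window), linear(window),
      single_exp(window, alpha=0.5),
      double_exp(window, alpha=0.5, beta=0.5))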

Todo: Investigate metric-weighting

It's sometimes useful to weight a time period not by its distance from the current time, but rather by some metric recorded in that time interval. E.g. weight by the volume of transactions that happened on that day.

It should be possible to weight based on metrics within the bucket, although it could get complicated if the value is missing.
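
As a rough sketch (the weight source and the missing-value policy here are assumptions, not a settled design), bucket values could be combined with weights taken from a sibling metric:

def metric_weighted_average(values, weights):
    # Pair each bucket value with its weight metric; skip buckets where either
    # is missing (None) -- one possible policy for the missing-value problem.
    pairs = [(v, w) for v, w in zip(values, weights)
             if v is not None and w is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in pairs) / total_weight

# Daily price sums weighted by that day's transaction volume:
print(metric_weighted_average([10.0, 20.0, 30.0], [100, None, 50]))  # 16.66...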

Sample Request

This will calculate a moving average (sliding window of three days) over the sum of prices in each day:

GET /test/_search?search_type=count
{
   "aggs": {
      "my_date_histo": {
         "date_histogram": {
            "field": "@timestamp",
            "interval": "day"
         },
         "aggs": {
            "the_sum": {
               "sum": {
                  "field": "price"
               }
            },
            "the_movavg": {
               "movavg": {
                  "bucketsPath": "the_sum",
                  "window": 3
               }
            }
         }
      }
   }
}

Sample Response

{
   "took": 3,
   "timed_out": false,
   "aggregations": {
      "my_date_histo": {
         "buckets": [
            {
               "key_as_string": "2014-12-01T00:00:00.000Z",
               "key": 1417392000000,
               "doc_count": 1,
               "the_sum": {
                  "value": 1,
                  "value_as_string": "1.0"
               },
               "the_movavg": {
                  "value": 1
               }
            },
            {
               "key_as_string": "2014-12-02T00:00:00.000Z",
               "key": 1417478400000,
               "doc_count": 1,
               "the_sum": {
                  "value": 2,
                  "value_as_string": "2.0"
               },
               "the_movavg": {
                  "value": 1.5
               }
            },
            {
               "key_as_string": "2014-12-04T00:00:00.000Z",
               "key": 1417651200000,
               "doc_count": 1,
               "the_sum": {
                  "value": 4,
                  "value_as_string": "4.0"
               },
               "the_movavg": {
                  "value": 2.3333333333333335
               }
            },
            {
               "key_as_string": "2014-12-05T00:00:00.000Z",
               "key": 1417737600000,
               "doc_count": 1,
               "the_sum": {
                  "value": 5,
                  "value_as_string": "5.0"
               },
               "the_movavg": {
                  "value": 3.6666666666666665
               }
            },
            {
               "key_as_string": "2014-12-08T00:00:00.000Z",
               "key": 1417996800000,
               "doc_count": 1,
               "the_sum": {
                  "value": 8,
                  "value_as_string": "8.0"
               },
               "the_movavg": {
                  "value": 5.666666666666667
               }
            },
            {
               "key_as_string": "2014-12-09T00:00:00.000Z",
               "key": 1418083200000,
               "doc_count": 1,
               "the_sum": {
                  "value": 9,
                  "value_as_string": "9.0"
               },
               "the_movavg": {
                  "value": 7.333333333333333
               }
            }
         ]
      }
   }
}
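
Note how the moving average traces the partial-window behavior described above: with a window of 3, the second bucket averages only two values ((1 + 2) / 2 = 1.5) and the third averages three ((1 + 2 + 4) / 3 ≈ 2.33), after which the window stays full. Also note that the window slides over the buckets that are present (there is no bucket for Dec 3rd in this sample).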
