[Feature Request] Aggregate values proportionally across buckets #37642

@j-white

Describe the feature:
We have a use case that requires generating high-precision time series from Netflow records stored as documents in Elasticsearch.

Given a Netflow record with the following fields:

```json
{
  "timestamp": 460,
  "netflow.first_switched": 100,
  "netflow.last_switched": 450,
  "netflow.bytes": 350
}
```

We would like to be able to generate a time series with start=0, end=500, and step=100, and have the following data points:

```
t=0, bytes=0
t=100, bytes=100
t=200, bytes=100
t=300, bytes=100
t=400, bytes=50
t=500, bytes=0
```

The flow record indicates that a total of 350 bytes was transferred over t=[100,450], and the resulting series splits that value across the buckets in proportion to how much of the flow's interval overlaps each bucket.
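
Concretely, the intended semantics are that each bucket receives a share of the flow's bytes proportional to the overlap between the bucket and the flow's active interval (written as a formula, not any existing API):

```
bucket_bytes = bytes * overlap([bucket_start, bucket_end), [first_switched, last_switched])
                     / (last_switched - first_switched)
```

For the record above, the bucket at t=400 overlaps [100, 450] for 50 of the flow's 350 time units, so it receives 350 * 50 / 350 = 50 bytes.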

In the case where many documents are aggregated, each bucket would contain the sum of the proportional contributions from all matching documents.
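
To make this concrete, here is a minimal, self-contained sketch in plain Java of the semantics we are after (`Flow` and `aggregate` are hypothetical names for illustration; this is not the plugin's actual implementation):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ProportionalBuckets {

    /** One flow record; the fields mirror the Netflow document above. */
    record Flow(long firstSwitched, long lastSwitched, long bytes) {}

    /** Distribute each flow's bytes across fixed-width buckets by overlap, summing across flows. */
    static Map<Long, Double> aggregate(List<Flow> flows, long start, long end, long step) {
        Map<Long, Double> buckets = new TreeMap<>();
        for (long t = start; t <= end; t += step) {
            buckets.put(t, 0.0);
        }
        for (Flow f : flows) {
            long duration = f.lastSwitched() - f.firstSwitched();
            if (duration <= 0) {
                continue; // skip zero-length flows for simplicity
            }
            for (long t = start; t <= end; t += step) {
                // Overlap between the bucket [t, t + step) and the flow's active interval.
                long overlap = Math.min(t + step, f.lastSwitched()) - Math.max(t, f.firstSwitched());
                if (overlap > 0) {
                    buckets.merge(t, f.bytes() * (double) overlap / duration, Double::sum);
                }
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // The example record: 350 bytes transferred between t=100 and t=450.
        Map<Long, Double> series = aggregate(List.of(new Flow(100, 450, 350)), 0, 500, 100);
        series.forEach((t, b) -> System.out.println("t=" + t + ", bytes=" + b));
        // Prints 0.0, 100.0, 100.0, 100.0, 50.0, 0.0 for t=0..500, matching the data points above.
    }
}
```

Running this against the example record reproduces the series listed above; with many documents, each bucket simply accumulates the proportional contributions.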

In earlier research we could not find a way to achieve this with the existing facilities, so we developed a custom aggregation plugin: https://github.com/OpenNMS/elasticsearch-drift-plugin.

Some example queries showing the plugin in practice are available at https://gist.github.com/j-white/5c188e5c56fea3f14ea99ab6c1280ceb#time-series-for-top-n-applications.

We're ready to clean up the code and make whatever changes are necessary to get this upstream, but before we do, we'd like to confirm:

- Are there any existing facilities for achieving this? Perhaps something we missed, or something added since we last checked in 6.2.x.
- Is there interest in having this functionality as part of the core?
