range_histogram and date_range_histogram aggregations to help analyse "session" duration type data #23182

@colings86

Now that we support range field types (since 5.2), I am wondering whether we can use them to help users with the concurrent sessions problem (e.g. https://discuss.elastic.co/t/display-concurrency-in-data-on-kibana/26006/3).

The problem detailed in the post above is that the user is trying to determine, for each 30 second period, how many concurrent phone calls are occurring. This problem can be generalised to wanting to analyse how many concurrent 'sessions' are occurring over fixed intervals of time (or potentially some other unit for this axis). By 'session' here I mean something that has a start time and an end time; this could be phone calls, web sessions, or calendar meetings/appointments.

The aggregation would work by adding each collected document to every histogram bucket that overlaps the range given by the value of the range field. Currently the range field does not write doc_values when indexing, so we will either need to write doc_values or find a different way to retrieve the field values in a columnar way.
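
To make the bucketing rule concrete, here is a minimal sketch in plain Python of the counting logic only. It is not Elasticsearch code: the function name `range_histogram`, the sample data, and the choice to treat both ends of a range as inclusive are all assumptions made for illustration.

```python
from collections import Counter

def range_histogram(ranges, interval):
    """Count, for each fixed-width bucket, how many ranges overlap it.

    `ranges` is an iterable of (start, end) integer pairs with start <= end.
    Both ends are treated as inclusive (an assumption of this sketch).
    Returns a dict mapping each bucket's lower bound to its doc count.
    """
    counts = Counter()
    for start, end in ranges:
        first = start // interval   # index of the bucket containing the start
        last = end // interval      # index of the bucket containing the end
        for bucket in range(first, last + 1):
            counts[bucket * interval] += 1
    return dict(sorted(counts.items()))

# Hypothetical phone calls as (start_second, end_second), in 30 second buckets.
calls = [(0, 45), (20, 70), (65, 95)]
concurrent = range_histogram(calls, 30)
print(concurrent)  # {0: 2, 30: 2, 60: 2, 90: 1}
```

Each document contributes to every bucket between its start and end, so a single long session is counted in many buckets. That is the behaviour wanted for the concurrency question, but it also means the sum of the bucket counts can exceed the number of documents.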

The following should be interpreted as thinking out loud and may or may not be useful:

For the non-date applications of this, one (possibly contrived) use-case could be in aggregated metric data. If I was taking temperature data for every weather station in the UK, I might have a document per day that would probably contain the mean and median temperature for the day, but also the minimum and maximum temperature for the day, which I could store in a range field containing the range of temperatures reported that day. When I come to analyse the data, one useful thing to see would be how many days the temperature was between -10C and 0C, 0C and 10C, 10C and 20C, etc. I could use the range_histogram aggregation to answer this question, as it would tell me, for each 10C interval, how many days the temperature was recorded in that interval at some point in the day. Analysing the max and min temperatures independently would only tell me the days when the maximum or minimum was in each interval, which answers a slightly different question.
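
The difference between the two questions can be shown with a small plain-Python sketch. The daily readings below are invented, and the helper names `histogram` and `range_histogram` are hypothetical stand-ins for the counting logic rather than existing aggregation APIs. Range ends are assumed inclusive.

```python
from collections import Counter

def histogram(values, interval):
    """Ordinary histogram: each value lands in exactly one bucket."""
    counts = Counter((v // interval) * interval for v in values)
    return dict(sorted(counts.items()))

def range_histogram(ranges, interval):
    """Each (low, high) range is counted in every bucket it overlaps."""
    counts = Counter()
    for low, high in ranges:
        for bucket in range(low // interval, high // interval + 1):
            counts[bucket * interval] += 1
    return dict(sorted(counts.items()))

# Invented (min_temp, max_temp) readings in whole degrees C, one per day.
days = [(-2, 12), (2, 8), (6, 14)]

by_range = range_histogram(days, 10)
by_min = histogram([low for low, _ in days], 10)
by_max = histogram([high for _, high in days], 10)

print(by_range)  # {-10: 1, 0: 3, 10: 2}
print(by_min)    # {-10: 1, 0: 2}
print(by_max)    # {0: 1, 10: 2}
```

On the first day the temperature passed through the whole 0C to 10C interval, yet neither its minimum (-2) nor its maximum (12) falls in that interval. Only the range-based count reports all three days for that bucket.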
