Feature request: Aggregation to produce buckets with a fixed number of documents in them #50120

@IceCreamYou

I would like to create an NP chart, and I can't find a way to do so with Elasticsearch currently.

An NP chart is a line chart in which the value of each point is the percentage of a fixed number of items that meet some criterion. For example, if my data looks like this:

[
	{ date: '2019-01-01', failed: true },
	{ date: '2019-01-01', failed: true },
	{ date: '2019-01-02', failed: false },
	{ date: '2019-01-04', failed: false },
	{ date: '2019-01-05', failed: false },
	{ date: '2019-01-08', failed: true },
	{ date: '2019-01-08', failed: false },
	{ date: '2019-01-08', failed: false },
]

Then I want to write an aggregation like this:

aggs: {
	np_chart: {
		fixed_size_buckets: {
			max_bucket_count: 10,
			max_bucket_documents: 3,
			sort: [{
				date: {
					order: 'asc',
				},
			}],
		},
		aggs: {
			failed_count: {
				filter: {
					term: {
						'failed': true,
					},
				},
			},
		},
	},
},

Which should return buckets like this:

[
	{
		key: ...,
		key_as_string: '2019-01-01',
		doc_count: 3,
		failed_count: { doc_count: 2 },
	},
	{
		key: ...,
		key_as_string: '2019-01-04',
		doc_count: 3,
		failed_count: { doc_count: 1 },
	},
	{
		key: ...,
		key_as_string: '2019-01-08',
		doc_count: 2,
		failed_count: { doc_count: 0 },
	},
]

Obviously, with this few documents I could load them into memory and bucket them manually, but I'd like to have up to a few thousand documents per bucket, and that's too much to process that way.
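To make the intended semantics concrete, here is a rough in-memory sketch of the bucketing described above. It is only an illustration: `fixedSizeBuckets` and its option names are hypothetical, it is not an Elasticsearch API, and it assumes that each bucket is keyed by the date of its first document and that `date` is an ISO day string.

```javascript
// Hypothetical client-side equivalent of the proposed aggregation.
// Assumption: documents have an ISO `date` string and a boolean `failed`.
function fixedSizeBuckets(docs, { maxBucketDocuments, maxBucketCount }) {
  // Sort a copy by date ascending; ISO dates compare correctly as strings.
  // Array.prototype.sort is stable, so same-day documents keep input order.
  const sorted = [...docs].sort((a, b) =>
    a.date < b.date ? -1 : a.date > b.date ? 1 : 0
  );

  const buckets = [];
  for (
    let i = 0;
    i < sorted.length && buckets.length < maxBucketCount;
    i += maxBucketDocuments
  ) {
    // Each bucket holds at most maxBucketDocuments consecutive documents.
    const chunk = sorted.slice(i, i + maxBucketDocuments);
    buckets.push({
      // Assumption: the bucket is keyed by its first document's date.
      key_as_string: chunk[0].date,
      doc_count: chunk.length,
      failed_count: { doc_count: chunk.filter((d) => d.failed).length },
    });
  }
  return buckets;
}

const docs = [
  { date: '2019-01-01', failed: true },
  { date: '2019-01-01', failed: true },
  { date: '2019-01-02', failed: false },
  { date: '2019-01-04', failed: false },
  { date: '2019-01-05', failed: false },
  { date: '2019-01-08', failed: true },
  { date: '2019-01-08', failed: false },
  { date: '2019-01-08', failed: false },
];

const result = fixedSizeBuckets(docs, {
  maxBucketDocuments: 3,
  maxBucketCount: 10,
});
console.log(result);
```

With the sample data, this produces the three buckets listed earlier. Note that when several documents share a date, which of them lands in which bucket depends on their input order, so a real implementation would probably need a tie-breaker in the sort.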

There's a variant of the NP chart where instead of dividing all the documents into groups of N, we first make a date histogram and then take a random sample of N documents from each day. The proposal above would support both cases.
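The sampling variant can be sketched in memory the same way. Again, this is only an illustration under assumptions: `sampledDailyBuckets` is a hypothetical helper, not an Elasticsearch feature, each `date` is assumed to be an ISO day string, and the sample is drawn uniformly without replacement.

```javascript
// Hypothetical sketch: group documents by day, then sample up to
// `sampleSize` documents from each day and count the failures.
// `random` is injectable so the sampling can be made deterministic.
function sampledDailyBuckets(docs, sampleSize, random = Math.random) {
  // Group documents by their (assumed) ISO day string.
  const byDay = new Map();
  for (const doc of docs) {
    if (!byDay.has(doc.date)) byDay.set(doc.date, []);
    byDay.get(doc.date).push(doc);
  }

  const days = [...byDay.keys()].sort();
  return days.map((day) => {
    const pool = [...byDay.get(day)];
    const n = Math.min(sampleSize, pool.length);
    // Partial Fisher-Yates shuffle: after the loop, the first n entries
    // are a uniform random sample without replacement.
    for (let i = 0; i < n; i++) {
      const j = i + Math.floor(random() * (pool.length - i));
      [pool[i], pool[j]] = [pool[j], pool[i]];
    }
    const sample = pool.slice(0, n);
    return {
      key_as_string: day,
      doc_count: sample.length,
      failed_count: { doc_count: sample.filter((d) => d.failed).length },
    };
  });
}

const dailyDocs = [
  { date: '2019-01-01', failed: true },
  { date: '2019-01-01', failed: true },
  { date: '2019-01-02', failed: false },
  { date: '2019-01-04', failed: false },
  { date: '2019-01-05', failed: false },
  { date: '2019-01-08', failed: true },
  { date: '2019-01-08', failed: false },
  { date: '2019-01-08', failed: false },
];

const dailyResult = sampledDailyBuckets(dailyDocs, 2);
console.log(dailyResult);
```

Days with fewer than `sampleSize` documents are returned with a smaller `doc_count`; whether such days should be dropped from the chart instead is a design choice this sketch leaves open.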

I think this is different from the other requests for variable-width histograms I've been able to find, but please correct me if this has been proposed elsewhere.
