Aggregations: bucket_sort pipeline aggregation #27152

dimitris-athanasiou · 2017-10-27T16:37:30Z

This commit adds a parent pipeline aggregation that allows
sorting the buckets of a parent multi-bucket aggregation.

The aggregation also offers [from] and [size] parameters
in order to truncate the result as desired.
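
For illustration, a minimal sketch of the semantics described above: sort the parent's buckets, then apply [from] and [size]. This is not the code from this PR. `BucketSortSketch`, its `Bucket` class and `sortAndTruncate` are hypothetical names, and the sketch assumes a null sort means "keep the existing order" and a null size means "no limit".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in types; not the Elasticsearch classes touched by this PR.
final class BucketSortSketch {

    static final class Bucket {
        final String key;
        final double value;

        Bucket(String key, double value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    // Sorts the buckets (if a comparator is given), skips the first `from`
    // and keeps at most `size` of the rest (if a size is given).
    static List<Bucket> sortAndTruncate(List<Bucket> buckets, Comparator<Bucket> sort, int from, Integer size) {
        Stream<Bucket> stream = buckets.stream();
        if (sort != null) {
            stream = stream.sorted(sort);
        }
        stream = stream.skip(from);
        if (size != null) {
            stream = stream.limit(size);
        }
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Bucket> buckets = Arrays.asList(
            new Bucket("a", 3.0), new Bucket("b", 9.0), new Bucket("c", 1.0), new Bucket("d", 5.0));
        Comparator<Bucket> byValue = Comparator.comparingDouble(b -> b.value);
        // Descending by value, skipping the top bucket and keeping the next two.
        System.out.println(sortAndTruncate(buckets, byValue.reversed(), 1, 2));
    }
}
```

Passing a null comparator together with a size gives the truncate-only behaviour mentioned later in the review.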

dimitris-athanasiou · 2017-10-27T16:38:07Z

Note: I'll come back and add the docs for this before merging the PR. I just wanted enough feedback to ensure this is going in the right direction.

colings86

I left a couple of comments about adding JavaDocs, but otherwise I think this is good so far. It just needs documentation added.

colings86 · 2017-10-30T07:56:27Z

.../org/elasticsearch/search/aggregations/pipeline/bucketsort/BucketSortPipelineAggregator.java

Could you add a JavaDoc to explain what this is for?

colings86 · 2017-10-30T07:59:06Z

...sticsearch/search/aggregations/pipeline/bucketsort/BucketSortPipelineAggregationBuilder.java

Could you add a JavaDoc explaining what this aggregation does?

dimitris-athanasiou · 2017-10-30T14:38:34Z

@colings86 I have now also added the doc page for bucket_sort. Could you take a look at that when you can?

polyfractal · 2017-11-06T19:39:44Z

...sticsearch/search/aggregations/pipeline/bucketsort/BucketSortPipelineAggregationBuilder.java

Do we need some validation (after parsing) that one of size or sort is defined? E.g. in case a user does:

{ "bucket_sort": { "sort": [] } }

Hm, not sure. If the user specifies neither, it will just be a no-op. What is the consistent thing to do here?

Hrm, not sure either. I think I'd prefer it to throw an exception, but I could see the alternative argument too. E.g. your application assembles the queries and it omits both parameters in some situations, so you'd prefer it to not throw exceptions. Don't have a strong opinion on this one.

@colings86 any thoughts? Dimitris and I chatted about this on zoom and thought we'd see what you had to say :)

I think we should throw an exception. It seems like a clear mistake if the user does this and I'm not sure it would actually be a no-op because it would output a complete copy of the target aggregation in the response as well as the original?

It would be no-op because this is sorting inline.

ah yes, sorry. I'd still like to throw an exception though as this is almost certainly a mistake by the user so they should be notified so they can correct it
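
A minimal sketch of the validation discussed in this thread, assuming the check runs after parsing and rejects a configuration that cannot change the result. The class name, the method signature and the message text are hypothetical, and treating a non-zero `from` as "doing something" is an assumption of this sketch rather than something the thread settles.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical validation helper; not the builder code from this PR.
final class BucketSortValidationSketch {

    // Throws if the configuration would leave the parent's buckets untouched.
    // `sorts` holds the sort paths (possibly empty), `size` is null when unset.
    static void validate(List<String> sorts, int from, Integer size) {
        boolean noSort = sorts == null || sorts.isEmpty();
        if (noSort && size == null && from == 0) {
            // Message text is illustrative only.
            throw new IllegalArgumentException(
                "[bucket_sort] would be a no-op; set at least one of [sort, size, from]");
        }
    }

    public static void main(String[] args) {
        validate(Collections.singletonList("total_sales"), 0, null); // accepted
        validate(Collections.<String>emptyList(), 0, 5);             // accepted, truncate only
        try {
            validate(Collections.<String>emptyList(), 0, null);      // rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```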

polyfractal · 2017-11-06T19:40:20Z

.../org/elasticsearch/search/aggregations/pipeline/bucketsort/BucketSortPipelineAggregator.java

Pipelines are only called on the final reduce, so this check isn't needed. I was a bit confused about it too (the javadocs for isFinalReduce() are sorta misleading) and had to bug Colin :)

You can see that the incremental reduction passes null for pipelines in SearchPhaseController: https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/action/search/SearchPhaseController.java#L518

So I don't think this check is needed.

Ah, fair enough! Will remove.

polyfractal · 2017-11-06T19:40:38Z

.../org/elasticsearch/search/aggregations/pipeline/bucketsort/BucketSortPipelineAggregator.java

If you want, and totally up to personal preference, I think truncate's functionality here can be done with a sublist?

new ArrayList<>(buckets.subList(offset, offset + currentSize)) or similar?

But that's just preference, doesn't bother me either way :)

Definitely agree with your preference :-)
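
A sketch of the sublist approach suggested above, with the bounds clamped so that an offset or size reaching past the end of the list does not throw. `TruncateSketch` and `truncate` are hypothetical names, and the clamping is an assumption about the desired behaviour, not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating truncation via List.subList.
final class TruncateSketch {

    // Returns a copy of at most `size` elements starting at `offset`.
    static <T> List<T> truncate(List<T> buckets, int offset, int size) {
        int start = Math.min(Math.max(offset, 0), buckets.size());
        // Use long arithmetic so a very large size cannot overflow.
        int end = (int) Math.min((long) buckets.size(), (long) start + Math.max(size, 0));
        // subList is a view backed by the original list, so copy it.
        return new ArrayList<>(buckets.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> buckets = Arrays.asList("b0", "b1", "b2", "b3", "b4");
        System.out.println(truncate(buckets, 1, 3));  // [b1, b2, b3]
        System.out.println(truncate(buckets, 4, 10)); // [b4]
    }
}
```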

polyfractal · 2017-11-06T19:41:22Z

.../org/elasticsearch/search/aggregations/pipeline/bucketsort/BucketSortPipelineAggregator.java

I may be reading this wrong, but if a bucket is skipped I think the final list will be incorrectly sized? E.g.

resultSize == size == 5

bucketCount == 10

Bucket 2 is skipped

The conditional will prevent Bucket 2 from being added to the list, but the loop will decrement regardless and it'll only add four values to newBuckets, rather than grabbing Bucket 6 for a total of five results.

Would it be easier to just resolve the bucket value up front and if it's skipped, avoid adding it to the priority queue entirely? Also see the note below in ComparableBucket regarding resolving the values
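
A sketch of the "resolve up front" idea, using the scenario above (size 5, ten buckets, Bucket 2 skipped). Skipped buckets never enter the priority queue, so the loop that drains the queue always yields `size` results when enough non-skipped buckets exist. All names are hypothetical, and a null value is used here to model a bucket that should be skipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; not the aggregator code under review.
final class SkipAwareSortSketch {

    static final class Bucket {
        final String key;
        final Double value; // null models a bucket whose value is missing (skipped)

        Bucket(String key, Double value) {
            this.key = key;
            this.value = value;
        }
    }

    // Returns the `size` buckets with the smallest values, ignoring skipped buckets.
    static List<Bucket> topBuckets(List<Bucket> buckets, int size) {
        Comparator<Bucket> byValue = Comparator.comparingDouble(b -> b.value);
        PriorityQueue<Bucket> queue = new PriorityQueue<>(byValue);
        for (Bucket bucket : buckets) {
            if (bucket.value != null) { // filter before enqueuing, not while draining
                queue.add(bucket);
            }
        }
        List<Bucket> result = new ArrayList<>();
        while (result.size() < size && !queue.isEmpty()) {
            result.add(queue.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            buckets.add(new Bucket("Bucket " + i, i == 2 ? null : (double) i));
        }
        for (Bucket b : topBuckets(buckets, 5)) {
            System.out.println(b.key); // Bucket 1, 3, 4, 5, 6
        }
    }
}
```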

polyfractal · 2017-11-06T19:42:22Z

.../org/elasticsearch/search/aggregations/pipeline/bucketsort/BucketSortPipelineAggregator.java

I wonder if we should just preemptively cache the bucket's value (and set skip) in the ctor for the ComparableBucket? resolveBucketValue() isn't terribly expensive or anything, but since this happens during comparison, every push/pop onto the priority queue's heap will invoke a bunch of comparisons and re-resolve these values.

The downside is that the cached values may not be needed if we're only sorting on keys. Dunno, thoughts?

Might be affected by how/if the above skipping thing is actually an issue or not.
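
A sketch of the caching idea: the sort value is resolved once in the constructor, so heap comparisons only read a field. The `ComparableBucket` here is a hypothetical simplification, not the class in this PR, and the resolver is a stand-in for resolveBucketValue() that counts how often it is called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of caching the resolved value at construction time.
final class CachedComparableBucketSketch {

    static final class ComparableBucket implements Comparable<ComparableBucket> {
        final String key;
        final double sortValue; // resolved once, reused by every comparison

        ComparableBucket(String key, ToDoubleFunction<String> resolver) {
            this.key = key;
            this.sortValue = resolver.applyAsDouble(key);
        }

        @Override
        public int compareTo(ComparableBucket other) {
            return Double.compare(sortValue, other.sortValue);
        }
    }

    // Pushes and pops every key through a priority queue and reports
    // how many times the (stand-in) resolver was invoked.
    static int countResolutions(List<String> keys) {
        AtomicInteger calls = new AtomicInteger();
        ToDoubleFunction<String> resolver = key -> {
            calls.incrementAndGet();
            return key.length(); // stand-in for resolving a bucket's value
        };
        PriorityQueue<ComparableBucket> queue = new PriorityQueue<>();
        for (String key : keys) {
            queue.add(new ComparableBucket(key, resolver));
        }
        while (!queue.isEmpty()) {
            queue.poll();
        }
        return calls.get();
    }

    public static void main(String[] args) {
        // One resolution per bucket, regardless of how many comparisons the heap performs.
        System.out.println(countResolutions(Arrays.asList("a", "bbb", "cc", "dddd")));
    }
}
```

As noted in the comment above, the trade-off is that the cached value is computed even when the sort only looks at keys.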

polyfractal · 2017-11-06T19:43:27Z

core/src/test/java/org/elasticsearch/search/aggregations/pipeline/bucketsort/BucketSortIT.java

Re: size + skipping issue from above, might be good to have a test for that scenario where an explicit size is set (which is less than the available buckets but spans across a skip).

Skips make everything more complicated :(

polyfractal · 2017-11-06T19:44:47Z

docs/reference/aggregations/pipeline/bucket-sort-aggregation.asciidoc

"after all other sibling aggs" <-- potentially confusing since there are sibling pipeline aggs.

Maybe something like "... after all other bucket and metric aggregations"? Or "... after all non-pipeline aggregations"?

polyfractal · 2017-11-06T19:46:11Z

docs/reference/aggregations/pipeline/bucket-sort-aggregation.asciidoc

Perhaps a section of the docs (or just a note, somewhere) which mentions you can use bucket_sort without any sorts, as a way to limit the number of buckets? It's kinda implied since the docs say sort is optional, but maybe someone will overlook that? It's such a useful part of the feature I'd hate people to miss it :)

Added an example

polyfractal · 2017-11-06T19:48:55Z

Left some comments, mostly little things. Only potentially thorny issue is how skips are handled / ComparableBucket resolving. But I may totally have misread that section of the code, so feel free to correct me :)

Otherwise I think this looks really good, and all the tests are lovely! ❤️ :)

This commit adds a parent pipeline aggregation that allows sorting the buckets of a parent multi-bucket aggregation. The aggregation also offers [from] and [size] parameters in order to truncate the result as desired. Closes elastic#14928

dimitris-athanasiou · 2017-11-07T20:02:03Z

@polyfractal I have pushed a commit that addresses all feedback. Let me know what you think when you have time.

polyfractal · 2017-11-09T16:25:58Z

LGTM! :)

* master: (22 commits)
  Update Tika version to 1.15
  Aggregations: bucket_sort pipeline aggregation (#27152)
  Introduce templating support to timezone/locale in DateProcessor (#27089)
  Increase logging on qa:mixed-cluster tests
  Update to AWS SDK 1.11.223 (#27278)
  Improve error message for parse failures of completion fields (#27297)
  Ensure external refreshes will also refresh internal searcher to minimize segment creation (#27253)
  Remove optimisations to reuse objects when applying a new `ClusterState` (#27317)
  Decouple `ChannelFactory` from Tcp classes (#27286)
  Fix find remote when building BWC
  Remove colons from task and configuration names
  Add unreleased 5.6.5 version number
  testCreateSplitIndexToN: do not set `routing_partition_size` to >= `number_of_routing_shards`
  Snapshot/Restore: better handle incorrect chunk_size settings in FS repo (#26844)
  Add limits for ngram and shingle settings (#27211) (#27318)
  Correct comment in index shard test
  Roll translog generation on primary promotion
  ObjectParser: Replace IllegalStateException with ParsingException (#27302)
  scripted_metric _agg parameter disappears if params are provided (#27159)
  Update discovery-ec2.asciidoc
  ...

m-luqman · 2020-03-23T15:34:51Z

...src/main/java/org/elasticsearch/search/aggregations/pipeline/PipelineAggregatorBuilders.java

        return new BucketSelectorPipelineAggregationBuilder(name, script, bucketsPaths);
    }

+    public static BucketSortPipelineAggregationBuilder bucketSort(String name, List<FieldSortBuilder> sorts) {


Why is a FieldSortBuilder specified, instead of a more flexible sort builder?

Because it can only refer to a bucket path in the parent aggregation.

dimitris-athanasiou added :Analytics/Aggregations Aggregations >feature review v6.1.0 v7.0.0 labels Oct 27, 2017

dimitris-athanasiou requested a review from colings86 October 27, 2017 16:37

colings86 reviewed Oct 30, 2017

dimitris-athanasiou force-pushed the aggregations-bucket-sort-pipeline branch from 9fffe25 to d31fd0e Compare October 30, 2017 15:26

dimitris-athanasiou added the >docs General docs changes label Oct 30, 2017

polyfractal reviewed Nov 6, 2017

dimitris-athanasiou added 4 commits November 7, 2017 19:48

Aggregations: bucket_sort pipeline aggregation

f2d7b74

This commit adds a parent pipeline aggregation that allows sorting the buckets of a parent multi-bucket aggregation. The aggregation also offers [from] and [size] parameters in order to truncate the result as desired. Closes elastic#14928

Add missing javadoc

ba4ff0e

Add docs

0382e6a

Address review feedback

65ba372

dimitris-athanasiou force-pushed the aggregations-bucket-sort-pipeline branch from d31fd0e to 65ba372 Compare November 7, 2017 20:00

dimitris-athanasiou added 3 commits November 8, 2017 10:13

Remove redundant final qualifier from method

da24a43

Compare buckets via their keys

de24bf9

Make clearer the truncate only example

f022e2b

polyfractal approved these changes Nov 9, 2017

clintongormley removed the >docs General docs changes label Nov 9, 2017

dimitris-athanasiou merged commit 66bef26 into elastic:master Nov 9, 2017

dimitris-athanasiou added backport pending and removed v6.1.0 labels Nov 9, 2017

dimitris-athanasiou deleted the aggregations-bucket-sort-pipeline branch November 9, 2017 18:00

dimitris-athanasiou added v6.1.0 and removed backport pending labels Nov 10, 2017

colings86 mentioned this pull request Mar 19, 2018

Aggregation to select top N buckets in a parent multi-bucket aggregation #21135

Closed

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

m-luqman reviewed Mar 23, 2020

5 participants