Use List instead of priority queue for stable sorting in bucket sort aggregator #36748
Pinging @elastic/es-analytics-geo
Just a style note, we prefer the more verbose negation (foo != true or foo == false) over the short form (!foo), because the short form is easy to misread or overlook. :)
Got it. Thanks!
This change introduces a bug: when at least one sort field is specified and the size parameter is set, the size parameter is ignored. Previously, the priority queue enforced the size parameter of the bucket_sort aggregation; now nothing applies the size restriction. Unfortunately, BucketSortIT has no test scenario that covers this. Before discussing the code further, it would be great to add tests that capture the above. We should also definitely have a new test that checks that sorting is stable.
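A minimal sketch of the fix this comment asks for, in plain Java: once the buckets live in a List instead of a bounded priority queue, the size limit has to be applied explicitly after sorting. The Bucket class, the sortAndLimit helper, and treating a null size as "no limit" are hypothetical simplifications, not the actual BucketSortPipelineAggregator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class BucketSortSizeSketch {

    // Hypothetical stand-in for a bucket: a key plus one sortable metric value.
    static final class Bucket {
        final String key;
        final double value;

        Bucket(String key, double value) {
            this.key = key;
            this.value = value;
        }

        double getValue() {
            return value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    // Sorts a copy of the buckets and then applies the size limit explicitly.
    // A bounded priority queue enforced the limit implicitly; a plain List does not,
    // so truncation has to happen as a separate step after sorting.
    static List<Bucket> sortAndLimit(List<Bucket> buckets, Comparator<Bucket> order, Integer size) {
        List<Bucket> sorted = new ArrayList<>(buckets);
        Collections.sort(sorted, order); // Collections.sort is guaranteed to be stable
        if (size != null && size < sorted.size()) {
            // Copy the view so the result is not backed by the full sorted list.
            return new ArrayList<>(sorted.subList(0, size));
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = Arrays.asList(
                new Bucket("a", 3.0), new Bucket("b", 1.0), new Bucket("c", 2.0));
        Comparator<Bucket> descending = Comparator.comparingDouble(Bucket::getValue).reversed();
        System.out.println(sortAndLimit(buckets, descending, 2)); // prints [a=3.0, c=2.0]
    }
}
```

Sorting the whole list and then truncating costs O(n log n) rather than the O(n log size) of a bounded heap, which is the price paid here for a stable order.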
dimitris-athanasiou left a comment
Thanks for adding the tests! This is looking pretty good now! I left a couple more comments regarding simplifications.
It might be nice to name the two bucket_sort aggs differently for readability.
Nice idea!
Having to sort in reverse order seems a bit awkward here. What if we simplify? I think we can now change ComparableBucket.compareTo to ignore the sort order and behave the way a compareTo is meant to behave. Then, on this line, we can choose whether to reverse the order depending on the sort order. I think this will make it much cleaner. What do you think?
Agreed. That would make it easier to see what's happening.
On second thought, we can't ignore the sort order in ComparableBucket.compareTo since we may be sorting buckets on multiple fields (each with a different order). Still, we can make the method return what we would normally expect from a compareTo. Then we won't have to sort the list in reverse and the code will make more sense overall. I will push the new version soon.
Also, the current compareTo (and the tests) consider null to be greater than any other value. Out of curiosity, is this some standard assumption, at least when it comes to Elasticsearch?
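To illustrate the approach described above, here is a hypothetical, simplified ComparableBucket (not the real Elasticsearch class) whose compareTo follows the usual contract: it returns a negative number when this bucket should come first in the output, applies each field's own order, and uses later fields only to break ties. Nulls are deliberately left out; the SortOrder enum and the double[] representation of the sort values are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ComparableBucketSketch {

    enum SortOrder { ASC, DESC }

    // Hypothetical, simplified bucket: one numeric value per sort field, plus the
    // order requested for each field.
    static final class ComparableBucket implements Comparable<ComparableBucket> {
        final String key;
        final double[] values;
        final SortOrder[] orders;

        ComparableBucket(String key, double[] values, SortOrder[] orders) {
            this.key = key;
            this.values = values;
            this.orders = orders;
        }

        // Negative result means "this bucket comes first in the requested output",
        // so Collections.sort produces the final order directly, with no reversal.
        // Each field is compared with its own order; later fields only break ties.
        @Override
        public int compareTo(ComparableBucket other) {
            for (int i = 0; i < values.length; i++) {
                int cmp = orders[i] == SortOrder.ASC
                        ? Double.compare(values[i], other.values[i])
                        : Double.compare(other.values[i], values[i]);
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0; // equal on every field; a stable sort keeps their input order
        }

        @Override
        public String toString() {
            return key;
        }
    }

    static List<ComparableBucket> sort(List<ComparableBucket> buckets) {
        List<ComparableBucket> sorted = new ArrayList<>(buckets);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        SortOrder[] orders = {SortOrder.DESC, SortOrder.ASC};
        List<ComparableBucket> buckets = Arrays.asList(
                new ComparableBucket("a", new double[] {1.0, 5.0}, orders),
                new ComparableBucket("b", new double[] {2.0, 9.0}, orders),
                new ComparableBucket("c", new double[] {2.0, 3.0}, orders));
        // First field descending puts b and c ahead of a; the second field,
        // ascending, then puts c ahead of b.
        System.out.println(sort(buckets)); // prints [c, b, a]
    }
}
```

Because the per-field order is folded into the comparison itself, the same compareTo works for any mix of ascending and descending fields, which is why ignoring the sort order entirely is not an option.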
Thanks for this and sorry for the delayed response. Holidays kicked in :-) Happy new year!
Elasticsearch handles null in ordinary search/sort by putting those values at the end (or the beginning when the order is descending). It also lets the user change that behaviour via the missing parameter. Having said that, I would argue it is overkill to support this in the context of the bucket_sort aggregation.
Looking more closely at the code, it seems ComparableBucket can never actually receive null values. There are 3 scenarios:
- the sort field is _key; keys will be non-null, I believe
- gap_policy is skip; the bucket won't exist at all
- gap_policy is insert_zeros; the value of the bucket will be 0.0 and will be sorted accordingly.
So it is most probably dead code.
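For reference, the null semantics discussed in this thread (null treated as greater than any value, so it sorts last in ascending order and first in descending order) can be expressed with the JDK's Comparator.nullsLast and Comparator.reversed. This is only an illustration of those semantics on plain Double values, assuming Java 8 or later; it is not taken from Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullOrderingSketch {

    // Treats null as greater than any non-null value, so nulls sort last
    // in ascending order.
    static final Comparator<Double> ASCENDING_NULLS_GREATEST =
            Comparator.nullsLast(Comparator.<Double>naturalOrder());

    // Reversing the whole comparator flips the null placement too, so nulls
    // come first in descending order.
    static final Comparator<Double> DESCENDING = ASCENDING_NULLS_GREATEST.reversed();

    static List<Double> sorted(List<Double> values, Comparator<Double> order) {
        List<Double> copy = new ArrayList<>(values);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(2.0, null, 1.0);
        System.out.println(sorted(values, ASCENDING_NULLS_GREATEST)); // prints [1.0, 2.0, null]
        System.out.println(sorted(values, DESCENDING));               // prints [null, 2.0, 1.0]
    }
}
```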
@elasticmachine test this please
@chatzikalymnios Could you please rebase your PR on the latest master? There were upstream changes that will cause CI to fail unless it is rebased.
Force-pushed from df62d1c to e7733ce
@elasticmachine test this please
run default distro tests
run gradle build tests 1
run default distro tests
@chatzikalymnios I'm afraid the latest rebase landed on a build that has a failing test, which seems unrelated to this change. May I ask you to rebase one more time, please? :-)
The first test ensures the aggregator limits the number of returned buckets according to the size parameter. The second test ensures that the relative order of equal buckets is preserved.
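The tests described above are integration tests in BucketSortIT. As a rough, standalone illustration of the stability property the second test checks, the following plain-Java sketch sorts hypothetical buckets by a doc count in descending order and verifies that buckets with equal counts keep their input order. It relies on Collections.sort being guaranteed stable; the Bucket class and the sortedKeys helper are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class StableSortCheck {

    // Hypothetical bucket with a key and a document count to sort on.
    static final class Bucket {
        final String key;
        final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }

        long getDocCount() {
            return docCount;
        }
    }

    // Sorts by docCount descending and returns the keys in sorted order.
    static List<String> sortedKeys(List<Bucket> buckets) {
        List<Bucket> copy = new ArrayList<>(buckets);
        Collections.sort(copy, Comparator.comparingLong(Bucket::getDocCount).reversed());
        List<String> keys = new ArrayList<>();
        for (Bucket bucket : copy) {
            keys.add(bucket.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Bucket> buckets = Arrays.asList(
                new Bucket("x", 1), new Bucket("y", 5), new Bucket("z", 1), new Bucket("w", 5));
        List<String> keys = sortedKeys(buckets);
        // Buckets with equal counts must keep their input order: y before w, x before z.
        if (keys.equals(Arrays.asList("y", "w", "x", "z")) == false) {
            throw new AssertionError("sort was not stable: " + keys);
        }
        System.out.println(keys); // prints [y, w, x, z]
    }
}
```

The check uses `== false` rather than `!`, following the style note earlier in this review.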
…ming in BucketSortIT
The compareTo method in ComparableBucket is changed to behave the way a compareTo method is supposed to. This allows a list of ComparableBucket objects to be sorted directly in the desired order.
@dimitris-athanasiou Sure, no worries :)
Force-pushed from e7733ce to 49a5a4a
@elasticmachine test this please
dimitris-athanasiou left a comment
LGTM. Thank you so much for the contribution, @chatzikalymnios!
@polyfractal Which versions do you think we should merge this into? Should we aim
Hm, I think we probably missed the boat on 6.6, especially since this is more of an enhancement than a bugfix. Let's put it in 6.7+. Thanks @chatzikalymnios @dimitris-athanasiou!
Update BucketSortPipelineAggregator to use a List and Collections.sort() for sorting instead of a priority queue. This preserves the order for equal values. Closes #36322.