Skip to content

Conversation

@chatzikalymnios
Copy link
Contributor

Update BucketSortPipelineAggregator to use a List and Collections.sort() for sorting instead of a priority queue. This preserves the order for equal values. Closes #36322.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a style note, we prefer the more verbose negation (foo != true or foo == false) over the short form (!foo), because the short form is easy to misread or overlook. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces a bug where when there is at least one sort field specified and the size parameter is set, the latter is ignored. Previously, the priority queue was ensuring we apply the size parameter of the bucket_sort aggregation. Now there is nothing applying the size restriction. Unfortunately, BucketSortIT has no proper test scenario that covers this. Before discussing the code further, it would be great if we add tests that capture the above. Also, we should definitely have a new test that checks sorting is stable.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests! This is looking pretty good now! I left a couple more comments regarding simplifications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to name the two bucket_sort aggs differently for readability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact we have to sort on reverse order seems a bit awkward here. What if we simplify? I think we can now change ComparableBucket.compareTo to ignore the sort order and behave like a compareTo promises to behave. Then in this line, we can choose to reverse order or not depending on the sort order. I think this will make it much cleaner. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. That would make it easier to see what's happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, we can't ignore the sort order in ComparableBucket.compareTo since we may be sorting buckets on multiple fields (each with a different order). Still, we can make the method return what we would normally expect from a compareTo. Then we won't have to sort the list in reverse and the code will make more sense overall. I will push the new version soon.

Also, the current compareTo (and the tests) consider null to be greater than any other value. Out of curiosity, is this some standard assumption, at least when it comes to Elasticsearch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this and sorry for the delayed response. Holidays kicked in :-) Happy new year!

Elasticsearch is handling null in ordinary search/sort by putting those values at the end (or the beginning when the order is descending). It also lets the user change that behaviour via the missing parameter. Having said that, I would argue it is overkill to support this in the context of the bucket_sort aggregation.

Looking more into the code, it actually looks like ComparableBucket could never really get any null values. There are 3 scenarios:

  • the sort field is _key; keys will be non-null I believe
  • gap_policy is skip; the bucket won't exist at all
  • gap_policy is insert_zeros; the value of the bucket will be 0.0 and will be sorted accordingly.

So, it most probably is dead code.

@dimitris-athanasiou
Copy link
Contributor

@elasticmachine test this please

@dimitris-athanasiou
Copy link
Contributor

@chatzikalymnios Could you please rebase your PR on latest master? There were upstream changes that will cause CI to fail unless rebased.

@dimitris-athanasiou
Copy link
Contributor

@elasticmachine test this please

@dimitris-athanasiou
Copy link
Contributor

run default distro tests

@dimitris-athanasiou
Copy link
Contributor

run gradle build tests 1

@dimitris-athanasiou
Copy link
Contributor

run default distro tests

@dimitris-athanasiou
Copy link
Contributor

@chatzikalymnios I'm afraid the latest rebase landed on another build which has a failing test that seems not related to this change. May I ask you to rebase one more time please? :-)

The first test ensures the aggregator limits the number of returned buckets according to the size parameter.
The second test ensures that the relative order of equal buckets is preserved.
…ming in BucketSortIT

The compareTo method in ComparableBucket is changed to behave more like a compareTo is supposed to.
This allows a list of ComparableBucket objects to be directly sorted in the desired order.
@chatzikalymnios
Copy link
Contributor Author

@dimitris-athanasiou Sure, no worries :)

@dimitris-athanasiou
Copy link
Contributor

@elasticmachine test this please

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you so much for the contribution @chatzikalymnios!

@dimitris-athanasiou
Copy link
Contributor

@polyfractal Which versions do you think we should merge this in? Should we aim 6.6.0+?

@polyfractal
Copy link
Contributor

Hm, I think we probably missed the boat on 6.6, especially since this is more of an enhancement than bugfix. Let's put it in 6.7+

Thanks @chatzikalymnios @dimitris-athanasiou!

@dimitris-athanasiou dimitris-athanasiou merged commit 85a603e into elastic:master Jan 9, 2019
dimitris-athanasiou pushed a commit that referenced this pull request Jan 9, 2019
…aggregator (#36748)

Update BucketSortPipelineAggregator to use a List and Collections.sort() for sorting instead of a priority queue. This preserves the order for equal values. Closes #36322.
@chatzikalymnios chatzikalymnios deleted the stable-bucket-sort-agg branch January 9, 2019 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants