[SPARK-17616][SQL] Support a single distinct aggregate combined with a non-partial aggregate #15187
…gregate combined with a non-partial aggregate.
cc @JoshRosen @rxin
Test build #65737 has finished for PR 15187 at commit
checkRewrite(RewriteDistinctAggregates(input))
}

test("multiple distinct groups without non-distinct aggregates") {
Do you mean non-partial aggregates here?
I actually mean that the test only contains distinct aggregates.
  .analyze
val rewrite = RewriteDistinctAggregates(input)
comparePlans(input, rewrite)
}
Could you also add single distinct group with aggregates that have partial
done
val input = testRelation
  .groupBy('a)(countDistinct('b, 'c), countDistinct('d), sum('e))
  .analyze
checkRewrite(RewriteDistinctAggregates(input))
Could you also add a test with partials, and one without partials, here? (As part of the same test(""))
done
srinathshankar left a comment
LGTM
Test build #65784 has finished for PR 15187 at commit
Merging to master/2.0. Thanks for the review.
…a non-partial aggregate
We currently cannot execute an aggregate that contains a single distinct aggregate function and one or more non-partially plannable aggregate functions, for example:
```sql
select grp,
collect_list(col1),
count(distinct col2)
from tbl_a
group by 1
```
This is a regression from Spark 1.6. It is caused by the single distinct aggregation code path assuming that all aggregates can be planned in two phases (that is, that they are partially aggregatable). This PR works around the issue by triggering the `RewriteDistinctAggregates` rule in such cases (this is similar to the approach taken in 1.6).
Created `RewriteDistinctAggregatesSuite`, which checks that aggregates with distinct aggregate functions get rewritten into two `Aggregates` and an `Expand`. Added a regression test to `DataFrameAggregateSuite`.
Author: Herman van Hovell <[email protected]>
Closes #15187 from hvanhovell/SPARK-17616.
(cherry picked from commit 0d63487)
Signed-off-by: Herman van Hovell <[email protected]>
What changes were proposed in this pull request?
We currently cannot execute an aggregate that contains a single distinct aggregate function and one or more non-partially plannable aggregate functions, for example a query that selects `collect_list(col1)` and `count(distinct col2)` from `tbl_a` grouped by `grp`.
This is a regression from Spark 1.6. It is caused by the single distinct aggregation code path assuming that all aggregates can be planned in two phases (that is, that they are partially aggregatable). This PR works around the issue by triggering the `RewriteDistinctAggregates` rule in such cases (this is similar to the approach taken in 1.6).
How was this patch tested?
Created `RewriteDistinctAggregatesSuite`, which checks that aggregates with distinct aggregate functions get rewritten into two `Aggregates` and an `Expand`. Added a regression test to `DataFrameAggregateSuite`.
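For readers unfamiliar with the "two `Aggregates` and an `Expand`" shape mentioned above, here is a minimal sketch of the idea using plain Scala collections rather than Spark. It is not the code in `RewriteDistinctAggregates`; the names `Row`, `Expanded`, `Partial`, `gid`, and the three helper functions are hypothetical and exist only for this illustration. It assumes the example query (`collect_list(col1)` plus `count(distinct col2)` grouped by `grp`) and non-null inputs.

```scala
object DistinctRewriteSketch {
  // Hypothetical input row for the example query: (grp, col1, col2).
  final case class Row(grp: String, col1: Int, col2: String)

  // Row shape after the expand step. gid 0 carries input for collect_list,
  // gid 1 carries input for count(distinct col2). Unused columns are None.
  final case class Expanded(grp: String, col1: Option[Int], col2: Option[String], gid: Int)

  // Output of the first aggregate: one row per (grp, col2, gid).
  final case class Partial(grp: String, col2: Option[String], gid: Int, collected: Option[Seq[Int]])

  // Expand: emit each input row twice, once per aggregate group.
  def expand(rows: Seq[Row]): Seq[Expanded] =
    rows.flatMap { r =>
      Seq(
        Expanded(r.grp, Some(r.col1), None, 0),
        Expanded(r.grp, None, Some(r.col2), 1))
    }

  // First aggregate: grouping by (grp, col2, gid) de-duplicates col2 within gid 1,
  // while the gid 0 group evaluates collect_list in one step (no partial phase).
  def firstAggregate(expanded: Seq[Expanded]): Seq[Partial] =
    expanded.groupBy(e => (e.grp, e.col2, e.gid)).toSeq.map {
      case ((grp, col2, gid), es) =>
        val collected = if (gid == 0) Some(es.flatMap(_.col1)) else None
        Partial(grp, col2, gid, collected)
    }

  // Second aggregate: group by grp only. Take the finished list from the gid 0 row
  // and count the already de-duplicated gid 1 rows.
  def secondAggregate(partials: Seq[Partial]): Map[String, (Seq[Int], Long)] =
    partials.groupBy(_.grp).map { case (grp, ps) =>
      val list = ps.collectFirst { case Partial(_, _, 0, Some(xs)) => xs }.getOrElse(Seq.empty)
      val distinctCount = ps.count(p => p.gid == 1 && p.col2.isDefined).toLong
      (grp, (list, distinctCount))
    }

  def run(rows: Seq[Row]): Map[String, (Seq[Int], Long)] =
    secondAggregate(firstAggregate(expand(rows)))

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row("a", 1, "x"), Row("a", 2, "x"), Row("a", 3, "y"), Row("b", 4, "z"))
    run(rows).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The point of the sketch is that the non-partial aggregate is never split into partial and final phases: it is computed whole inside its own `gid` group, and the second aggregate only picks up the finished value. The element order of the collected list here follows the input sequence, which should not be read as a guarantee about `collect_list` in Spark.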