[SPARK-10988] [SQL] Reduce duplication in Aggregate2's expression rewriting logic #9015
/cc @yhuai for review.
Test build #43358 has finished for PR 9015 at commit
Jenkins, retest this please.
Test build #43363 has finished for PR 9015 at commit
Test build #43364 has finished for PR 9015 at commit
Will we uncomment it, or will we use NoOp?
LGTM. Merging to master.
In aggregate/utils.scala, there is a substantial amount of duplication in the expression-rewriting logic. As a prerequisite to supporting imperative aggregate functions in TungstenAggregate, this patch refactors this file so that the same expression-rewriting logic is used for both SortAggregate and TungstenAggregate.

In order to allow both operators to use the same rewriting logic, TungstenAggregationIterator.generateResultProjection() has been updated so that it first evaluates all declarative aggregate functions' evaluateExpressions and writes the results into a temporary buffer, and then uses this temporary buffer and the grouping expressions to evaluate the final resultExpressions. This matches the logic in SortAggregateIterator, where this two-pass approach is necessary in order to support imperative aggregates. If this change turns out to cause performance regressions, then we can look into re-implementing the single-pass evaluation in a cleaner way as part of a followup patch.

Since the rewriting logic is now shared across both operators, this patch also extracts that logic and places it in SparkStrategies. This makes the rewriting logic a bit easier to follow, I think.
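
For readers unfamiliar with the two-pass result projection described above, here is a rough, self-contained sketch of the idea. It is a toy model and not Spark's code: Row, DeclarativeAgg, evaluateAggregates and this generateResultProjection signature are hypothetical names invented for illustration, and rows are modelled as plain vectors of doubles.

```scala
// Toy model of the two-pass evaluation; none of these types are Spark's.
object TwoPassResultProjection {
  // Hypothetical stand-in for a row: a flat vector of numeric columns.
  type Row = Vector[Double]

  // Hypothetical declarative aggregate: its final value is computed from
  // the aggregation buffer (the role of evaluateExpression in the PR text).
  final case class DeclarativeAgg(name: String, evaluate: Row => Double)

  // Pass 1: evaluate every aggregate against the aggregation buffer and
  // write the results into a temporary buffer, one slot per aggregate.
  def evaluateAggregates(aggs: Seq[DeclarativeAgg], aggBuffer: Row): Row =
    aggs.map(_.evaluate(aggBuffer)).toVector

  // Pass 2: the result expressions only see the grouping values and the
  // temporary buffer from pass 1, never the raw aggregation buffer.
  def generateResultProjection(
      aggs: Seq[DeclarativeAgg],
      resultExprs: Seq[(Row, Row) => Double]): (Row, Row) => Row =
    (groupingKey, aggBuffer) => {
      val aggResults = evaluateAggregates(aggs, aggBuffer)
      resultExprs.map(expr => expr(groupingKey, aggResults)).toVector
    }

  def main(args: Array[String]): Unit = {
    // Assumed aggregation buffer layout for this example: [sum, count].
    val avg = DeclarativeAgg("avg", buf => buf(0) / buf(1))
    val sum = DeclarativeAgg("sum", buf => buf(0))

    // Roughly: SELECT key, avg(x) + 1, sum(x) ... GROUP BY key
    val project = generateResultProjection(
      Seq(avg, sum),
      Seq(
        (key: Row, _: Row) => key(0),
        (_: Row, res: Row) => res(0) + 1.0,
        (_: Row, res: Row) => res(1)))

    println(project(Vector(7.0), Vector(10.0, 4.0))) // Vector(7.0, 3.5, 10.0)
  }
}
```

Because pass 2 depends only on the temporary buffer, it does not need to know how each aggregate produced its value, which is presumably why the same shape can also accommodate imperative aggregates. The cost is the extra intermediate write that the description flags as a possible performance regression.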