[SPARK-10988] [SQL] Reduce duplication in Aggregate2's expression rewriting logic #9015

JoshRosen · 2015-10-07T23:48:07Z

In aggregate/utils.scala, there is a substantial amount of duplication in the expression-rewriting logic. As a prerequisite to supporting imperative aggregate functions in TungstenAggregate, this patch refactors this file so that the same expression-rewriting logic is used for both SortAggregate and TungstenAggregate.

To let both operators share the same rewriting logic, TungstenAggregationIterator.generateResultProjection() now works in two passes: it first evaluates all declarative aggregate functions' evaluateExpressions and writes the results into a temporary buffer, then uses this temporary buffer and the grouping expressions to evaluate the final resultExpressions. This matches the logic in SortAggregateIterator, where the two-pass approach is necessary in order to support imperative aggregates. If this change turns out to cause performance regressions, we can look into re-implementing the single-pass evaluation in a cleaner way in a followup patch.
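For readers who want a concrete picture of the two-pass evaluation described above, here is a minimal, self-contained sketch. It is a toy model, not the code in this patch: `Row`, `AggEval`, `ResultExpr`, and the `[sum, count]` buffer layout are hypothetical stand-ins invented for illustration, and only the name `generateResultProjection` is borrowed from the description.

```scala
// Toy model only: these types are hypothetical and are not Spark's classes.
object TwoPassProjectionSketch {
  type Row = Vector[Any]

  // Hypothetical stand-in for a declarative aggregate's evaluateExpression:
  // it turns the aggregation buffer into one final value.
  final case class AggEval(name: String, eval: Row => Any)

  // Hypothetical stand-in for a result expression: it sees the grouping
  // values and the already-evaluated aggregate results, never the raw buffer.
  final case class ResultExpr(eval: (Row, Row) => Any)

  def generateResultProjection(
      aggEvals: Seq[AggEval],
      resultExprs: Seq[ResultExpr]): (Row, Row) => Row = {
    (groupingKey, aggBuffer) =>
      // Pass 1: evaluate every aggregate function once into a temporary buffer.
      val aggResults: Row = aggEvals.map(_.eval(aggBuffer)).toVector
      // Pass 2: evaluate the result expressions against the grouping key
      // and the temporary buffer.
      resultExprs.map(_.eval(groupingKey, aggResults)).toVector
  }

  def demo(): Row = {
    // Buffer layout assumed for this example: [sum: Double, count: Long].
    val avg = AggEval("avg", b => b(0).asInstanceOf[Double] / b(1).asInstanceOf[Long])
    val cnt = AggEval("count", b => b(1))
    val exprs = Seq(
      ResultExpr((key, _) => key(0)),                               // grouping column
      ResultExpr((_, aggs) => aggs(0).asInstanceOf[Double] + 1.0),  // avg + 1
      ResultExpr((_, aggs) => aggs(1))                              // count
    )
    val project = generateResultProjection(Seq(avg, cnt), exprs)
    project(Vector("a"), Vector(10.0, 4L))
  }
}
```

Because the second pass only reads from the temporary buffer, it does not need to know whether a value was produced by a declarative or an imperative aggregate, which is the property the shared rewriting logic relies on.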

Since the rewriting logic is now shared across both operators, this patch also extracts that logic and places it in SparkStrategies. This makes the rewriting logic a bit easier to follow, I think.
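As a rough illustration of what the shared rewriting step does, the sketch below replaces aggregate function calls inside result expressions with attribute references, so that an aggregate appearing several times is still evaluated only once. The expression tree (`Expr`, `Attr`, `Add`, `Agg`) and the helper `rewriteResultExpressions` are hypothetical and much simpler than Catalyst's; the names `transformDown` and `aggregateFunctionToAttribute` are taken from the commit messages in this PR, but their definitions here are assumptions made for the example.

```scala
// Toy expression tree: hypothetical, not Catalyst's Expression hierarchy.
object RewriteSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Agg(func: String, child: Expr) extends Expr

  // Pre-order rewrite: apply the rule to a node first, then recurse into
  // the children of whatever the rule returned.
  def transformDown(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
    val rewritten = if (rule.isDefinedAt(e)) rule(e) else e
    rewritten match {
      case Add(l, r) => Add(transformDown(l)(rule), transformDown(r)(rule))
      case Agg(f, c) => Agg(f, transformDown(c)(rule))
      case leaf      => leaf
    }
  }

  // Replace each known aggregate function with the attribute that will hold
  // its already-evaluated result.
  def rewriteResultExpressions(
      resultExprs: Seq[Expr],
      aggregateFunctionToAttribute: Map[Agg, Attr]): Seq[Expr] =
    resultExprs.map { e =>
      transformDown(e) {
        case a: Agg if aggregateFunctionToAttribute.contains(a) =>
          aggregateFunctionToAttribute(a)
      }
    }

  def demo(): Seq[Expr] = {
    val sumX = Agg("sum", Attr("x"))
    // sum(x) appears twice but maps to a single attribute.
    val resultExprs = Seq(Add(Add(sumX, sumX), Attr("k")))
    rewriteResultExpressions(resultExprs, Map(sumX -> Attr("sum_x")))
  }
}
```

In this sketch, matching top-down means an outer aggregate is replaced as a whole before its children are visited; whether that is the reason for the `transform -> transformDown` commit is not stated in this PR.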

…Aggregate.

JoshRosen · 2015-10-07T23:51:30Z

/cc @yhuai for review.

SparkQA · 2015-10-08T00:21:18Z

Test build #43358 has finished for PR 9015 at commit fb0daa1.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Average(child: Expression) extends DeclarativeAggregate
- case class Count(child: Expression) extends DeclarativeAggregate
- case class First(child: Expression) extends DeclarativeAggregate
- case class Last(child: Expression) extends DeclarativeAggregate
- case class Max(child: Expression) extends DeclarativeAggregate
- case class Min(child: Expression) extends DeclarativeAggregate
- abstract class StddevAgg(child: Expression) extends DeclarativeAggregate
- case class Sum(child: Expression) extends DeclarativeAggregate

JoshRosen · 2015-10-08T00:22:05Z

Jenkins, retest this please.

SparkQA · 2015-10-08T02:27:59Z

Test build #43363 has finished for PR 9015 at commit fb0daa1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-10-08T02:38:31Z

Test build #43364 has finished for PR 9015 at commit fb0daa1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2015-10-08T21:19:15Z

...re/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala

Will we uncomment it? Or will we use NoOp?

yhuai · 2015-10-08T21:52:55Z

LGTM. Merging to master.

JoshRosen added 7 commits October 7, 2015 15:46

Unify result expression rewriting for both SortAggregate and Tungsten…

ee84e94

…Aggregate.

Simplify type of aggregateFunctionMap.

af035d2

Rename aggregateFunctionMap to aggregateFunctionToAttribute

d5b7318

Expand the comment on only evaluating each agg. func. once.

a888351

transform -> transformDown

4bac641

Remove now-unused rewrittenAggregateFunctions.

7cb82b5

Extract common rewriting logic.

fb0daa1

yhuai reviewed Oct 8, 2015

asfgit closed this in 2816c89 Oct 8, 2015

JoshRosen deleted the SPARK-10988 branch August 29, 2016 19:22


3 participants
