Skip to content

Conversation

@JoshRosen
Copy link
Contributor

In aggregate/utils.scala, there is a substantial amount of duplication in the expression-rewriting logic. As a prerequisite to supporting imperative aggregate functions in TungstenAggregate, this patch refactors this file so that the same expression-rewriting logic is used for both SortAggregate and TungstenAggregate.

In order to allow both operators to use the same rewriting logic, TungstenAggregationIterator. generateResultProjection() has been updated so that it first evaluates all declarative aggregate functions' evaluateExpressions and writes the results into a temporary buffer, and then uses this temporary buffer and the grouping expressions to evaluate the final resultExpressions. This matches the logic in SortAggregateIterator, where this two-pass approach is necessary in order to support imperative aggregates. If this change turns out to cause performance regressions, then we can look into re-implementing the single-pass evaluation in a cleaner way as part of a followup patch.

Since the rewriting logic is now shared across both operators, this patch also extracts that logic and places it in SparkStrategies. This makes the rewriting logic a bit easier to follow, I think.

@JoshRosen
Copy link
Contributor Author

/cc @yhuai for review.

@SparkQA
Copy link

SparkQA commented Oct 8, 2015

Test build #43358 has finished for PR 9015 at commit fb0daa1.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Average(child: Expression) extends DeclarativeAggregate
    • case class Count(child: Expression) extends DeclarativeAggregate
    • case class First(child: Expression) extends DeclarativeAggregate
    • case class Last(child: Expression) extends DeclarativeAggregate
    • case class Max(child: Expression) extends DeclarativeAggregate
    • case class Min(child: Expression) extends DeclarativeAggregate
    • abstract class StddevAgg(child: Expression) extends DeclarativeAggregate
    • case class Sum(child: Expression) extends DeclarativeAggregate

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Oct 8, 2015

Test build #43363 has finished for PR 9015 at commit fb0daa1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 8, 2015

Test build #43364 has finished for PR 9015 at commit fb0daa1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we uncomment it? Or, we will use NoOp?

@yhuai
Copy link
Contributor

yhuai commented Oct 8, 2015

LGTM. Merging to master.

@asfgit asfgit closed this in 2816c89 Oct 8, 2015
@JoshRosen JoshRosen deleted the SPARK-10988 branch August 29, 2016 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants