Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

After #12067, we now use expressions to do the aggregation in TypedAggregateExpression. To implement buffer merge, we produce a new buffer deserializer expression by replacing AttributeReference with right-side buffer attribute, like other DeclarativeAggregates do, and finally combine the left and right buffer deserializer with Invoke.

However, after #12338, we will add loop variable to class members when codegen MapObjects. If the Aggregator buffer type is Seq, which is implemented by MapObjects expression, we will add the same loop variable to class members twice(by left and right buffer deserializer), which cause the ClassFormatError.

This PR fixes this issue by calling distinct before declare the class menbers.

How was this patch tested?

new regression test in DatasetAggregatorSuite

@cloud-fan
Copy link
Contributor Author

cc @davies @rxin @yhuai @liancheng

// 1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
// 2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
val shouldRecurse = root match {
case _: CodegenFallback => false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few expressions implement CodegenFallback but only use it in some corner cases

@SparkQA
Copy link

SparkQA commented Apr 18, 2016

Test build #56067 has finished for PR 12468 at commit 3426310.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2016

Test build #56163 has finished for PR 12468 at commit a57f8e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def declareMutableStates(): String = {
mutableStates.map { case (javaType, variableName, _) =>
// It's possible that we add same mutable state twice, e.g. the `mergeExpressions` in
// `TypedAggregateExpression`, we should call `distinct` here to remove the duplicated ones.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid of adding the same mutable state twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, we can transform the left deserializer in TypedAggregateExpression and create new LambdaVariable with different names. But I'm afraid there will be more similar problems so I go with this approach.

@yhuai
Copy link
Contributor

yhuai commented Apr 19, 2016

LGTM. Merging to master.

@asfgit asfgit closed this in 5cb2e33 Apr 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants