[SPARK-14675][SQL] ClassFormatError when use Seq as Aggregator buffer type #12468
// 1. CodegenFallback: its children will not be used to generate code (call eval() instead)
// 2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
val shouldRecurse = root match {
  case _: CodegenFallback => false
a few expressions implement CodegenFallback but only use it in some corner cases
Test build #56067 has finished for PR 12468 at commit
Test build #56163 has finished for PR 12468 at commit
def declareMutableStates(): String = {
  mutableStates.map { case (javaType, variableName, _) =>
    // It's possible that we add same mutable state twice, e.g. the `mergeExpressions` in
    // `TypedAggregateExpression`, we should call `distinct` here to remove the duplicated ones.
Can we avoid adding the same mutable state twice?
Yeah, we could transform the left deserializer in TypedAggregateExpression and create a new LambdaVariable with a different name. But I'm afraid there will be more similar problems, so I went with this approach.
LGTM. Merging to master.
What changes were proposed in this pull request?
After #12067, we now use expressions to do the aggregation in `TypedAggregateExpression`. To implement buffer merge, we produce a new buffer deserializer expression by replacing `AttributeReference` with the right-side buffer attribute, like other `DeclarativeAggregate`s do, and finally combine the left and right buffer deserializers with `Invoke`.

However, after #12338, we add the loop variable to class members when generating code for `MapObjects`. If the `Aggregator` buffer type is `Seq`, which is implemented by a `MapObjects` expression, we add the same loop variable to class members twice (once by the left and once by the right buffer deserializer), which causes the `ClassFormatError`.

This PR fixes this issue by calling `distinct` before declaring the class members.

How was this patch tested?
New regression test in `DatasetAggregatorSuite`.
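
To illustrate why the duplicate matters, here is a minimal, self-contained sketch that does not use Spark. It assumes mutable states are `(javaType, variableName, initCode)` tuples, as the pattern match in `declareMutableStates` suggests. The object name, helper names, and the sample variable names are hypothetical and are not taken from the Spark code base. Registering the same state twice yields two identical field declarations, which the JVM rejects; calling `distinct` first leaves one declaration per state.

```scala
object MutableStateSketch {
  // Assumed shape of a mutable state entry: (javaType, variableName, initCode).
  type MutableState = (String, String, String)

  // Emits one Java field declaration per entry, duplicates included.
  def declareWithoutDistinct(states: Seq[MutableState]): String =
    states.map { case (javaType, variableName, _) =>
      s"private $javaType $variableName;"
    }.mkString("\n")

  // Same as above, but drops identical entries first.
  def declareWithDistinct(states: Seq[MutableState]): String =
    states.distinct.map { case (javaType, variableName, _) =>
      s"private $javaType $variableName;"
    }.mkString("\n")

  def main(args: Array[String]): Unit = {
    // Hypothetical loop variable registered by both the left and the
    // right buffer deserializer.
    val loopVar: MutableState = ("int", "loopValue0", "")
    val states = Seq(loopVar, ("boolean", "isNull1", "isNull1 = false;"), loopVar)

    println(declareWithoutDistinct(states)) // loopValue0 is declared twice
    println("----")
    println(declareWithDistinct(states))    // loopValue0 is declared once
  }
}
```

Note that `distinct` only removes entries whose three fields are all equal, so two states sharing a name but differing in type or init code would still both be declared in this sketch.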