Skip to content

Conversation

@davies
Copy link
Contributor

@davies davies commented Apr 12, 2016

What changes were proposed in this pull request?

The wide schema, the expression of fields will be splitted into multiple functions, but the variable for loopVar can't be accessed in splitted functions, this PR change them as class member.

How was this patch tested?

Added regression test.

@davies
Copy link
Contributor Author

davies commented Apr 12, 2016

cc @marmbrus @cloud-fan

@SparkQA
Copy link

SparkQA commented Apr 12, 2016

Test build #55651 has finished for PR 12338 at commit 4a472de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM

@cloud-fan
Copy link
Contributor

One more question, actually there are some more places that we use local variables as the input for expressions, e.g. CodeGenContext.currentVars in whole stage codegen, LambdaVariables in MapElements and typed filter, should we make them all class members, or is there a more general way to do it?

@davies
Copy link
Contributor Author

davies commented Apr 13, 2016

This is used for non-whole-stage codegen, also MapObjects does not support interpret mode, or it's better to fallback. Class member may prevent JIT compiler to do more optimization, so we should not aggresively put all of them as class members.

Merging this into master.

@asfgit asfgit closed this in 372baf0 Apr 13, 2016
asfgit pushed a commit that referenced this pull request Apr 19, 2016
… type

## What changes were proposed in this pull request?

After #12067, we now use expressions to do the aggregation in `TypedAggregateExpression`. To implement buffer merge, we produce a new buffer deserializer expression by replacing `AttributeReference` with right-side buffer attribute, like other `DeclarativeAggregate`s do, and finally combine the left and right buffer deserializer with `Invoke`.

However, after #12338, we will add loop variable to class members when codegen `MapObjects`. If the `Aggregator` buffer type is `Seq`, which is implemented by `MapObjects` expression, we will add the same loop variable to class members twice(by left and right buffer deserializer), which cause the `ClassFormatError`.

This PR fixes this issue by calling `distinct` before declare the class menbers.

## How was this patch tested?

new regression test in `DatasetAggregatorSuite`

Author: Wenchen Fan <[email protected]>

Closes #12468 from cloud-fan/bug.
kapilsingh5050 pushed a commit to kapilsingh5050/spark that referenced this pull request Dec 12, 2016
…with nested wide schema

The wide schema, the expression of fields will be splitted into multiple functions, but the variable for loopVar can't be accessed in splitted functions, this PR change them as class member.

Added regression test.

Author: Davies Liu <[email protected]>

Closes apache#12338 from davies/nested_row.

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants