@DonnyZone
Contributor

What changes were proposed in this pull request?

We recently hit the same NullPointerException (NPE) in our production environment as the one reported in:
https://issues.apache.org/jira/browse/SPARK-19471

The issue can be reproduced with the following examples:

`val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

// HashAggregate path: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false
df.groupBy("x").agg(rand(), sum("y")).show()

// ObjectHashAggregate path: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false
df.groupBy("x").agg(rand(), collect_list("y")).show()

// SortAggregate path: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false && SQLConf.USE_OBJECT_HASH_AGG.key = false
df.groupBy("x").agg(rand(), collect_list("y")).show()`
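
The comments in the examples name `SQLConf` constants rather than the settings themselves. Below is a minimal sketch of a standalone reproduction, assuming those constants resolve to the string keys `spark.sql.codegen.wholeStage` and `spark.sql.execution.useObjectHashAggregateExec` (my understanding of the keys, not something stated in this PR). The object name, app name, and `local[1]` master are hypothetical choices for illustration, and whether each call fails depends on the Spark version in use.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, rand, sum}

// Hypothetical driver object; requires Spark SQL on the classpath.
object AggregationPathsRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-19471-repro")
      .getOrCreate()

    val df = spark
      .createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4)))
      .toDF("x", "y")

    // Assumed key for SQLConf.WHOLESTAGE_CODEGEN_ENABLED.
    spark.conf.set("spark.sql.codegen.wholeStage", "false")

    // HashAggregate path.
    df.groupBy("x").agg(rand(), sum("y")).show()

    // ObjectHashAggregate path.
    df.groupBy("x").agg(rand(), collect_list("y")).show()

    // Assumed key for SQLConf.USE_OBJECT_HASH_AGG; disabling it should
    // make the same query fall back to the SortAggregate path.
    spark.conf.set("spark.sql.execution.useObjectHashAggregateExec", "false")
    df.groupBy("x").agg(rand(), collect_list("y")).show()

    spark.stop()
  }
}
```

Setting both options on the session rather than in `spark-defaults.conf` keeps the reproduction confined to a single run.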

This PR is based on #16820 and adds test cases for all aggregation paths.

When AggregationIterator generates the result projection, it does not call the `initialize` method of the Projection class. As a result, a NullPointerException is thrown at runtime whenever the projection involves nondeterministic expressions, such as `rand()` in the examples above.
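
To illustrate the failure mode without a Spark dependency, here is a toy sketch of the initialize-before-eval contract. `ToyRand`, `ToyProjection`, and `InitializeDemo` are hypothetical names invented for this example and are not Spark classes; the sketch only assumes that a nondeterministic expression holds state that stays null until `initialize` is called.

```scala
import scala.util.Random

// Hypothetical stand-in for a nondeterministic expression such as rand().
// Not Spark's class; it only mimics the initialize-before-eval contract.
final class ToyRand(seed: Long) {
  // Stays null until initialize() runs.
  private var rng: Random = null

  def initialize(partitionIndex: Int): Unit = {
    rng = new Random(seed + partitionIndex)
  }

  // Dereferences rng, so calling this before initialize() throws an NPE.
  def eval(): Double = rng.nextDouble()
}

// Hypothetical stand-in for a projection over nondeterministic expressions.
final class ToyProjection(exprs: Seq[ToyRand]) {
  def initialize(partitionIndex: Int): Unit =
    exprs.foreach(_.initialize(partitionIndex))

  def apply(): Seq[Double] = exprs.map(_.eval())
}

object InitializeDemo {
  // Returns true if evaluating the projection throws NullPointerException.
  def failsWithNpe(p: ToyProjection): Boolean =
    try { p(); false } catch { case _: NullPointerException => true }

  def main(args: Array[String]): Unit = {
    // Mirrors the bug: the projection is built but never initialized.
    val uninitialized = new ToyProjection(Seq(new ToyRand(42L)))
    println(s"without initialize: NPE = ${failsWithNpe(uninitialized)}")

    // Mirrors the fix: initialize is called before the first evaluation.
    val initialized = new ToyProjection(Seq(new ToyRand(42L)))
    initialized.initialize(partitionIndex = 0)
    println(s"with initialize: NPE = ${failsWithNpe(initialized)}")
  }
}
```

In this toy model the fix is a single `initialize` call made once per partition before the projection is first applied, which is the step the description says AggregationIterator skips.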

How was this patch tested?

Unit tests covering the HashAggregate, ObjectHashAggregate, and SortAggregate paths.

@DonnyZone
Contributor Author

There are some conflicts, so I am closing this for now.

@DonnyZone DonnyZone closed this Aug 11, 2017
@DonnyZone DonnyZone deleted the Branch-SPARK-19471 branch August 11, 2017 11:55