[SPARK-19471][SQL]AggregationIterator does not initialize the generated result projection before using it #18919

DonnyZone · 2017-08-11T11:29:43Z

What changes were proposed in this pull request?

We recently hit the NullPointerException (NPE) described in the following JIRA issue in our production environment:
https://issues.apache.org/jira/browse/SPARK-19471

This issue can be reproduced by the following examples:

` import org.apache.spark.sql.functions.{collect_list, rand, sum}

val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

// HashAggregate path: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false
df.groupBy("x").agg(rand(), sum("y")).show()

// ObjectHashAggregate path: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false
df.groupBy("x").agg(rand(), collect_list("y")).show()

// SortAggregate path: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false && SQLConf.USE_OBJECT_HASH_AGG.key = false
df.groupBy("x").agg(rand(), collect_list("y")).show()`

This PR is based on #16820 and adds test cases for all aggregation paths.

When AggregationIterator generates the result projection, it does not call the initialize method of the Projection class before using it. This causes a NullPointerException at runtime when the projection involves nondeterministic expressions.
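
To illustrate the failure mode described above, here is a minimal, Spark-free sketch. `ToyRand` and `ToyProjection` are hypothetical stand-ins, not Spark classes. They only model the general pattern in which a nondeterministic expression keeps per-partition state that stays null until `initialize` is called.

```scala
import scala.util.Random

// Hypothetical stand-in for a nondeterministic expression such as rand().
final class ToyRand(seed: Long) {
  // Stays null until initialize() runs, modelling per-partition state.
  private var rng: Random = null

  def initialize(partitionIndex: Int): Unit = {
    rng = new Random(seed + partitionIndex)
  }

  // Throws NullPointerException if initialize() was never called.
  def eval(): Double = rng.nextDouble()
}

// Hypothetical stand-in for a generated projection wrapping such expressions.
final class ToyProjection(exprs: Seq[ToyRand]) {
  def initialize(partitionIndex: Int): Unit =
    exprs.foreach(_.initialize(partitionIndex))

  def apply(): Seq[Double] = exprs.map(_.eval())
}

object ProjectionInitDemo {
  def main(args: Array[String]): Unit = {
    // Using the projection without initializing it fails.
    val uninitialized = new ToyProjection(Seq(new ToyRand(42L)))
    val failed =
      try { uninitialized(); false }
      catch { case _: NullPointerException => true }
    println(s"NPE without initialize: $failed")

    // Initializing first, as the fix proposes, makes evaluation succeed.
    val initialized = new ToyProjection(Seq(new ToyRand(42L)))
    initialized.initialize(partitionIndex = 0)
    println(s"Values after initialize: ${initialized()}")
  }
}
```

In this model the fix amounts to calling `initialize(partitionIndex)` on the projection once, right after it is created and before the first row is evaluated.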

How was this patch tested?

Unit tests.

DonnyZone · 2017-08-11T11:46:09Z

There are some conflicts, so I am closing this for now.

DonnyZone added 2 commits August 11, 2017 18:03

SPARK-19471 with test cases

9a22b71

error fix

8f813cf

DonnyZone closed this Aug 11, 2017

DonnyZone deleted the Branch-SPARK-19471 branch August 11, 2017 11:55
