Skip to content

Conversation

@yangw1234
Copy link

@yangw1234 yangw1234 commented Feb 6, 2017

What changes were proposed in this pull request?

When AggregationIterator generates result projection, it does not call the initialize method of the Projection class. This will cause a runtime NullPointerException when the projection involves nondeterministic expressions.

This problem was introduced by #15567.

How was this patch tested?

unit test

@yangw1234
Copy link
Author

@mengxr @rxin

@hvanhovell
Copy link
Contributor

ok to test

@hvanhovell
Copy link
Contributor

@yangw1234 could you also check if we need to do this for whole stage code generation?

...and you really need to add tests.

@SparkQA
Copy link

SparkQA commented Feb 6, 2017

Test build #72445 has finished for PR 16820 at commit 7ec5ebf.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yangw1234
Copy link
Author

@hvanhovell thanks for your review. Whole stage code generation seems fine and unit test is added.

@SparkQA
Copy link

SparkQA commented Feb 6, 2017

Test build #72450 has finished for PR 16820 at commit 97b07a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 6, 2017

Test build #72451 has finished for PR 16820 at commit b9b9693.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private def assertNoExceptions(c: Column): Unit = {
for (wholeStage <- Seq(true, false)) {
withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, wholeStage.toString)) {
spark.range(0, 5).toDF("a").agg(sum("a")).withColumn("v", c).collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test also passes without your test. I think you need to reference a NonDeterministic expression in the aggregate.

Could also make sure that we test all aggregation paths:

  1. HashAggregate
  2. ObjectHashAggregate
  3. SortAggregate

@gatorsmile
Copy link
Member

@yangw1234 Could you address the comment by @hvanhovell ? Thanks!

@yangw1234
Copy link
Author

@gatorsmile Sorry, I totally forget this pr. I will try to address the comment this week (need a little time to re-familiarize the context).

@yangw1234
Copy link
Author

Sorry I could not find time to finish this pr recently. Close it for now. If you need this fix, please feel free to base on it and finish it.

@yangw1234 yangw1234 closed this Jun 6, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Aug 14, 2017
…ted result projection before using it

## What changes were proposed in this pull request?

Recently, we have also encountered such NPE issues in our production environment as described in:
https://issues.apache.org/jira/browse/SPARK-19471

This issue can be reproduced by the following examples:
` val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

//HashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
df.groupBy("x").agg(rand(),sum("y")).show()

//ObjectHashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
df.groupBy("x").agg(rand(),collect_list("y")).show()

//SortAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false &&SQLConf.USE_OBJECT_HASH_AGG.key=false
df.groupBy("x").agg(rand(),collect_list("y")).show()`
`

This PR is based on PR-16820(apache#16820) with test cases for all aggregation paths. We want to push it forward.

> When AggregationIterator generates result projection, it does not call the initialize method of the Projection class. This will cause a runtime NullPointerException when the projection involves nondeterministic expressions.

## How was this patch tested?

unit test
verified in production environment

Author: donnyzone <[email protected]>

Closes apache#18920 from DonnyZone/Branch-spark-19471.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants