Skip to content

Conversation

@koertkuipers
Copy link
Contributor

@koertkuipers koertkuipers commented Apr 17, 2017

Avoid necessary execution that can lead to NPE in EliminateOuterJoin and add test in DataFrameSuite to confirm NPE is no longer thrown

What changes were proposed in this pull request?

Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy so they are only executed when needed.

How was this patch tested?

Added test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in catalyst project would be better but i am unsure how to do this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

@SparkQA
Copy link

SparkQA commented Apr 17, 2017

Test build #75863 has finished for PR 17660 at commit 5e2b828.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (boundE.find(_.isInstanceOf[Unevaluable]).isDefined) return false
val v = boundE.eval(emptyRow)
val v = try {
boundE.eval(emptyRow)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check whether there exists the other similar cases in the code base that could trigger NullPointerException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sure i can do a scan for similar problems

@cloud-fan
Copy link
Contributor

I think the root problem is, in EliminateOuterJoin.buildNewJoinType, we always build leftHasNonNullPredicate and rightHasNonNullPredicate. If it's left join, only rightHasNonNullPredicate is used, and when building leftHasNonNullPredicate, we may pass null values to a UDF that is not supposed to run on null values.

@cloud-fan
Copy link
Contributor

In general, we can't decide whether a UDF is null-propagate or not, so we can't catch NPE for Expression.eval blindly. When we call Expression.eval in optimizer, we should make sure we do not pass unexpected input like EliminateOuterJoin did.

@koertkuipers
Copy link
Contributor Author

koertkuipers commented Apr 18, 2017 via email

@koertkuipers
Copy link
Contributor Author

@cloud-fan switching to lazy vals to avoid these predicates being evaluated when they are not used seems to work.
so i think this is a better (more targeted) solution for now, and i removed my try/catch logic.

@koertkuipers koertkuipers changed the title [SPARK-20359][SQL] catch NPE in EliminateOuterJoin optimization [SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin optimization that can lead to NPE Apr 18, 2017
@SparkQA
Copy link

SparkQA commented Apr 18, 2017

Test build #75902 has finished for PR 17660 at commit 975c850.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
lazy val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
lazy val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of lazy val, we can inline the xxx.exists in the case statements.

Copy link
Member

@viirya viirya Apr 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: They are used three times in case statements. Inline version looks too verbose.

@viirya
Copy link
Member

viirya commented Apr 19, 2017

As we don't know the details of UDF, shall we skip UDF for canFilterOutNull and directly return false?

asfgit pushed a commit that referenced this pull request Apr 19, 2017
…optimization that can lead to NPE

Avoid necessary execution that can lead to NPE in EliminateOuterJoin and add test in DataFrameSuite to confirm NPE is no longer thrown

## What changes were proposed in this pull request?
Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy so they are only executed when needed.

## How was this patch tested?

Added test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in catalyst project would be better but i am unsure how to do this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Koert Kuipers <[email protected]>

Closes #17660 from koertkuipers/feat-catch-npe-in-eliminate-outer-join.

(cherry picked from commit 608bf30)
Signed-off-by: Wenchen Fan <[email protected]>
asfgit pushed a commit that referenced this pull request Apr 19, 2017
…optimization that can lead to NPE

Avoid necessary execution that can lead to NPE in EliminateOuterJoin and add test in DataFrameSuite to confirm NPE is no longer thrown

## What changes were proposed in this pull request?
Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy so they are only executed when needed.

## How was this patch tested?

Added test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in catalyst project would be better but i am unsure how to do this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Koert Kuipers <[email protected]>

Closes #17660 from koertkuipers/feat-catch-npe-in-eliminate-outer-join.

(cherry picked from commit 608bf30)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Copy link
Contributor

thanks, merging to master/2.2/2.1!

@asfgit asfgit closed this in 608bf30 Apr 19, 2017
@cloud-fan
Copy link
Contributor

cloud-fan commented Apr 19, 2017

As we don't know the details of UDF, shall we skip UDF for canFilterOutNull and directly return false?

Then we may miss some opportunities for optimization.

peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…optimization that can lead to NPE

Avoid necessary execution that can lead to NPE in EliminateOuterJoin and add test in DataFrameSuite to confirm NPE is no longer thrown

## What changes were proposed in this pull request?
Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy so they are only executed when needed.

## How was this patch tested?

Added test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in catalyst project would be better but i am unsure how to do this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Koert Kuipers <[email protected]>

Closes apache#17660 from koertkuipers/feat-catch-npe-in-eliminate-outer-join.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants