[SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin optimization that can lead to NPE #17660
…firm NPE is no longer thrown
Test build #75863 has finished for PR 17660 at commit
```diff
   if (boundE.find(_.isInstanceOf[Unevaluable]).isDefined) return false
-  val v = boundE.eval(emptyRow)
+  val v = try {
+    boundE.eval(emptyRow)
```
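The try in this diff is cut off before its catch clause. As a rough illustration of what such a guard does, here is a plain-Scala sketch. It assumes a predicate can be modelled as a function of one nullable input (standing in for evaluating the bound expression against an all-null row) and that an NPE should make the check return false. NullFilterCheck, Predicate and canFilterOutNullGuarded are hypothetical names, not Catalyst APIs.

```scala
// Hypothetical model: a "predicate" is a function of one nullable input,
// standing in for a Catalyst expression evaluated against an all-null row.
object NullFilterCheck {
  type Predicate = String => Any

  // Unguarded check: if the predicate dereferences its input (like a UDF
  // that assumes non-null values), this throws NullPointerException.
  def canFilterOutNull(p: Predicate): Boolean = {
    val v = p(null)
    v == null || v == false
  }

  // Guarded check in the spirit of wrapping eval in a try: an NPE is treated
  // as "cannot prove nulls are filtered out", so no rewrite happens.
  // Returning false here is an assumption; the real catch clause is not shown.
  def canFilterOutNullGuarded(p: Predicate): Boolean =
    try {
      val v = p(null)
      v == null || v == false
    } catch {
      case _: NullPointerException => false
    }
}
```

Catching the NPE only hides the symptom: a UDF that throws on null input tells us nothing about whether it filters out nulls.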
Could you check whether there are other similar cases in the code base that could trigger a NullPointerException?
Yeah, sure, I can scan for similar problems.
In general, we can't decide whether a UDF is null-propagating or not, so we can't catch NPE for …
I see. Let me check whether making leftHasNonNullPredicate and rightHasNonNullPredicate lazy solves it, then.
On Apr 17, 2017 23:44, "Wenchen Fan" wrote:
I think the root problem is that, in EliminateOuterJoin.buildNewJoinType, we always build both leftHasNonNullPredicate and rightHasNonNullPredicate. For a left outer join only rightHasNonNullPredicate is used, yet building leftHasNonNullPredicate may pass null values to a UDF that is not supposed to run on null values.
@cloud-fan Switching to lazy vals, so that these predicates are not evaluated when they are not used, seems to work.
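Lazy vals help because a Scala lazy val is computed on first access rather than where it is defined, so a flag that no reached branch reads is never computed. A minimal illustration, where check and log are hypothetical stand-ins for the predicate checks (which, in the real rule, may end up calling a UDF on null input):

```scala
import scala.collection.mutable.ArrayBuffer

object EvaluationOrder {
  // Records which checks actually ran, in order.
  val log: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Hypothetical stand-in for leftConditions.exists(canFilterOutNull) and its
  // right-hand counterpart; a real check might invoke a UDF on null input.
  def check(name: String, result: Boolean): Boolean = {
    log += name
    result
  }

  // Eager vals: both checks run on entry, even though a left outer join
  // only needs the right-hand flag. (The else branch is a simplification
  // standing in for join types that read both flags.)
  def eager(isLeftOuter: Boolean): Boolean = {
    val left = check("left", result = false)
    val right = check("right", result = true)
    if (isLeftOuter) right else left && right
  }

  // Lazy vals: each check runs on first access, so the left-hand check
  // is skipped entirely for a left outer join.
  def deferred(isLeftOuter: Boolean): Boolean = {
    lazy val left = check("left", result = false)
    lazy val right = check("right", result = true)
    if (isLeftOuter) right else left && right
  }
}
```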
Test build #75902 has finished for PR 17660 at commit
```diff
-    val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
-    val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
+    lazy val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
+    lazy val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
```
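For context, here is a self-contained sketch of a join-type rewrite built on the two lazy flags. The JoinType ADT and the one-argument canFilterOutNull are simplified stand-ins assumed for illustration, and the case order reflects our reading of buildNewJoinType rather than a copy of the Spark source. A LeftOuter join only reads rightHasNonNullPredicate, so a left-side condition that throws on null is never evaluated.

```scala
object JoinTypeRewrite {
  // Simplified stand-in for Catalyst's join types.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType

  // Simplified stand-in for canFilterOutNull: evaluate a one-argument
  // predicate on null, mimicking evaluation against an all-null row.
  def canFilterOutNull(p: String => Any): Boolean = {
    val v = p(null)
    v == null || v == false
  }

  // Each flag is forced only when a case guard actually reads it,
  // and is memoized after that.
  def buildNewJoinType(
      joinType: JoinType,
      leftConditions: Seq[String => Any],
      rightConditions: Seq[String => Any]): JoinType = {
    lazy val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
    lazy val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)

    joinType match {
      case RightOuter if leftHasNonNullPredicate => Inner
      case LeftOuter if rightHasNonNullPredicate => Inner
      case FullOuter if leftHasNonNullPredicate && rightHasNonNullPredicate => Inner
      case FullOuter if leftHasNonNullPredicate => LeftOuter
      case FullOuter if rightHasNonNullPredicate => RightOuter
      case other => other
    }
  }
}
```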
Instead of lazy vals, we could inline the xxx.exists calls in the case statements.
nit: they are used three times in the case statements, so an inlined version would look too verbose.
As we don't know the details of a UDF, shall we skip UDFs for …
…optimization that can lead to NPE Avoid unnecessary execution that can lead to NPE in EliminateOuterJoin and add test in DataFrameSuite to confirm NPE is no longer thrown ## What changes were proposed in this pull request? Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy so they are only executed when needed. ## How was this patch tested? Added test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in catalyst project would be better but I am unsure how to do this. Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Koert Kuipers <[email protected]> Closes #17660 from koertkuipers/feat-catch-npe-in-eliminate-outer-join. (cherry picked from commit 608bf30) Signed-off-by: Wenchen Fan <[email protected]>
Thanks, merging to master/2.2/2.1!
Then we may miss some opportunities for optimization.
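The trade-off in skipping UDFs can be sketched with a hypothetical mini expression tree (Col, IsNotNull and Udf here are illustrative, not Catalyst classes). The check never invokes a UDF, so it cannot throw, but it also refuses to rewrite the join when a UDF predicate would in fact filter out nulls. The exact scope of the proposal is truncated in this thread, so this is only one possible reading.

```scala
// Hypothetical mini expression tree; these case classes are illustrative
// stand-ins, not Catalyst's classes.
object SkipUdfs {
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class IsNotNull(child: Expr) extends Expr
  final case class Udf(f: Any => Any, child: Expr) extends Expr

  def containsUdf(e: Expr): Boolean = e match {
    case Udf(_, _)    => true
    case IsNotNull(c) => containsUdf(c)
    case Col(_)       => false
  }

  // Evaluate against an all-null row: every column reads as null.
  def eval(e: Expr): Any = e match {
    case Col(_)       => null
    case IsNotNull(c) => eval(c) != null
    case Udf(f, c)    => f(eval(c))
  }

  // Conservative variant: never run a UDF during the check. This avoids the
  // NPE, but also gives up on UDF predicates that really do filter out nulls.
  def canFilterOutNull(e: Expr): Boolean =
    if (containsUdf(e)) false
    else {
      val v = eval(e)
      v == null || v == false
    }
}
```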
Avoid unnecessary execution that can lead to an NPE in EliminateOuterJoin, and add a test in DataFrameSuite to confirm the NPE is no longer thrown.
What changes were proposed in this pull request?
Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy vals so they are only evaluated when needed.
How was this patch tested?
Added a test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in the catalyst project would be better, but I am unsure how to write one.
Please review http://spark.apache.org/contributing.html before opening a pull request.