Commit 171bf65

koertkuipers authored and cloud-fan committed
[SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin optimization that can lead to NPE
Avoid unnecessary execution that can lead to an NPE in EliminateOuterJoin, and add a test in DataFrameSuite to confirm the NPE is no longer thrown.

## What changes were proposed in this pull request?

Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy vals so they are only evaluated when needed.

## How was this patch tested?

Added a test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in the catalyst project would be better, but I am unsure how to do this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Koert Kuipers <[email protected]>

Closes #17660 from koertkuipers/feat-catch-npe-in-eliminate-outer-join.

(cherry picked from commit 608bf30)
Signed-off-by: Wenchen Fan <[email protected]>
1 parent a4c1ebc commit 171bf65

2 files changed, +12 -2 lines changed
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala

Lines changed: 2 additions & 2 deletions

@@ -121,8 +121,8 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
     val leftConditions = conditions.filter(_.references.subsetOf(join.left.outputSet))
     val rightConditions = conditions.filter(_.references.subsetOf(join.right.outputSet))
 
-    val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
-    val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
+    lazy val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
+    lazy val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
 
     join.joinType match {
       case RightOuter if leftHasNonNullPredicate => Inner
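
The effect of switching to `lazy val` can be illustrated outside Spark. The following is a minimal, simplified sketch and not the Catalyst code: `riskyCheck`, `safeCheck`, and the local `JoinType` objects are hypothetical stand-ins, with `riskyCheck` playing the role of a predicate evaluation that throws on null input. It only shows that a `lazy val` used in a pattern guard is never evaluated when the pattern for that case does not match.

```scala
object LazyGuardSketch {
  // Hypothetical, simplified join types; not Spark's own classes.
  sealed trait JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object Inner extends JoinType

  // Stand-in for a predicate check that throws when evaluated (assumption).
  def riskyCheck(): Boolean = throw new NullPointerException("evaluated on null input")

  // Stand-in for a predicate check that evaluates safely (assumption).
  def safeCheck(): Boolean = true

  // Eager version: both checks run up front, so riskyCheck always throws.
  def newJoinTypeEager(joinType: JoinType): JoinType = {
    val leftHasNonNull = riskyCheck()
    val rightHasNonNull = safeCheck()
    joinType match {
      case RightOuter if leftHasNonNull => Inner
      case LeftOuter if rightHasNonNull => Inner
      case other => other
    }
  }

  // Lazy version: a check runs only if a guard that mentions it is reached.
  def newJoinTypeLazy(joinType: JoinType): JoinType = {
    lazy val leftHasNonNull = riskyCheck()
    lazy val rightHasNonNull = safeCheck()
    joinType match {
      case RightOuter if leftHasNonNull => Inner
      case LeftOuter if rightHasNonNull => Inner
      case other => other
    }
  }
}
```

In this sketch, `newJoinTypeLazy(LeftOuter)` never touches `leftHasNonNull`, because the `RightOuter` pattern fails before its guard is considered, whereas `newJoinTypeEager(LeftOuter)` throws before the match is even reached.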

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

Lines changed: 10 additions & 0 deletions

@@ -1755,4 +1755,14 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         "Cannot have map type columns in DataFrame which calls set operations"))
     }
   }
+
+  test("SPARK-20359: catalyst outer join optimization should not throw npe") {
+    val df1 = Seq("a", "b", "c").toDF("x")
+      .withColumn("y", udf{ (x: String) => x.substring(0, 1) + "!" }.apply($"x"))
+    val df2 = Seq("a", "b").toDF("x1")
+    df1
+      .join(df2, df1("x") === df2("x1"), "left_outer")
+      .filter($"x1".isNotNull || !$"y".isin("a!"))
+      .count
+  }
 }
