Commit 608bf30

koertkuipers authored and cloud-fan committed
[SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin optimization that can lead to NPE
Avoid unnecessary execution that can lead to NPE in EliminateOuterJoin and add test in DataFrameSuite to confirm NPE is no longer thrown

## What changes were proposed in this pull request?

Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy so they are only executed when needed.

## How was this patch tested?

Added test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in the catalyst project would be better, but I am unsure how to do this.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Koert Kuipers <[email protected]>

Closes #17660 from koertkuipers/feat-catch-npe-in-eliminate-outer-join.
1 parent 702d85a commit 608bf30

2 files changed: +12 -2 lines changed


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala

Lines changed: 2 additions & 2 deletions
@@ -134,8 +134,8 @@ case class EliminateOuterJoin(conf: SQLConf) extends Rule[LogicalPlan] with Pred
     val leftConditions = conditions.filter(_.references.subsetOf(join.left.outputSet))
     val rightConditions = conditions.filter(_.references.subsetOf(join.right.outputSet))
 
-    val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
-    val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
+    lazy val leftHasNonNullPredicate = leftConditions.exists(canFilterOutNull)
+    lazy val rightHasNonNullPredicate = rightConditions.exists(canFilterOutNull)
 
     join.joinType match {
       case RightOuter if leftHasNonNullPredicate => Inner
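
The effect of switching these two values to `lazy val` can be illustrated outside Spark. The following is a minimal sketch in plain Scala, not the Catalyst implementation: `LazyGuardSketch`, `Pred`, `rejectsNull`, `newJoinTypeStrict` and `newJoinTypeLazy` are hypothetical names, and `rejectsNull` is only an assumed stand-in for `canFilterOutNull` that probes a predicate with a null input. Only two match cases are modeled.

```scala
object LazyGuardSketch {
  // Hypothetical, simplified join types; Spark's own JoinType hierarchy is larger.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType

  type Pred = String => Boolean

  // Assumed stand-in for canFilterOutNull: probe the predicate with a null input.
  // A predicate that is not null-safe throws a NullPointerException here.
  def rejectsNull(p: Pred): Boolean = !p(null)

  // Strict vals: both sides are probed up front, even if the match never reads them.
  def newJoinTypeStrict(joinType: JoinType, left: Seq[Pred], right: Seq[Pred]): JoinType = {
    val leftHasNonNullPredicate = left.exists(rejectsNull)
    val rightHasNonNullPredicate = right.exists(rejectsNull)
    joinType match {
      case RightOuter if leftHasNonNullPredicate => Inner
      case LeftOuter if rightHasNonNullPredicate => Inner
      case other => other
    }
  }

  // Lazy vals: a side is probed only when a guard that is actually tested reads it.
  def newJoinTypeLazy(joinType: JoinType, left: Seq[Pred], right: Seq[Pred]): JoinType = {
    lazy val leftHasNonNullPredicate = left.exists(rejectsNull)
    lazy val rightHasNonNullPredicate = right.exists(rejectsNull)
    joinType match {
      case RightOuter if leftHasNonNullPredicate => Inner
      case LeftOuter if rightHasNonNullPredicate => Inner
      case other => other
    }
  }

  def main(args: Array[String]): Unit = {
    val unsafeLeft: Pred = s => s.substring(0, 1) == "a" // throws on null input
    val safeRight: Pred = s => s != null                 // null-safe, rejects null

    // For a left outer join only the right-side guard is tested, so the
    // lazy variant never calls unsafeLeft with null.
    println(newJoinTypeLazy(LeftOuter, Seq(unsafeLeft), Seq(safeRight)))

    // The strict variant probes the left side first and fails.
    println(scala.util.Try(newJoinTypeStrict(LeftOuter, Seq(unsafeLeft), Seq(safeRight))).isFailure)
  }
}
```

In this sketch, laziness only skips a probe whose result is never read. A right outer join would still test the left-side guard and hit the same exception, which is consistent with the commit title describing the change as avoiding unnecessary execution.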

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

Lines changed: 10 additions & 0 deletions
@@ -1722,4 +1722,14 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         "Cannot have map type columns in DataFrame which calls set operations"))
     }
   }
+
+  test("SPARK-20359: catalyst outer join optimization should not throw npe") {
+    val df1 = Seq("a", "b", "c").toDF("x")
+      .withColumn("y", udf{ (x: String) => x.substring(0, 1) + "!" }.apply($"x"))
+    val df2 = Seq("a", "b").toDF("x1")
+    df1
+      .join(df2, df1("x") === df2("x1"), "left_outer")
+      .filter($"x1".isNotNull || !$"y".isin("a!"))
+      .count
+  }
 }
