Commit 593b423

[SPARK-31958][SQL] normalize special floating numbers in subquery
### What changes were proposed in this pull request?

This is a followup of #23388.

#23388 has an issue: it doesn't handle subquery expressions and assumes they will be turned into joins. However, this is not true for non-correlated subquery expressions.

This PR fixes this issue. It now doesn't skip `Subquery`, and subquery expressions will be handled by `OptimizeSubqueries`, which runs the optimizer with the subquery.

Note that, correlated subquery expressions will be handled twice: once in `OptimizeSubqueries`, once later when it becomes join. This is OK as `NormalizeFloatingNumbers` is idempotent now.

### Why are the changes needed?

fix a bug

### Does this PR introduce _any_ user-facing change?

yes, see the newly added test.

### How was this patch tested?

new test

Closes #28785 from cloud-fan/normalize.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 6fb9c80)
Signed-off-by: Wenchen Fan <[email protected]>
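To illustrate why normalization and idempotence matter here, the following is a minimal standalone sketch in plain Scala. It is not Spark's implementation: `NormalizeSketch`, `normalize`, and `bits` are hypothetical names introduced only for this example. It shows that `-0.0` and `0.0` compare equal by value yet differ in their raw bit patterns, and that a normalizing function of this kind can safely be applied twice.

```scala
object NormalizeSketch {
  // Hypothetical helper, NOT Spark's code: map -0.0 to 0.0 and any NaN to the canonical NaN.
  def normalize(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // true for both 0.0 and -0.0 under IEEE 754 comparison
    else d

  // Raw IEEE 754 bit pattern of a double.
  def bits(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)

  def main(args: Array[String]): Unit = {
    // Equal by value ...
    println(-0.0d == 0.0d)
    // ... but not by bit pattern, which is what a binary comparison would see.
    println(bits(-0.0d) == bits(0.0d))
    // After normalization the bit patterns agree.
    println(bits(normalize(-0.0d)) == bits(normalize(0.0d)))
    // Idempotence: applying the function a second time changes nothing.
    println(bits(normalize(normalize(-0.0d))) == bits(normalize(-0.0d)))
  }
}
```

Because the second application is a no-op in this sketch, visiting the same expression twice (as the commit message describes for correlated subqueries) would not change the result.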
1 parent d1a3fad commit 593b423

2 files changed, +18 -4 lines changed


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala

Lines changed: 0 additions & 4 deletions
@@ -56,10 +56,6 @@ import org.apache.spark.sql.types._
 object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
 
   def apply(plan: LogicalPlan): LogicalPlan = plan match {
-    // A subquery will be rewritten into join later, and will go through this rule
-    // eventually. Here we skip subquery, as we only need to run this rule once.
-    case _: Subquery => plan
-
     case _ => plan transform {
       case w: Window if w.partitionSpec.exists(p => needNormalize(p)) =>
         // Although the `windowExpressions` may refer to `partitionSpec` expressions, we don't need
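
The effect of the removed `case _: Subquery => plan` branch can be sketched with a toy plan tree. Everything below is a hypothetical stand-in written for illustration: `Plan`, `Literal`, `Join`, `Subquery`, `oldRule`, and `newRule` are not Catalyst classes or Spark APIs. The sketch only assumes, as the commit message states, that a non-correlated subquery is never rewritten into a join, so a rule that returns a `Subquery` root untouched never gets another chance to normalize it.

```scala
object SkipSubquerySketch {
  // Toy plan nodes; hypothetical stand-ins, not Catalyst classes.
  sealed trait Plan
  case class Literal(value: Double) extends Plan
  case class Join(left: Plan, right: Plan) extends Plan
  case class Subquery(child: Plan) extends Plan

  // Rewrite every -0.0 literal in the tree to 0.0.
  def normalizeAll(plan: Plan): Plan = plan match {
    case Literal(v)  => Literal(if (v == 0.0d) 0.0d else v)
    case Join(l, r)  => Join(normalizeAll(l), normalizeAll(r))
    case Subquery(c) => Subquery(normalizeAll(c))
  }

  // Shape of the old rule: a Subquery root is returned unchanged.
  def oldRule(plan: Plan): Plan = plan match {
    case _: Subquery => plan
    case _           => normalizeAll(plan)
  }

  // Shape of the new rule: no special case for Subquery.
  def newRule(plan: Plan): Plan = normalizeAll(plan)

  // True if any literal in the tree still carries the bit pattern of -0.0.
  def hasNegativeZero(plan: Plan): Boolean = plan match {
    case Literal(v) =>
      java.lang.Double.doubleToRawLongBits(v) == java.lang.Double.doubleToRawLongBits(-0.0d)
    case Join(l, r)  => hasNegativeZero(l) || hasNegativeZero(r)
    case Subquery(c) => hasNegativeZero(c)
  }

  def main(args: Array[String]): Unit = {
    val sub = Subquery(Join(Literal(-0.0d), Literal(0.0d)))
    println(hasNegativeZero(oldRule(sub))) // the skipped subquery keeps its -0.0
    println(hasNegativeZero(newRule(sub))) // the new rule normalizes it
  }
}
```

In this toy model the old rule leaves `-0.0` in place under a `Subquery` root, while the rule without the special case normalizes it.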

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

Lines changed: 18 additions & 0 deletions
@@ -3449,6 +3449,24 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       checkAnswer(sql("select CAST(-32768 as short) DIV CAST (-1 as short)"),
         Seq(Row(Short.MinValue.toLong * -1)))
   }
+
+  test("normalize special floating numbers in subquery") {
+    withTempView("v1", "v2", "v3") {
+      Seq(-0.0).toDF("d").createTempView("v1")
+      Seq(0.0).toDF("d").createTempView("v2")
+      spark.range(2).createTempView("v3")
+
+      // non-correlated subquery
+      checkAnswer(sql("SELECT (SELECT v1.d FROM v1 JOIN v2 ON v1.d = v2.d)"), Row(-0.0))
+      // correlated subquery
+      checkAnswer(
+        sql(
+          """
+            |SELECT id FROM v3 WHERE EXISTS
+            | (SELECT v1.d FROM v1 JOIN v2 ON v1.d = v2.d WHERE id > 0)
+            |""".stripMargin), Row(1))
+    }
+  }
 }
 
 case class Foo(bar: Option[String])
