Commit 3711a81

ulysses-you authored and cloud-fan committed
[SPARK-38185][SQL] Fix data incorrect if aggregate function is empty
Add a non-emptiness check to the `groupOnly` function: the group-only condition should also verify that the expressions involved are not empty. The DataFrame API allows an empty aggregation, so the following query should return 1 rather than 0, because it is a global aggregate:

```scala
val emptyAgg = Map.empty[String, String]
spark.range(2).where("id > 2").agg(emptyAgg).limit(1).count
```

This is a user-facing bug fix. A regression test is added.

Closes #35490 from ulysses-you/SPARK-38185.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 25a4c5f)
Signed-off-by: Wenchen Fan <[email protected]>
1 parent cd87ea8 commit 3711a81

File tree: 2 files changed (+11, -1 lines)

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala

Lines changed: 6 additions & 1 deletion
```diff
@@ -982,7 +982,12 @@ case class Aggregate(
   // Whether this Aggregate operator is group only. For example: SELECT a, a FROM t GROUP BY a
   private[sql] def groupOnly: Boolean = {
-    aggregateExpressions.forall(a => groupingExpressions.exists(g => a.semanticEquals(g)))
+    // aggregateExpressions can be empty through Dateset.agg,
+    // so we should also check groupingExpressions is non empty
+    groupingExpressions.nonEmpty && aggregateExpressions.map {
+      case Alias(child, _) => child
+      case e => e
+    }.forall(a => groupingExpressions.exists(g => a.semanticEquals(g)))
   }
 }
```
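The root cause is that `forall` over an empty sequence is vacuously true, so an `Aggregate` with no aggregate expressions was wrongly classified as group-only. A minimal self-contained sketch of the predicate's behavior, using simplified stand-in types (`Expr`, `Attr`, `Alias` here are toy classes for illustration, not Spark's actual Catalyst expressions):

```scala
// Toy model of the groupOnly predicate before and after the fix.
// Expr/Alias are simplified stand-ins, not the real Catalyst classes.
sealed trait Expr { def semanticEquals(other: Expr): Boolean = this == other }
case class Attr(name: String) extends Expr
case class Alias(child: Expr, name: String) extends Expr

// Old predicate: forall on an empty Seq is vacuously true.
def groupOnlyOld(aggExprs: Seq[Expr], groupExprs: Seq[Expr]): Boolean =
  aggExprs.forall(a => groupExprs.exists(g => a.semanticEquals(g)))

// New predicate: require non-empty grouping expressions, and strip aliases
// before comparing aggregate expressions against grouping expressions.
def groupOnlyNew(aggExprs: Seq[Expr], groupExprs: Seq[Expr]): Boolean =
  groupExprs.nonEmpty && aggExprs.map {
    case Alias(child, _) => child
    case e => e
  }.forall(a => groupExprs.exists(g => a.semanticEquals(g)))

// A global aggregate built via Dataset.agg(Map.empty): no expressions at all.
val empty = Seq.empty[Expr]
assert(groupOnlyOld(empty, empty))  // vacuously true: the bug
assert(!groupOnlyNew(empty, empty)) // correctly rejected

// SELECT a, a FROM t GROUP BY a is still recognized as group-only.
val a = Attr("a")
assert(groupOnlyNew(Seq(a, Alias(a, "a2")), Seq(a)))
```

Misclassifying the global aggregate as group-only let the optimizer rewrite it as if it produced no rows, which is why the count in the reproducer came back as 0 instead of 1.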

sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala

Lines changed: 5 additions & 0 deletions
```diff
@@ -1422,6 +1422,11 @@ class DataFrameAggregateSuite extends QueryTest
     val res = df.select($"d".cast("decimal(12, 2)").as("d")).agg(avg($"d").cast("string"))
     checkAnswer(res, Row("9999999999.990000"))
   }
+
+  test("SPARK-38185: Fix data incorrect if aggregate function is empty") {
+    val emptyAgg = Map.empty[String, String]
+    assert(spark.range(2).where("id > 2").agg(emptyAgg).limit(1).count == 1)
+  }
 }

 case class B(c: Option[Double])
```
