
Commit 25a4c5f

ulysses-you authored and cloud-fan committed
[SPARK-38185][SQL] Fix data incorrect if aggregate function is empty
### What changes were proposed in this pull request?

Add an `aggregateExpressions.nonEmpty` check in the `groupOnly` function.

### Why are the changes needed?

The group-only condition should check whether the aggregate expressions are empty. The DataFrame API allows an empty aggregation, so the following query should return 1 rather than 0, because it is a global aggregate:

```scala
val emptyAgg = Map.empty[String, String]
spark.range(2).where("id > 2").agg(emptyAgg).limit(1).count
```

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Added a test.

Closes #35490 from ulysses-you/SPARK-38185.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
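To see why the unguarded check misfires, here is a minimal, self-contained sketch (hypothetical classes, not Spark's real `Aggregate`, `Alias`, or `Expression`) modeling the `groupOnly` predicate before and after this commit. The root cause is that `forall` on an empty collection is vacuously true, so an `Aggregate` with no aggregate expressions was misclassified as group-only:

```scala
// Hypothetical, simplified model of the groupOnly check; not Spark's
// actual classes. It demonstrates the vacuous-truth bug and the fix.
object GroupOnlyDemo {
  sealed trait Expr { def semanticEquals(other: Expr): Boolean = this == other }
  case class Attr(name: String) extends Expr
  case class Alias(child: Expr, name: String) extends Expr

  case class Aggregate(groupingExpressions: Seq[Expr],
                       aggregateExpressions: Seq[Expr]) {
    // Pre-fix logic: forall on an empty Seq is vacuously true, so an
    // empty aggregation is wrongly reported as "group only".
    def groupOnlyBefore: Boolean =
      aggregateExpressions.map {
        case Alias(child, _) => child
        case e => e
      }.forall(a => groupingExpressions.exists(g => a.semanticEquals(g)))

    // Post-fix logic from this commit: also require non-empty grouping keys.
    def groupOnlyAfter: Boolean =
      groupingExpressions.nonEmpty && groupOnlyBefore
  }

  def main(args: Array[String]): Unit = {
    val globalAgg = Aggregate(Nil, Nil) // models Dataset.agg(Map.empty)
    assert(globalAgg.groupOnlyBefore)   // bug: treated as group-only
    assert(!globalAgg.groupOnlyAfter)   // fix: recognized as a global aggregate
    println("ok")
  }
}
```

A global aggregate over an empty input still produces one output row, which is why the misclassification silently dropped a row.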
1 parent d4a2e5c · commit 25a4c5f

File tree: 2 files changed, +8 −1 lines


sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (3 additions, 1 deletion)

```diff
@@ -984,7 +984,9 @@ case class Aggregate(
   // Whether this Aggregate operator is group only. For example: SELECT a, a FROM t GROUP BY a
   private[sql] def groupOnly: Boolean = {
-    aggregateExpressions.map {
+    // aggregateExpressions can be empty through Dataset.agg,
+    // so we should also check groupingExpressions is non-empty
+    groupingExpressions.nonEmpty && aggregateExpressions.map {
       case Alias(child, _) => child
       case e => e
     }.forall(a => groupingExpressions.exists(g => a.semanticEquals(g)))
```
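The `nonEmpty` guard matters because Scala's `forall` is vacuously true on an empty collection. A quick plain-Scala check (no Spark required) of that behavior:

```scala
// forall over an empty collection returns true for any predicate,
// which is what made the old groupOnly return true when there were
// no aggregate expressions; prefixing a nonEmpty check closes the hole.
object VacuousForall {
  def main(args: Array[String]): Unit = {
    val empty = Seq.empty[Int]
    assert(empty.forall(_ > 100))                      // vacuously true
    assert(!(empty.nonEmpty && empty.forall(_ > 100))) // guarded form is false
    println("ok")
  }
}
```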

sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala (5 additions, 0 deletions)

```diff
@@ -1443,6 +1443,11 @@ class DataFrameAggregateSuite extends QueryTest
     val res = df.select($"d".cast("decimal(12, 2)").as("d")).agg(avg($"d").cast("string"))
     checkAnswer(res, Row("9999999999.990000"))
   }
+
+  test("SPARK-38185: Fix data incorrect if aggregate function is empty") {
+    val emptyAgg = Map.empty[String, String]
+    assert(spark.range(2).where("id > 2").agg(emptyAgg).limit(1).count == 1)
+  }
 }

 case class B(c: Option[Double])
```
