Skip to content

Conversation

@ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

Add aggregateExpressions.nonEmpty check in groupOnly function.

Why are the changes needed?

The group only condition should check if the aggregate expression is empty.

In DataFrame api, it is allowed to make a empty aggregations.

So the following query should return 1 rather than 0 because it's a global aggregate.

val emptyAgg = Map.empty[String, String]
spark.range(2).where("id > 2").agg(emptyAgg).limit(1).count

Does this PR introduce any user-facing change?

yes, bug fix

How was this patch tested?

Add test

@ulysses-you
Copy link
Contributor Author

cc @wangyum @cloud-fan

@github-actions github-actions bot added the SQL label Feb 11, 2022
Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@cloud-fan
Copy link
Contributor

cc @sigmod as well

private[sql] def groupOnly: Boolean = {
aggregateExpressions.map {
// aggregateExpressions can be empty through Dateset.agg
aggregateExpressions.nonEmpty && aggregateExpressions.map {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it more explicit to check groupingExpressions.nonEmpty? Logically, an aggregate operator without output columns is still group only if it has group columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense

@cloud-fan cloud-fan closed this in 25a4c5f Feb 11, 2022
@cloud-fan
Copy link
Contributor

thanks, merging to master/3.2!

cloud-fan pushed a commit that referenced this pull request Feb 11, 2022
Add `aggregateExpressions.nonEmpty` check in `groupOnly` function.

The group only condition should check if the aggregate expression is empty.

In DataFrame api, it is allowed to make a empty aggregations.

So the following query should return 1 rather than 0 because it's a global aggregate.
```scala
val emptyAgg = Map.empty[String, String]
spark.range(2).where("id > 2").agg(emptyAgg).limit(1).count
```

yes, bug fix

Add test

Closes #35490 from ulysses-you/SPARK-38185.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 25a4c5f)
Signed-off-by: Wenchen Fan <[email protected]>
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @ulysses-you and all.

@dongjoon-hyun
Copy link
Member

The code looks good but branch-3.2 seems not to have this issue.
The UT passes without the patch in branch-3.2. Do you know what difference causes the issue at master branch, @ulysses-you ?
BTW, I'm good to have this in branch-3.2 in any case because new code looks robust.

@ulysses-you
Copy link
Contributor Author

thank you @dongjoon-hyun , The rule that change aggregate if it's group only with limit 1 to project is landed at branch-3.3 see SPARK-36183, and it's the key to trigger this bug. But the group only condition is added since branch-3.2, see SPARK-34808.

So it would be good to also fix this in branch-3.2.

@ulysses-you ulysses-you deleted the SPARK-38185 branch February 12, 2022 01:26
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
Add `aggregateExpressions.nonEmpty` check in `groupOnly` function.

The group only condition should check if the aggregate expression is empty.

In DataFrame api, it is allowed to make a empty aggregations.

So the following query should return 1 rather than 0 because it's a global aggregate.
```scala
val emptyAgg = Map.empty[String, String]
spark.range(2).where("id > 2").agg(emptyAgg).limit(1).count
```

yes, bug fix

Add test

Closes apache#35490 from ulysses-you/SPARK-38185.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 25a4c5f)
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 3711a81)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants