-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced #33574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| aggsBuffer.exists(a => a.find(_ eq e).isDefined) | ||
| } | ||
| replaceGroupingFunc(_, groupByExprs, gid).transformDown { | ||
| replaceGroupingFunc(agg, groupByExprs, gid).transformDown { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between underscore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the underscore, the aggsBuffer is outside the scope of the map function at runtime and it will save the results of all elements.
using normal parameters, the aggsBuffer will only be recreated each time inside the map function loop.
I suspect that Scala syntactic sugar in the conversion of the code made changes to cause, I also debugged this code many times before I found this difference, here is a simplified code to test separately.
def testMap(seq: Seq[Int]): Seq[Int] = {
seq.map {
val buf = ArrayBuffer[Int]()
_ match {
case e: Int if e < 1 =>
val r = e + 1
println(s"add to buf: $r")
buf += r
r
case e: Int if buf.contains(e) =>
println("already in buf")
0
case e =>
println("not in buf")
e
}
}
}
testMap(Seq(0, 1))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed explanation.
|
@cfmcgrady @cloud-fan @dongjoon-hyun @gatorsmile Could you take a look ? Also, I've found the PR #14083 that changes the behavior of the map function, but I don't know why, Do you have any suggestions? Anyway, this PR fix this bug and is important to me, Could you review and verify it? |
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Outdated
Show resolved
Hide resolved
cloud-fan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
|
ok to test |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #142128 has finished for PR 33574 at commit
|
|
thanks, merging to master/3.2! |
…n should be replaced
### What changes were proposed in this pull request?
Currently, references to grouping sets are reported as errors after aggregated expressions, e.g.
```
SELECT count(name) c, name
FROM VALUES ('Alice'), ('Bob') people(name)
GROUP BY name GROUPING SETS(name);
```
Error in query: expression 'people.`name`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
### Why are the changes needed?
Fix the map anonymous function in the constructAggregateExprs function does not use underscores to avoid
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests.
Closes #33574 from gaoyajun02/SPARK-36339.
Lead-authored-by: gaoyajun02 <[email protected]>
Co-authored-by: gaoyajun02 <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 888f8f0)
Signed-off-by: Wenchen Fan <[email protected]>
|
@gaoyajun02 can you help to open backport PRs for 3.0/3.1? thanks! |
…n should be replaced
Currently, references to grouping sets are reported as errors after aggregated expressions, e.g.
```
SELECT count(name) c, name
FROM VALUES ('Alice'), ('Bob') people(name)
GROUP BY name GROUPING SETS(name);
```
Error in query: expression 'people.`name`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Fix the map anonymous function in the constructAggregateExprs function does not use underscores to avoid
No
Unit tests.
Closes apache#33574 from gaoyajun02/SPARK-36339.
Lead-authored-by: gaoyajun02 <[email protected]>
Co-authored-by: gaoyajun02 <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 888f8f0)
…n should be replaced
### What changes were proposed in this pull request?
Currently, references to grouping sets are reported as errors after aggregated expressions, e.g.
```
SELECT count(name) c, name
FROM VALUES ('Alice'), ('Bob') people(name)
GROUP BY name GROUPING SETS(name);
```
Error in query: expression 'people.`name`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
### Why are the changes needed?
Fix the map anonymous function in the constructAggregateExprs function does not use underscores to avoid
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests.
Closes apache#33574 from gaoyajun02/SPARK-36339.
Lead-authored-by: gaoyajun02 <[email protected]>
Co-authored-by: gaoyajun02 <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 888f8f0)
|
done. my pleasure @cloud-fan |
What changes were proposed in this pull request?
Currently, references to grouping sets are reported as errors after aggregated expressions, e.g.
Error in query: expression 'people.
name' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;Why are the changes needed?
Fix the map anonymous function in the constructAggregateExprs function does not use underscores to avoid
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests.