@gaoyajun02
Contributor

What changes were proposed in this pull request?

Currently, a reference to a grouping column that appears after an aggregate expression in the SELECT list is reported as an error, e.g.

SELECT count(name) c, name
FROM VALUES ('Alice'), ('Bob') people(name)
GROUP BY name GROUPING SETS(name);

Error in query: expression 'people.name' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;

Why are the changes needed?

Fix the anonymous function passed to map in the constructAggregateExprs function so that it does not use the underscore placeholder, to avoid

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

@github-actions github-actions bot added the SQL label Jul 29, 2021
      aggsBuffer.exists(a => a.find(_ eq e).isDefined)
    }
-   replaceGroupingFunc(_, groupByExprs, gid).transformDown {
+   replaceGroupingFunc(agg, groupByExprs, gid).transformDown {
Contributor

@cfmcgrady cfmcgrady Jul 29, 2021

What's the difference compared with using the underscore?

Contributor Author

@gaoyajun02 gaoyajun02 Jul 29, 2021

With the underscore, aggsBuffer is created once, outside the function that map applies to each element, so at runtime it accumulates the results of all elements.
With a named parameter, aggsBuffer is created inside the function body, so it is recreated for each element.

I suspect the cause is how Scala desugars the placeholder syntax. I had to debug this code many times before I found the difference. Here is a simplified example that can be tested separately.

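    import scala.collection.mutable.ArrayBuffer

    def testMap(seq: Seq[Int]): Seq[Int] = {
      seq.map {
        // With the `_` placeholder, this block is evaluated only once, so
        // `buf` is shared by every element passed to the resulting function.
        val buf = ArrayBuffer[Int]()
        _ match {
          case e: Int if e < 1 =>
            val r = e + 1
            println(s"add to buf: $r")
            buf += r
            r
          case e: Int if buf.contains(e) =>
            // Reached only because `buf` still holds values from earlier elements.
            println("already in buf")
            0
          case e =>
            println("not in buf")
            e
        }
      }
    }

    // Prints "add to buf: 1", then "already in buf"; returns Seq(1, 0).
    testMap(Seq(0, 1))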
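
For contrast, here is a minimal sketch that puts the placeholder form next to the named-parameter form used in the fix. The names `PlaceholderScopeDemo`, `sharedBuffer` and `freshBuffer` are made up for illustration and are not part of the Spark code; the sketch assumes only the Scala standard library.

```scala
import scala.collection.mutable.ArrayBuffer

object PlaceholderScopeDemo {
  // Placeholder form: the braces are an ordinary block that is evaluated once.
  // Its result is the function `x => x match { ... }`, which closes over the
  // single `buf` created while evaluating the block.
  def sharedBuffer(seq: Seq[Int]): Seq[Int] = seq.map {
    val buf = ArrayBuffer[Int]()
    _ match {
      case e if e < 1 =>
        buf += e + 1
        e + 1
      case e if buf.contains(e) => 0
      case e => e
    }
  }

  // Named-parameter form: everything after `x =>` is the function body,
  // so a fresh `buf` is allocated for each element.
  def freshBuffer(seq: Seq[Int]): Seq[Int] = seq.map { x =>
    val buf = ArrayBuffer[Int]()
    x match {
      case e if e < 1 =>
        buf += e + 1
        e + 1
      case e if buf.contains(e) => 0
      case e => e
    }
  }

  def main(args: Array[String]): Unit = {
    println(sharedBuffer(Seq(0, 1))) // List(1, 0): 1 is found in the shared buffer
    println(freshBuffer(Seq(0, 1)))  // List(1, 1): the buffer is empty for each element
  }
}
```

The only textual difference is where the parameter is introduced, which is why the one-line change from `_` to `agg` in the diff is enough to give each aggregate expression its own aggsBuffer.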

Contributor

Thank you for the detailed explanation.

@gaoyajun02
Contributor Author

gaoyajun02 commented Aug 4, 2021

@cfmcgrady @cloud-fan @dongjoon-hyun @gatorsmile Could you take a look?

Also, I found that PR #14083 changed the behavior of the map function, but I don't know why. Do you have any suggestions?

Anyway, this PR fixes the bug and it is important to me. Could you review and verify it?

@gaoyajun02 gaoyajun02 changed the title from "[SPARK-36339] References to grouping that not part of aggregation should be replaced" to "[SPARK-36339][SQL] References to grouping that not part of aggregation should be replaced" Aug 4, 2021
Contributor

@cloud-fan cloud-fan left a comment

good catch!

@cloud-fan
Contributor

ok to test

@SparkQA

SparkQA commented Aug 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46640/

@SparkQA

SparkQA commented Aug 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46640/

@SparkQA

SparkQA commented Aug 6, 2021

Test build #142128 has finished for PR 33574 at commit 321374d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.2!

@cloud-fan cloud-fan closed this in 888f8f0 Aug 6, 2021
cloud-fan pushed a commit that referenced this pull request Aug 6, 2021
…n should be replaced

Closes #33574 from gaoyajun02/SPARK-36339.

Lead-authored-by: gaoyajun02 <[email protected]>
Co-authored-by: gaoyajun02 <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 888f8f0)
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan
Contributor

@gaoyajun02 can you help to open backport PRs for 3.0/3.1? thanks!

gaoyajun02 added a commit to gaoyajun02/spark that referenced this pull request Aug 6, 2021
gaoyajun02 added a commit to gaoyajun02/spark that referenced this pull request Aug 6, 2021
@gaoyajun02
Contributor Author

done. my pleasure @cloud-fan
