Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Jan 23, 2022

What changes were proposed in this pull request?

Use Aggregate.aggregateExpressions instead of Aggregate.output when pushing down limit 1 through Aggregate.

For example:

spark.range(10).selectExpr("id % 5 AS a", "id % 5 AS b").write.saveAsTable("t1")
spark.sql("SELECT a, b, a AS alias FROM t1 GROUP BY a, b LIMIT 1").explain(true)

Before this pr:

== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- !Project [a#227L, b#228L, alias#226L]
      +- LocalLimit 1
         +- Relation default.t1[a#227L,b#228L] parquet

After this pr:

== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- Project [a#227L, b#228L, a#227L AS alias#226L]
      +- LocalLimit 1
         +- Relation default.t1[a#227L,b#228L] parquet

Why are the changes needed?

Fix bug.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Jan 23, 2022
@wangyum
Copy link
Member Author

wangyum commented Jan 24, 2022

cc @cloud-fan

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 9b12571 Jan 25, 2022
@wangyum wangyum deleted the SPARK-36183-2 branch January 25, 2022 02:04
wangyum added a commit that referenced this pull request May 26, 2023
…861)

* [SPARK-36183][SQL][FOLLOWUP] Fix push down limit 1 through Aggregate

### What changes were proposed in this pull request?

Use `Aggregate.aggregateExpressions` instead of `Aggregate.output` when  pushing down limit 1 through Aggregate.

For example:

```scala
spark.range(10).selectExpr("id % 5 AS a", "id % 5 AS b").write.saveAsTable("t1")
spark.sql("SELECT a, b, a AS alias FROM t1 GROUP BY a, b LIMIT 1").explain(true)
```
Before this pr:
```
== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- !Project [a#227L, b#228L, alias#226L]
      +- LocalLimit 1
         +- Relation default.t1[a#227L,b#228L] parquet
```
After this pr:
```
== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- Project [a#227L, b#228L, a#227L AS alias#226L]
      +- LocalLimit 1
         +- Relation default.t1[a#227L,b#228L] parquet
```

### Why are the changes needed?

Fix bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #35286 from wangyum/SPARK-36183-2.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit 9b12571)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants