[SPARK-36028][SQL] Allow Project to host outer references in scalar subqueries #33235

allisonwang-db · 2021-07-06T21:24:30Z

What changes were proposed in this pull request?

This PR allows the Project node to host outer references in scalar subqueries when decorrelateInnerQuery is enabled. It is already supported by the new decorrelation framework and the RewriteCorrelatedScalarSubquery rule.

Note that, by default, all correlated subqueries are currently decorrelated, which is not necessarily optimal. Consider SELECT (SELECT c1) FROM t. This should be optimized to SELECT c1 FROM t instead of being rewritten as a left outer join. That optimization will be done in a separate PR that optimizes correlated scalar/lateral subqueries with OneRowRelation.
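
The equivalence mentioned above can be checked on a toy table. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in SQL engine, not Spark. The table t and its two rows (0 and 1) are assumptions chosen to mirror the example output in this PR, and the join query is a hand-written illustration of the general shape of a left outer join formulation, not the plan that RewriteCorrelatedScalarSubquery actually produces.

```python
import sqlite3


def run(conn, sql):
    """Run a single-column query and return its values, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# Assumed toy data: a table t with one integer column c1 holding 0 and 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(0,), (1,)])

# Correlated scalar subquery: the inner SELECT has no FROM clause
# (a one-row relation) and references the outer column c1.
correlated = run(conn, "SELECT (SELECT c1) FROM t")

# The simplified form the description says the query should be optimized to.
simplified = run(conn, "SELECT c1 FROM t")

# Hand-written left outer join formulation (illustrative only).
joined = run(conn, """
    SELECT s.v
    FROM t
    LEFT OUTER JOIN (SELECT DISTINCT c1 AS k, c1 AS v FROM t) AS s
      ON t.c1 = s.k
""")

print(correlated, simplified, joined)
```

All three queries return the same values on this data, which is why the join is unnecessary work when the subquery only projects an outer column over a one-row relation.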

Why are the changes needed?

To allow more types of correlated scalar subqueries.

Does this PR introduce any user-facing change?

Yes. This PR allows outer query column references in the SELECT clause of a correlated scalar subquery. For example:

SELECT (SELECT c1) FROM t;

Before this change:

org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses

After this change:

+------------------+
|scalarsubquery(c1)|
+------------------+
|0                 |
|1                 |
+------------------+

How was this patch tested?

Added unit tests and SQL tests.

SparkQA · 2021-07-06T22:42:38Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45228/

SparkQA · 2021-07-07T02:21:22Z

Test build #140717 has finished for PR 33235 at commit 9577d90.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

allisonwang-db · 2021-07-07T03:45:22Z

cc @cloud-fan

cloud-fan · 2021-07-07T04:25:52Z

thanks, merging to master!

…ubqueries

Closes apache#33235 from allisonwang-db/spark-36028-outer-in-project.

Authored-by: allisonwang-db
Signed-off-by: Wenchen Fan
(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db

…lar subqueries

This PR cherry picks #33235 to branch-3.2 to fix test failures introduced by #33284.

(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db

Closes #33527 from allisonwang-db/spark-36028-3.2.

Authored-by: allisonwang-db
Signed-off-by: Wenchen Fan

github-actions bot added the SQL label Jul 6, 2021

cloud-fan closed this in ca348e5 Jul 7, 2021

allisonwang-db mentioned this pull request Jul 26, 2021

[SPARK-36028][SQL][3.2] Allow Project to host outer references in scalar subqueries #33527

Closed

allisonwang-db deleted the spark-36028-outer-in-project branch January 19, 2024 01:22
