[SPARK-36028][SQL][3.2] Allow Project to host outer references in scalar subqueries #33527

allisonwang-db · 2021-07-26T22:18:08Z

This PR cherry picks #33235 to branch-3.2 to fix test failures introduced by #33284.

What changes were proposed in this pull request?

This PR allows the Project node to host outer references in scalar subqueries when decorrelateInnerQuery is enabled. It is already supported by the new decorrelation framework and the RewriteCorrelatedScalarSubquery rule.

Note that currently all correlated subqueries are decorrelated by default, which is not always optimal. Consider SELECT (SELECT c1) FROM t: it should be optimized to SELECT c1 FROM t rather than rewritten as a left outer join. A separate PR will optimize correlated scalar/lateral subqueries with OneRowRelation.
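To make the equivalence concrete, here is a minimal sketch that runs the three query shapes against an in-memory SQLite database. SQLite only stands in for Spark here. The table `t` with rows 0 and 1 is assumed from the example output in this description. The join query is a hand-written illustration of a left outer join form, not the plan Spark actually produces.

```python
import sqlite3

# Illustrative only: SQLite stands in for Spark, and table `t` with rows
# 0 and 1 is an assumption based on the example output in this PR.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(0,), (1,)])

# Correlated scalar subquery whose SELECT list references the outer column.
correlated = conn.execute(
    "SELECT (SELECT c1) AS v FROM t ORDER BY v"
).fetchall()

# The simplified form this query should ideally be optimized to.
simplified = conn.execute("SELECT c1 AS v FROM t ORDER BY v").fetchall()

# A hand-written left outer join form (hypothetical shape, not Spark's plan):
# join each outer row to the distinct outer values with a null-safe match.
joined = conn.execute(
    "SELECT d.c1 AS v "
    "FROM t LEFT OUTER JOIN (SELECT DISTINCT c1 FROM t) AS d "
    "ON t.c1 IS d.c1 "
    "ORDER BY v"
).fetchall()

print(correlated, simplified, joined)
conn.close()
```

All three queries return the same rows, which is why the join is unnecessary work for this particular subquery.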

Why are the changes needed?

To allow more types of correlated scalar subqueries.

Does this PR introduce any user-facing change?

Yes. This PR allows outer query column references in the SELECT clause of a correlated scalar subquery. For example:

SELECT (SELECT c1) FROM t;

Before this change:

org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported
outside of WHERE/HAVING clauses

After this change:

+------------------+
|scalarsubquery(c1)|
+------------------+
|0                 |
|1                 |
+------------------+

How was this patch tested?

Added unit tests and SQL tests.

(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db [email protected]

SparkQA · 2021-07-26T23:15:52Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46180/

SparkQA · 2021-07-26T23:49:25Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46180/

SparkQA · 2021-07-27T02:43:34Z

Test build #141664 has finished for PR 33527 at commit fb58c1c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

allisonwang-db · 2021-07-27T03:01:44Z

cc @cloud-fan (The test failures seem to be related to SPARK-35985?)

…ubqueries

This PR allows the `Project` node to host outer references in scalar subqueries when `decorrelateInnerQuery` is enabled. It is already supported by the new decorrelation framework and the `RewriteCorrelatedScalarSubquery` rule.

Note currently by default all correlated subqueries will be decorrelated, which is not necessarily the most optimal approach. Consider `SELECT (SELECT c1) FROM t`. This should be optimized as `SELECT c1 FROM t` instead of rewriting it as a left outer join. This will be done in a separate PR to optimize correlated scalar/lateral subqueries with OneRowRelation.

To allow more types of correlated scalar subqueries.

Yes. This PR allows outer query column references in the SELECT clause of a correlated scalar subquery. For example:

```sql
SELECT (SELECT c1) FROM t;
```

Before this change:

```
org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses
```

After this change:

```
+------------------+
|scalarsubquery(c1)|
+------------------+
|0                 |
|1                 |
+------------------+
```

Added unit tests and SQL tests.

Closes apache#33235 from allisonwang-db/spark-36028-outer-in-project.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db <[email protected]>

SparkQA · 2021-07-27T19:55:21Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46240/

SparkQA · 2021-07-27T23:31:18Z

Test build #141727 has finished for PR 33527 at commit 29b068c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2021-07-28T04:54:13Z

thanks, merging to 3.2!

…lar subqueries

This PR cherry picks #33235 to branch-3.2 to fix test failures introduced by #33284.

### What changes were proposed in this pull request?

This PR allows the `Project` node to host outer references in scalar subqueries when `decorrelateInnerQuery` is enabled. It is already supported by the new decorrelation framework and the `RewriteCorrelatedScalarSubquery` rule.

Note currently by default all correlated subqueries will be decorrelated, which is not necessarily the most optimal approach. Consider `SELECT (SELECT c1) FROM t`. This should be optimized as `SELECT c1 FROM t` instead of rewriting it as a left outer join. This will be done in a separate PR to optimize correlated scalar/lateral subqueries with OneRowRelation.

### Why are the changes needed?

To allow more types of correlated scalar subqueries.

### Does this PR introduce _any_ user-facing change?

Yes. This PR allows outer query column references in the SELECT clause of a correlated scalar subquery. For example:

```sql
SELECT (SELECT c1) FROM t;
```

Before this change:

```
org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses
```

After this change:

```
+------------------+
|scalarsubquery(c1)|
+------------------+
|0                 |
|1                 |
+------------------+
```

### How was this patch tested?

Added unit tests and SQL tests.

(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db <allison.wangdatabricks.com>

Closes #33527 from allisonwang-db/spark-36028-3.2.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

github-actions bot added the SQL label Jul 26, 2021

allisonwang-db force-pushed the spark-36028-3.2 branch from fb58c1c to 29b068c Compare July 27, 2021 18:11

cloud-fan closed this Jul 28, 2021
