@allisonwang-db
Contributor

This PR cherry picks #33235 to branch-3.2 to fix test failures introduced by #33284.

What changes were proposed in this pull request?

This PR allows the Project node to host outer references in scalar subqueries when decorrelateInnerQuery is enabled. It is already supported by the new decorrelation framework and the RewriteCorrelatedScalarSubquery rule.

Note that, by default, all correlated subqueries are currently decorrelated, which is not necessarily optimal. Consider SELECT (SELECT c1) FROM t. This should be optimized to SELECT c1 FROM t rather than rewritten as a left outer join. That optimization will be done in a separate PR that handles correlated scalar/lateral subqueries with OneRowRelation.
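To illustrate why the simpler form is a valid target, here is a minimal sketch showing that the two queries return the same rows. It uses Python's built-in sqlite3 module as a stand-in engine, not Spark, so it only demonstrates the SQL semantics and says nothing about Spark's optimizer or its plans. The table contents (0 and 1) are an assumption taken from the sample output in this description.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER)")
# Assumed sample data, matching the rows shown in the PR's example output.
conn.executemany("INSERT INTO t VALUES (?)", [(0,), (1,)])

# Correlated scalar subquery whose SELECT list references the outer column c1.
with_subquery = sorted(conn.execute("SELECT (SELECT c1) FROM t").fetchall())

# The simplified form the note says the query should be optimized to.
direct = sorted(conn.execute("SELECT c1 FROM t").fetchall())

print(with_subquery)
print(direct)
conn.close()
```

Both result lists are sorted before printing because neither query guarantees a row order.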

Why are the changes needed?

To allow more types of correlated scalar subqueries.

Does this PR introduce any user-facing change?

Yes. This PR allows outer query column references in the SELECT clause of a correlated scalar subquery. For example:

SELECT (SELECT c1) FROM t;

Before this change:

org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported
outside of WHERE/HAVING clauses

After this change:

+------------------+
|scalarsubquery(c1)|
+------------------+
|0                 |
|1                 |
+------------------+

How was this patch tested?

Added unit tests and SQL tests.

(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db [email protected]

@github-actions github-actions bot added the SQL label Jul 26, 2021

SparkQA commented Jul 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46180/


SparkQA commented Jul 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46180/


SparkQA commented Jul 27, 2021

Test build #141664 has finished for PR 33527 at commit fb58c1c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@allisonwang-db
Contributor Author

cc @cloud-fan (The test failures seem to be related to SPARK-35985?)

…ubqueries

Closes apache#33235 from allisonwang-db/spark-36028-outer-in-project.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db <[email protected]>

SparkQA commented Jul 27, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46240/


SparkQA commented Jul 27, 2021

Test build #141727 has finished for PR 33527 at commit 29b068c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to 3.2!

cloud-fan pushed a commit that referenced this pull request Jul 28, 2021
…lar subqueries

(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db <allison.wangdatabricks.com>

Closes #33527 from allisonwang-db/spark-36028-3.2.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this Jul 28, 2021