[SPARK-36028][SQL][3.2] Allow Project to host outer references in scalar subqueries #33527
Kubernetes integration test starting
Kubernetes integration test status success
Test build #141664 has finished for PR 33527 at commit
cc @cloud-fan (The test failures seem to be related to SPARK-35985?)
…ubqueries

This PR allows the `Project` node to host outer references in scalar subqueries when `decorrelateInnerQuery` is enabled. This is already supported by the new decorrelation framework and the `RewriteCorrelatedScalarSubquery` rule.

Note that currently, by default, all correlated subqueries are decorrelated, which is not necessarily the optimal approach. Consider `SELECT (SELECT c1) FROM t`. This should be optimized to `SELECT c1 FROM t` instead of being rewritten as a left outer join. This will be done in a separate PR to optimize correlated scalar/lateral subqueries with OneRowRelation.

To allow more types of correlated scalar subqueries.

Yes. This PR allows outer query column references in the SELECT clause of a correlated scalar subquery. For example:
```sql
SELECT (SELECT c1) FROM t;
```
Before this change:
```
org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses
```
After this change:
```
+------------------+
|scalarsubquery(c1)|
+------------------+
|0                 |
|1                 |
+------------------+
```

Added unit tests and SQL tests.

Closes apache#33235 from allisonwang-db/spark-36028-outer-in-project.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db <[email protected]>
fb58c1c to 29b068c
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #141727 has finished for PR 33527 at commit
thanks, merging to 3.2!
…lar subqueries

This PR cherry picks #33235 to branch-3.2 to fix test failures introduced by #33284.

### What changes were proposed in this pull request?
This PR allows the `Project` node to host outer references in scalar subqueries when `decorrelateInnerQuery` is enabled. This is already supported by the new decorrelation framework and the `RewriteCorrelatedScalarSubquery` rule.

Note that currently, by default, all correlated subqueries are decorrelated, which is not necessarily the optimal approach. Consider `SELECT (SELECT c1) FROM t`. This should be optimized to `SELECT c1 FROM t` instead of being rewritten as a left outer join. This will be done in a separate PR to optimize correlated scalar/lateral subqueries with OneRowRelation.

### Why are the changes needed?
To allow more types of correlated scalar subqueries.

### Does this PR introduce _any_ user-facing change?
Yes. This PR allows outer query column references in the SELECT clause of a correlated scalar subquery. For example:
```sql
SELECT (SELECT c1) FROM t;
```
Before this change:
```
org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses
```
After this change:
```
+------------------+
|scalarsubquery(c1)|
+------------------+
|0                 |
|1                 |
+------------------+
```

### How was this patch tested?
Added unit tests and SQL tests.

(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db <allison.wangdatabricks.com>

Closes #33527 from allisonwang-db/spark-36028-3.2.

Authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
This PR cherry picks #33235 to branch-3.2 to fix test failures introduced by #33284.

### What changes were proposed in this pull request?
This PR allows the `Project` node to host outer references in scalar subqueries when `decorrelateInnerQuery` is enabled. This is already supported by the new decorrelation framework and the `RewriteCorrelatedScalarSubquery` rule.

Note that currently, by default, all correlated subqueries are decorrelated, which is not necessarily the optimal approach. Consider `SELECT (SELECT c1) FROM t`. This should be optimized to `SELECT c1 FROM t` instead of being rewritten as a left outer join. This will be done in a separate PR to optimize correlated scalar/lateral subqueries with OneRowRelation.

### Why are the changes needed?
To allow more types of correlated scalar subqueries.

### Does this PR introduce _any_ user-facing change?
Yes. This PR allows outer query column references in the SELECT clause of a correlated scalar subquery. For example, consider `SELECT (SELECT c1) FROM t;`.

Before this change, the query fails with `org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses`.

After this change, the query returns a single column named `scalarsubquery(c1)`, containing the values 0 and 1 in the PR's example.
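To make the intended semantics concrete, here is a minimal sketch that runs the same query shape on SQLite (through Python's standard `sqlite3` module) rather than on Spark. SQLite is only a stand-in engine that already accepts an outer reference in the SELECT list of a scalar subquery; the table contents (rows 0 and 1) are assumed from the output shown in this PR, and the `run` helper is hypothetical. The hand-written left outer join is just an illustrative analogue of a decorrelated form, not the plan Spark actually generates.

```python
import sqlite3


def run(conn, sql):
    """Hypothetical helper: execute a query and return all rows."""
    return conn.execute(sql).fetchall()


# Assumed sample data: a table t with a single column c1 holding 0 and 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(0,), (1,)])

# Correlated scalar subquery whose SELECT list references the outer column c1.
subquery_rows = run(conn, "SELECT (SELECT c1) FROM t ORDER BY c1")

# The simplified form the PR description says this should be optimized to.
simplified_rows = run(conn, "SELECT c1 FROM t ORDER BY c1")

# A hand-written left outer join that returns the same rows
# (illustrative analogue only, not Spark's actual rewrite).
join_rows = run(
    conn,
    """
    SELECT s.v
    FROM t
    LEFT OUTER JOIN (SELECT DISTINCT c1 AS k, c1 AS v FROM t) AS s
      ON s.k = t.c1
    ORDER BY t.c1
    """,
)

print(subquery_rows)
print(subquery_rows == simplified_rows == join_rows)
```

All three queries return the same rows here, which is why the description calls the join rewrite unnecessary for this particular shape of subquery.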
### How was this patch tested?
Added unit tests and SQL tests.
(cherry picked from commit ca348e5)
Signed-off-by: allisonwang-db [email protected]