-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-27909][SQL] Do not run analysis inside CTE substitution #24763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #106032 has finished for PR 24763 at commit
|
|
@cloud-fan, @dongjoon-hyun, @mccheah can you take a look at this? These changes are needed to get the DSv2 table resolution fixed because that adds a second rule to resolve relations. That means that |
|
@jzhuge, FYI |
|
looks reasonable to me, but not very familiar with this part, cc @gatorsmile @hvanhovell |
|
LGTM. This PR not only fixes cte.sql failures we encountered while enhancing #24741, it also makes the code much easier to reason about. |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
I agree with the above comments on |
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIRA https://issues.apache.org/jira/browse/SPARK-27909 said it is a bug fix. I think we already fixed the infinite recursion in #14397.
Is this PR just a refactoring?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Outdated
Show resolved
Hide resolved
|
Test build #106114 has finished for PR 24763 at commit
|
|
Test build #106158 has finished for PR 24763 at commit
|
|
@dongjoon-hyun, I think we have addressed all review comments, could you have another look? Thank you! |
|
LGTM Merged to master. |
|
Thanks for merging this, @gatorsmile! |
|
Late LGTM! Thank you for merging, too. |
## What changes were proposed in this pull request? This updates CTE substitution to avoid needing to run all resolution rules on each substituted expression. Running resolution rules was previously used to avoid infinite recursion. In the updated rule, CTE plans are substituted as sub-queries from right to left. Using this scope-based order, it is not necessary to replace multiple CTEs at the same time using `resolveOperatorsDown`. Instead, `resolveOperatorsUp` is used to replace each CTE individually. By resolving using `resolveOperatorsUp`, this no longer needs to run all analyzer rules on each substituted expression. Previously, this was done to apply `ResolveRelations`, which would throw an `AnalysisException` for all unresolved relations so that unresolved relations that may cause recursive substitutions were not left in the plan. Because this is no longer needed, `ResolveRelations` no longer needs to throw `AnalysisException` and resolution can be done in multiple rules. ## How was this patch tested? Existing tests in `SQLQueryTestSuite`, `cte.sql`. Closes apache#24763 from rdblue/SPARK-27909-fix-cte-substitution. Authored-by: Ryan Blue <[email protected]> Signed-off-by: gatorsmile <[email protected]>
What changes were proposed in this pull request?
This updates CTE substitution to avoid needing to run all resolution rules on each substituted expression. Running resolution rules was previously used to avoid infinite recursion. In the updated rule, CTE plans are substituted as sub-queries from right to left. Using this scope-based order, it is not necessary to replace multiple CTEs at the same time using
resolveOperatorsDown. Instead,resolveOperatorsUpis used to replace each CTE individually.By resolving using
resolveOperatorsUp, this no longer needs to run all analyzer rules on each substituted expression. Previously, this was done to applyResolveRelations, which would throw anAnalysisExceptionfor all unresolved relations so that unresolved relations that may cause recursive substitutions were not left in the plan. Because this is no longer needed,ResolveRelationsno longer needs to throwAnalysisExceptionand resolution can be done in multiple rules.How was this patch tested?
Existing tests in
SQLQueryTestSuite,cte.sql.