@rdblue
Contributor

@rdblue rdblue commented May 31, 2019

What changes were proposed in this pull request?

This updates CTE substitution to avoid needing to run all resolution rules on each substituted expression. Running resolution rules was previously used to avoid infinite recursion. In the updated rule, CTE plans are substituted as sub-queries from right to left. Using this scope-based order, it is not necessary to replace multiple CTEs at the same time using `resolveOperatorsDown`. Instead, `resolveOperatorsUp` is used to replace each CTE individually.

By resolving using `resolveOperatorsUp`, this no longer needs to run all analyzer rules on each substituted expression. Previously, this was done to apply `ResolveRelations`, which would throw an `AnalysisException` for all unresolved relations so that unresolved relations that may cause recursive substitutions were not left in the plan. Because this is no longer needed, `ResolveRelations` no longer needs to throw `AnalysisException` and resolution can be done in multiple rules.
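
As a rough illustration of the substitution order described above, here is a toy Python model. It is not the Spark implementation: the `Relation`, `Subquery`, and `Join` classes and the `transform_up`, `substitute_cte`, and `substitute_all` helpers are hypothetical names invented for this sketch. It only tries to show why a bottom-up rewrite applied one CTE at a time, from the last definition to the first, can terminate without a separate resolution pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class Relation:
    """An unresolved table reference, identified only by name."""
    name: str


@dataclass(frozen=True)
class Subquery:
    """A named sub-query wrapping a substituted CTE plan."""
    alias: str
    child: "Plan"


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


Plan = Union[Relation, Subquery, Join]


def transform_up(plan: Plan, rule: Callable[[Plan], Plan]) -> Plan:
    """Rewrite children first, then apply `rule` to the rebuilt node.

    The node returned by `rule` is not traversed again, so a replacement
    that itself contains a matching reference is not expanded a second time.
    """
    if isinstance(plan, Join):
        plan = Join(transform_up(plan.left, rule), transform_up(plan.right, rule))
    elif isinstance(plan, Subquery):
        plan = Subquery(plan.alias, transform_up(plan.child, rule))
    return rule(plan)


def substitute_cte(plan: Plan, name: str, cte_plan: Plan) -> Plan:
    """Replace every reference to one CTE name with its plan."""
    def rule(node: Plan) -> Plan:
        if isinstance(node, Relation) and node.name == name:
            return Subquery(name, cte_plan)
        return node
    return transform_up(plan, rule)


def substitute_all(query: Plan, ctes: List[Tuple[str, Plan]]) -> Plan:
    """Substitute CTEs right to left, so that references introduced by a
    later CTE are replaced when the earlier CTE is processed."""
    for name, cte_plan in reversed(ctes):
        query = substitute_cte(query, name, cte_plan)
    return query


# Toy equivalent of:
#   WITH a AS (SELECT * FROM t), b AS (SELECT * FROM a)
#   SELECT * FROM b JOIN a
ctes = [("a", Relation("t")), ("b", Relation("a"))]
result = substitute_all(Join(Relation("b"), Relation("a")), ctes)
```

In this sketch, `b` is expanded first, which exposes a new reference to `a`; the following pass for `a` then replaces both that reference and the one in the main query. A CTE whose plan reads a table with the same name as the CTE is expanded exactly once, because the bottom-up traversal does not descend into the plan it has just inserted.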

How was this patch tested?

Existing tests in `SQLQueryTestSuite`, `cte.sql`.

@rdblue rdblue changed the title from "SPARK-27909: Update CTE substitution to be independent." to "[SPARK-27909][SQL] Do not run analysis inside CTE substitution" May 31, 2019
@SparkQA

SparkQA commented May 31, 2019

Test build #106032 has finished for PR 24763 at commit 4b140f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Contributor Author

rdblue commented May 31, 2019

@cloud-fan, @dongjoon-hyun, @mccheah can you take a look at this?

These changes are needed to get the DSv2 table resolution fixed because that adds a second rule to resolve relations. That means that `ResolveRelations` should not throw `AnalysisException` if it doesn't find a table because another resolution rule may be able to find the table.
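
To make the point about multiple resolution rules concrete, here is a small Python sketch. It is a toy model and not Spark code: the two dictionaries, `resolve_v1`, `resolve_v2`, and `resolve` are all hypothetical names. It assumes each rule simply returns nothing when it does not know a table, and that the failure is reported only after every rule has had a chance.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical catalogs standing in for two independent table lookups.
V1_TABLES: Dict[str, str] = {"events": "v1:events"}
V2_TABLES: Dict[str, str] = {"logs": "v2:logs"}

Rule = Callable[[str], Optional[str]]


def resolve_v1(name: str) -> Optional[str]:
    """Return the resolved table, or None to leave it for a later rule."""
    return V1_TABLES.get(name)


def resolve_v2(name: str) -> Optional[str]:
    """A second rule that can resolve tables the first one does not know."""
    return V2_TABLES.get(name)


def resolve(name: str, rules: List[Rule]) -> str:
    """Try each rule in order; fail only after all of them have run."""
    for rule in rules:
        resolved = rule(name)
        if resolved is not None:
            return resolved
    # No individual rule raises; the error is reported once, at the end.
    raise LookupError(f"Table or view not found: {name}")
```

If `resolve_v1` raised as soon as it missed a table, `resolve("logs", [resolve_v1, resolve_v2])` would fail before `resolve_v2` could run, which is the situation the change is meant to avoid.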

@rdblue
Contributor Author

rdblue commented May 31, 2019

@jzhuge, FYI

@cloud-fan
Contributor

Looks reasonable to me, but I'm not very familiar with this part. cc @gatorsmile @hvanhovell

@jzhuge
Member

jzhuge commented Jun 1, 2019

LGTM. This PR not only fixes the `cte.sql` failures we encountered while enhancing #24741, but also makes the code much easier to reason about.

Contributor

@hvanhovell hvanhovell left a comment

LGTM

@dongjoon-hyun
Member

I agree with the above comments on `foldRight`, and the rest of the PR also looks good to me.

Member

@gatorsmile gatorsmile left a comment

The JIRA https://issues.apache.org/jira/browse/SPARK-27909 says this is a bug fix. I think we already fixed the infinite recursion in #14397.

Is this PR just a refactoring?

@SparkQA

SparkQA commented Jun 3, 2019

Test build #106114 has finished for PR 24763 at commit 32317ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 4, 2019

Test build #106158 has finished for PR 24763 at commit 6ae409d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Contributor Author

rdblue commented Jun 4, 2019

@dongjoon-hyun, I think we have addressed all review comments. Could you have another look? Thank you!

@gatorsmile
Member

LGTM

Merged to master.

@gatorsmile gatorsmile closed this in de73a54 Jun 4, 2019
@rdblue
Contributor Author

rdblue commented Jun 4, 2019

Thanks for merging this, @gatorsmile!

@dongjoon-hyun
Member

Late LGTM! Thank you for merging, too.

emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019
Closes apache#24763 from rdblue/SPARK-27909-fix-cte-substitution.

Authored-by: Ryan Blue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>