[SPARK-16771][SQL] WITH clause should not fall into infinite loop. #14397
|
Test build #62995 has finished for PR 14397 at commit
|
|
Hi, @rxin . |
|
@dongjoon-hyun I think this has merit. I do have one question, what do other databases do? Like postgresql, mysql, sqlserver and others? |
This integrates some of the functionality of the ResolveRelations rule, but not all (for instance file based datasources). Shouldn't we be consistent and (perhaps) integrate these rules?
|
Thank you for review, @hvanhovell . For the recursive CTE queries, traditional DBMS supports optional For the overlap of
For Hive, the recursive queries and |
New behavior versus existing systems
I was not talking about recursive CTE's (which can be very useful in some cases). We are changing the behavior of the Analyzer, and this might surprise users. I would like to know if there is a common approach to this among other systems; so we can justify the change in behavior.
Resolve Relations
The change you propose is almost subsuming the |
|
Oh, sorry for misunderstanding, @hvanhovell . I think we can integrate |
|
Hi, @hvanhovell . It seems not clearly documented, so I did some comparisons. First of all, in the main SQL body, CTE query names are used first. However, the table resolution in the main SQL body is also dependent on the CTE subqueries. The general Analyzer approach seems to
The root cause of previous Spark problems is using |
|
The above approach also can remove the duplicated scope issue between |
|
Hi, @hvanhovell . |
|
Test build #63371 has finished for PR 14397 at commit
|
|
Could you review this PR again, @hvanhovell ? |
Here, the CTEs of WITH clause are resolved sequentially before applying to main SQL body.
|
@dongjoon-hyun I'll take a look in the morning (CET) |
|
Thank you so much! Then, see you later. :) |
relations is a map. The iteration order of a map is not necessarily the order in which elements were added (use a map larger than 4 to see this in action, e.g.: Seq.tabulate(5)(i => ('a' + i).toChar.toString -> i).toMap).
So I think we need to change the data type in With to accommodate this.
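A minimal sketch of the ordering concern raised here, using only the Scala standard library. The object and value names (`MapOrderSketch`, `pairs`, `asMap`) are illustrative and do not come from the PR. The map's exact iteration order depends on the Scala version, so the sketch prints it rather than asserting it.

```scala
object MapOrderSketch {
  // Hypothetical stand-ins for CTE definitions: (name, index) pairs in declaration order.
  val pairs: Seq[(String, Int)] =
    Seq.tabulate(5)(i => ('a' + i).toChar.toString -> i)

  // An immutable Map with more than four entries is hash-based, so its
  // iteration order is not guaranteed to match insertion order.
  val asMap: Map[String, Int] = pairs.toMap

  def main(args: Array[String]): Unit = {
    // The Seq always iterates in declaration order; the Map may not.
    println(s"Seq order: ${pairs.map(_._1).mkString(", ")}")
    println(s"Map order: ${asMap.keys.mkString(", ")}")
  }
}
```

Because each CTE may only see the ones declared before it, a sequence is the safer container for the definitions.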
So, I changed it to Seq. Now, relations of With is a sequence.
Never mind you have already done that :)
This does not replace With in all cases. Could you remove the relations.nonEmpty guard?
If a WITH has no relations, I thought we could skip it entirely, since there is nothing to be replaced.
Yeah you are right about that. But the current code does not remove the With node if there are no relations.
Oh, I see!
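To illustrate the point about the guard, here is a toy sketch. It is not Spark's actual rule: `Plan`, `Relation`, and this `With` are hypothetical simplified types, and the CTE substitution itself is elided. It only shows that a `relations.nonEmpty` guard leaves an empty `With` node in the plan, while the unguarded match always removes it.

```scala
object WithRemovalSketch {
  // Toy plan model (hypothetical; Spark's LogicalPlan is far richer).
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class With(child: Plan, relations: Seq[(String, Plan)]) extends Plan

  // Guarded: a With node without relations does not match, so it stays in the plan.
  def guarded(plan: Plan): Plan = plan match {
    case With(child, relations) if relations.nonEmpty => child // substitution elided
    case other => other
  }

  // Unguarded: every With node is unwrapped, including empty ones.
  def unguarded(plan: Plan): Plan = plan match {
    case With(child, _) => child // substitution elided
    case other => other
  }
}
```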
|
@dongjoon-hyun this is looking very promising. I left two small comments. |
|
Thank you, @hvanhovell ! |
resolved :+ r._1 -> ResolveRelations(substituteCTE(r._2, resolved))?
You could also deconstruct the r tuple for easier reading.
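The suggestion above, with the tuple deconstructed, might look like the following sketch. It assumes a toy plan model: `Plan`, `UnresolvedRelation`, `Project`, `Join`, and `resolveAll` are hypothetical, and the `ResolveRelations` step from the suggestion is left out. The fold resolves CTE definitions left to right, so each definition can only see the ones declared before it.

```scala
object CteFoldSketch {
  // Toy plan model (hypothetical; not Spark's LogicalPlan).
  sealed trait Plan
  case class UnresolvedRelation(name: String) extends Plan
  case class Project(child: Plan) extends Plan
  case class Join(left: Plan, right: Plan) extends Plan

  // Replace references to already-resolved CTE names; leave other names untouched.
  def substituteCTE(plan: Plan, ctes: Seq[(String, Plan)]): Plan = plan match {
    case u @ UnresolvedRelation(name) =>
      ctes.find(_._1 == name).map(_._2).getOrElse(u)
    case Project(child) => Project(substituteCTE(child, ctes))
    case Join(l, r)     => Join(substituteCTE(l, ctes), substituteCTE(r, ctes))
  }

  // Fold over the definitions in declaration order, destructuring each (name, relation) pair.
  def resolveAll(relations: Seq[(String, Plan)]): Seq[(String, Plan)] =
    relations.foldLeft(Seq.empty[(String, Plan)]) {
      case (resolved, (name, relation)) =>
        resolved :+ (name -> substituteCTE(relation, resolved))
    }
}
```

With mutually referencing definitions such as `t1 AS (SELECT * FROM t2), t2 AS (SELECT * FROM t1)`, this fold terminates: `t2` inside `t1` is left as an unresolved name rather than being expanded again.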
|
LGTM pending Jenkins. |
|
Test build #63526 has finished for PR 14397 at commit
|
|
Test build #63529 has finished for PR 14397 at commit
|
|
Rebased just to resolve conflicts. |
|
Test build #63577 has finished for PR 14397 at commit
|
|
Test build #63578 has finished for PR 14397 at commit
|
can you create a test in SQLQueryTestSuite instead?
Sure. I'll move this.
Ur, @rxin .
SQLQueryTestSuite does not seem to support exception cases yet.
In this PR, exceptions are important and should be checked.
|
Hi, @rxin . |
|
Test build #63595 has finished for PR 14397 at commit
|
|
Hi, @rxin . I moved the testcase into a new suite. |
@@ -0,0 +1,14 @@
create temporary view t as select * from values 0, 1, 2 as t(id);
maybe CTE instead of "with" ?
Here, t and t2 are base tables.
The queries that follow use CTEs to check whether a base table or a previous CTE is used correctly.
Ah, I see. You mean renaming with.sql to CTE.sql.
I see. No problem.
I will use cte.sql instead of with.sql.
|
Thank you, @rxin . It's updated. |
|
Test build #63637 has finished for PR 14397 at commit
|
|
Hi, @hvanhovell . |
|
LGTM (again) - merging to master. Thanks! |
|
Thank you, @hvanhovell ! |
What changes were proposed in this pull request?
This PR changes the CTE resolving rule to use only forward-declared tables in order to prevent infinite loops. More specifically, new logic is like the following.
Resolve the CTEs in WITH clauses first, before replacing the main SQL body.
Reported Error Scenarios
Note that t, t1, and t2 are not declared in the database. Spark falls into infinite loops before resolving table names.
How was this patch tested?
Pass the Jenkins tests with two new testcases.