[SPARK-27108][SQL] Add parsed SQL plans for create, CTAS. #24029
Conversation
@cloud-fan, this is needed to add the v2 create and CTAS plans. We can get a start while waiting for the catalog identifiers to be committed.

cc @jzhuge, @mccheah, @gatorsmile
...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedLogicalPlan.scala
not sure if this is a useful hierarchy, but if it is, we should document more clearly that it should not survive analysis.
might be useful to add a special check for this, rather than relying on resolved only.
+1 - @rdblue these should only be inputs to the analyzer, not outputs. Would be helpful to write specific JavaDoc on this.
That is included above: "Parsed logical plans are not resolved because they must be converted to concrete logical plans."
Do you think that should be rephrased to be more clear?
Defer to @rxin but I'm ok with merging with the current docs. We can rephrase in a follow-up if our contributors have trouble with this wording.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTable.scala
just to double check, when did we deprecate CreateTempViewUsing?
Looks like it happened in 5effc01, 3 years ago.
We can always create a parsed plan for CreateTempViewUsing so that it can move to catalyst as well, but I thought we could do that later, and we only need to if it isn't going to be removed in 3.0.
ah, so we deprecated CREATE TABLE USING, but not CREATE TEMP VIEW USING
I didn't review this file carefully, assuming it's just copy-paste code from SparkSqlAstBuilder
Yes, this is moving what is needed from SparkSqlAstBuilder. The only real changes are in the visitCreateTable method.
These rules should have already been in the abstract class. I'm not sure why they were in SparkSqlAstBuilder, other than that it was the easiest place to put them when they were added.
is it really necessary to have this parent class just to set the resolved bit? I think we can just put `override lazy val resolved = false` in the new CreateTable and CreateTableAsSelect classes, with classdoc saying which concrete plans these 2 classes will be replaced by during analysis.
The value of this class is that it identifies the set of logical plans that correspond directly to what was parsed from SQL. When someone working on a plan sees ParsedLogicalPlan as an ancestor in Scaladoc, it signals what is explained here: the parser produces ParsedLogicalPlan nodes without translating what was parsed, and those plans are then translated into real plans in the analyzer.
With that information, it is easy to see what changes need to be made. If the parsed plan doesn't include an option, then the parser and the parsed plan need to be updated. If it does include an option, then the analyzer and downstream plans need to be updated.
Also keep in mind that these are the first two subclasses of ParsedLogicalPlan. To implement v2 alongside v1, we are going to be adding more of them. So it is valuable that we don't need to remember to set resolved to false in every plan.
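For illustration, here is a minimal sketch of the pattern under discussion, assuming Catalyst's `LogicalPlan`/`LeafNode` API; the class and field names are illustrative rather than the exact ones in this PR:

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LeafNode
import org.apache.spark.sql.types.StructType

// A shared parent marks every parsed-only plan as unresolved, so the analyzer
// must convert it to a concrete plan before the query can proceed.
abstract class ParsedLogicalPlan extends LeafNode {
  // Parsed plans never survive analysis, so they are never considered resolved.
  override lazy val resolved: Boolean = false
}

// Illustrative subclass: it carries only what the parser saw, with no catalog
// lookups and no translation to v1 or v2 plans yet.
case class CreateTableStatementSketch(
    table: Seq[String],
    tableSchema: StructType,
    provider: String) extends ParsedLogicalPlan {
  override def output: Seq[Attribute] = Nil
}
```

With the shared parent, a new statement class only needs to extend `ParsedLogicalPlan`; it cannot forget to mark itself unresolved, which is the point made above.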
@cloud-fan, I've responded to the review comments and implemented fixes. I also moved the new resolution rules into the analyzer's resolution batch. At a minimum, these needed to be separated into different classes, but I think it is also more correct for resolution rules to run in the resolution batch, so that other resolution rules can run on the plans produced by these rules.
Retest this please.

Test build #103578 has finished for PR 24029 at commit

Test build #103579 has finished for PR 24029 at commit

retest this please

LGTM if tests pass
shall we call it DDLResolution? It's not very related to data source.
This resolves parsed plans to execution.datasources plans. This isn't just for DDL. We are starting with CreateTable and CreateTableAsSelect, but there will be more parsed plans that get converted to datasource plans in this rule. That's why I think DataSourceResolution is an appropriate name.
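As a rough sketch of what such a rule looks like (not the actual implementation in this PR), a resolution rule is a `Rule[LogicalPlan]` that rewrites parsed-only nodes into concrete plans during the analyzer's resolution batch; `CreateTableStatementSketch` is the illustrative class from the earlier sketch:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch of a resolution rule: it replaces parsed-only plans with concrete
// ones, so that later resolution rules can operate on the converted plans.
object DataSourceResolutionSketch extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
    case create: CreateTableStatementSketch =>
      // The real rule would build the v1 execution.datasources plan here
      // (and, later, v2 create plans) from the parsed information.
      convertCreateTable(create)
  }

  // Placeholder conversion; the concrete plan construction is out of scope here.
  private def convertCreateTable(create: CreateTableStatementSketch): LogicalPlan = ???
}
```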
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTable.scala
* Make ParsedLogicalPlan.resolved final.
* Add docs to CreateTable to be clear that it is metadata-only.
* Move CreateTable and CreateTableAsSelect tests to catalyst.
* Add PlanResolutionSuite to test parsing and resolution.
@mccheah, @cloud-fan, I've rebased on master, fixed a minor conflict, and added the
Test build #103831 has finished for PR 24029 at commit

thanks, merging to master!

Thanks @cloud-fan! And thanks to all the reviewers also!
This moves parsing `CREATE TABLE ... USING` statements into catalyst. Catalyst produces logical plans with the parsed information and those plans are converted to v1 `DataSource` plans in `DataSourceAnalysis`. This prepares for adding v2 create plans that should receive the information parsed from SQL without being translated to v1 plans first.

This also makes it possible to parse in catalyst instead of breaking the parser across the abstract `AstBuilder` in catalyst and `SparkSqlParser` in core. For more information, see the [mailing list thread](https://lists.apache.org/thread.html/54f4e1929ceb9a2b0cac7cb058000feb8de5d6c667b2e0950804c613%3Cdev.spark.apache.org%3E).

This uses existing tests to catch regressions. This introduces no behavior changes.

Closes apache#24029 from rdblue/SPARK-27108-add-parsed-create-logical-plans.

Authored-by: Ryan Blue <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
## What changes were proposed in this pull request?

This moves parsing logic for `ALTER TABLE` into Catalyst and adds parsed logical plans for alter table changes that use multi-part identifiers. This PR is similar to SPARK-27108, PR apache#24029, that created parsed logical plans for create and CTAS.

* Create parsed logical plans
* Move parsing logic into Catalyst's AstBuilder
* Convert to DataSource plans in DataSourceResolution
* Parse `ALTER TABLE ... SET LOCATION ...` separately from the partition variant
* Parse `ALTER TABLE ... ALTER COLUMN ... [TYPE dataType] [COMMENT comment]` [as discussed on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Syntax-for-table-DDL-td25197.html#a25270)
* Parse `ALTER TABLE ... RENAME COLUMN ... TO ...`
* Parse `ALTER TABLE ... DROP COLUMNS ...`

## How was this patch tested?

* Added new tests in Catalyst's `DDLParserSuite`
* Moved converted plan tests from SQL `DDLParserSuite` to `PlanResolutionSuite`
* Existing tests for regressions

Closes apache#24723 from rdblue/SPARK-27857-add-alter-table-statements-in-catalyst.

Authored-by: Ryan Blue <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
What changes were proposed in this pull request?
This moves parsing `CREATE TABLE ... USING` statements into catalyst. Catalyst produces logical plans with the parsed information, and those plans are converted to v1 `DataSource` plans in `DataSourceAnalysis`. This prepares for adding v2 create plans that should receive the information parsed from SQL without being translated to v1 plans first.

This also makes it possible to parse in catalyst instead of breaking the parser across the abstract `AstBuilder` in catalyst and `SparkSqlParser` in core. For more information, see the [mailing list thread](https://lists.apache.org/thread.html/54f4e1929ceb9a2b0cac7cb058000feb8de5d6c667b2e0950804c613%3Cdev.spark.apache.org%3E).
How was this patch tested?
This uses existing tests to catch regressions. This introduces no behavior changes.
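As a usage illustration (assuming a Spark build that includes this change; the plan class names printed may differ across versions), Catalyst's parser can now handle a `CREATE TABLE ... USING` statement on its own:

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

object ParseCreateTableExample extends App {
  // Parse a CREATE TABLE ... USING statement with Catalyst's parser alone,
  // without delegating to SparkSqlParser in sql/core.
  val plan = CatalystSqlParser.parsePlan(
    "CREATE TABLE db.tbl (id INT, data STRING) USING parquet")

  // The result is a parsed-only logical plan; the analyzer later converts it
  // into a concrete DataSource plan.
  println(plan.treeString)
}
```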