-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst #24723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst #24723
Conversation
708fbdb to
36a2bcd
Compare
|
Test build #105848 has finished for PR 24723 at commit
|
|
Test build #105849 has finished for PR 24723 at commit
|
|
Retest this please. |
|
Test build #105923 has finished for PR 24723 at commit
|
|
Test build #105934 has finished for PR 24723 at commit
|
|
@cloud-fan, @mccheah, @dongjoon-hyun, @jzhuge, could you review this PR? It updates SQL parsing for |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
a4f56b8 to
84be6d2
Compare
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
Outdated
Show resolved
Hide resolved
jzhuge
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except just a few minors.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
|
Test build #105975 has finished for PR 24723 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked other databases/SQL standard that the parentheses can be omitted in the ALTER TABLE statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. PostgreSQL, MySQL, and Oracle support this for a single column. PosgreSQL doesn't allow multiple columns. MySQL allows multiple columns without parens. And Oracle requires parens to add multiple columns. These don't assist parsing, which is why I think it is better to optionally support them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like DB2 and SQL server also allow adding multiple columns without parens.
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Outdated
Show resolved
Hide resolved
|
Test build #106033 has finished for PR 24723 at commit
|
|
Test build #106039 has finished for PR 24723 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also add it to ansiNonReserved.
BTW is this a non-reserved keyword? @maropu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an ANSI reserved keyword, but it is reserved in MySQL, PostgreSQL, and DB2. It is also on the list of potential ANSI reserved keywords.
|
LGTM |
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Outdated
Show resolved
Hide resolved
|
Normally, we check if it is possible to split it into multiple smaller, logical PRs or commits that can be quick to review.
@ueshin Please help review this PR and ensure these DDLs are well tested, especially when handling nested columns. |
f6c9005 to
7e2369e
Compare
|
Test build #106116 has finished for PR 24723 at commit
|
|
Test build #106117 has finished for PR 24723 at commit
|
|
|
||
| colPosition | ||
| : FIRST | AFTER identifier | ||
| : FIRST | AFTER qualifiedName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to modify this? I'm not sure what happens if something like:
ALTER TABLE tbl ADD a.b.c ... AFTER x.y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is needed because columns are identified by qualifiedName. The parser shouldn't fail if a nested column name is used. Instead, analysis should catch problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should also add that it is valid to reorder nested fields:
ALTER TABLE tbl ADD point.z bigint AFTER point.y
| override def visitAlterTableColumn( | ||
| ctx: AlterTableColumnContext): LogicalPlan = withOrigin(ctx) { | ||
| if (ctx.colPosition != null) { | ||
| operationNotAllowed("ALTER TABLE table CHANGE COLUMN ... FIRST | AFTER otherCol", ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALTER COLUMN instead of CHANGE COLUMN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are supported, but CHANGE was supported first and is used in other places. I used CHANGE For consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update this to use the same verb that was parsed.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
| (ALTER | CHANGE) COLUMN? qualifiedName | ||
| (TYPE dataType)? (COMMENT comment=STRING)? colPosition? #alterTableColumn | ||
| | ALTER TABLE tableIdentifier partitionSpec? | ||
| CHANGE COLUMN? identifier colType colPosition? #changeColumn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will we handle this version of CHANGE COLUMN command?
Should we merge this in the (ALTER | CHANGE) COLUMN command above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to discuss deprecating this form of the command, which is why I haven't updated it.
The problem is that this requires the user to specify too much of a column's metadata when it isn't changing. For example, if I'm updating an int to a long, I also need to specify that it should have the same name (UPDATE a a BIGINT). Similarly, to rename you have to pass the type back in (UPDATE a b INT). This can easily lead to unintended changes that can't be reverted, like widening a type accidentally.
I think that this form of the command should not be supported in v2. We can decide that later because all this is doing is updating the parser to add commands that we need to support.
| case AlterTableAddColumnsStatement(AsTableIdentifier(table), newColumns) | ||
| if newColumns.forall(_.name.size == 1) => | ||
| // only top-level adds are supported using AlterTableAddColumnsCommand | ||
| AlterTableAddColumnsCommand(table, newColumns.map(convertToStructField)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AlterTableAlterColumnStatement is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing implements AlterTableAlterColumnStatement. This PR preserves existing behavior and updates the parser, it does not add new behavior.
| comparePlans(parsed2, expected2) | ||
| } | ||
|
|
||
| test("alter table: change column name/type/comment") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should be preserved in PlanResolutionSuite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, looks like I missed this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parser didn't change for this variant of CHANGE COLUMN, so I'm adding it back. Looks like I just removed it accidentally.
|
Thanks for reviewing this, @ueshin. I've addressed the problems you found. Please have another look. |
|
Test build #106163 has finished for PR 24723 at commit
|
|
LGTM. |
|
@dongjoon-hyun, could you have another look and possibly merge? I think the review items have been addressed. Thank you! |
|
Thanks for reviewing this, @ueshin! |
|
cc @yeshengm This PR will affect your parser changes. I will merge this PR first and you can address the issues in your PR. |
|
Thanks! Merged to master. |
|
Thank you for merging this, @gatorsmile! |
This moves parsing logic for `ALTER TABLE` into Catalyst and adds parsed logical plans for alter table changes that use multi-part identifiers. This PR is similar to SPARK-27108, PR apache#24029, that created parsed logical plans for create and CTAS. * Create parsed logical plans * Move parsing logic into Catalyst's AstBuilder * Convert to DataSource plans in DataSourceResolution * Parse `ALTER TABLE ... SET LOCATION ...` separately from the partition variant * Parse `ALTER TABLE ... ALTER COLUMN ... [TYPE dataType] [COMMENT comment]` [as discussed on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Syntax-for-table-DDL-td25197.html#a25270) * Parse `ALTER TABLE ... RENAME COLUMN ... TO ...` * Parse `ALTER TABLE ... DROP COLUMNS ...` * Added new tests in Catalyst's `DDLParserSuite` * Moved converted plan tests from SQL `DDLParserSuite` to `PlanResolutionSuite` * Existing tests for regressions Closes apache#24723 from rdblue/SPARK-27857-add-alter-table-statements-in-catalyst. Authored-by: Ryan Blue <[email protected]> Signed-off-by: gatorsmile <[email protected]>
## What changes were proposed in this pull request? This moves parsing logic for `ALTER TABLE` into Catalyst and adds parsed logical plans for alter table changes that use multi-part identifiers. This PR is similar to SPARK-27108, PR apache#24029, that created parsed logical plans for create and CTAS. * Create parsed logical plans * Move parsing logic into Catalyst's AstBuilder * Convert to DataSource plans in DataSourceResolution * Parse `ALTER TABLE ... SET LOCATION ...` separately from the partition variant * Parse `ALTER TABLE ... ALTER COLUMN ... [TYPE dataType] [COMMENT comment]` [as discussed on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Syntax-for-table-DDL-td25197.html#a25270) * Parse `ALTER TABLE ... RENAME COLUMN ... TO ...` * Parse `ALTER TABLE ... DROP COLUMNS ...` ## How was this patch tested? * Added new tests in Catalyst's `DDLParserSuite` * Moved converted plan tests from SQL `DDLParserSuite` to `PlanResolutionSuite` * Existing tests for regressions Closes apache#24723 from rdblue/SPARK-27857-add-alter-table-statements-in-catalyst. Authored-by: Ryan Blue <[email protected]> Signed-off-by: gatorsmile <[email protected]>
What changes were proposed in this pull request?
This moves parsing logic for
ALTER TABLEinto Catalyst and adds parsed logical plans for alter table changes that use multi-part identifiers. This PR is similar to SPARK-27108, PR #24029, that created parsed logical plans for create and CTAS.ALTER TABLE ... SET LOCATION ...separately from the partition variantALTER TABLE ... ALTER COLUMN ... [TYPE dataType] [COMMENT comment]as discussed on the dev listALTER TABLE ... RENAME COLUMN ... TO ...ALTER TABLE ... DROP COLUMNS ...How was this patch tested?
DDLParserSuiteDDLParserSuitetoPlanResolutionSuite