Conversation

@rdblue
Contributor

@rdblue rdblue commented May 27, 2019

What changes were proposed in this pull request?

This moves parsing logic for ALTER TABLE into Catalyst and adds parsed logical plans for alter table changes that use multi-part identifiers. This PR is similar to SPARK-27108, PR #24029, that created parsed logical plans for create and CTAS.

  • Create parsed logical plans
  • Move parsing logic into Catalyst's AstBuilder
  • Convert to DataSource plans in DataSourceResolution
  • Parse ALTER TABLE ... SET LOCATION ... separately from the partition variant
  • Parse ALTER TABLE ... ALTER COLUMN ... [TYPE dataType] [COMMENT comment] as discussed on the dev list
  • Parse ALTER TABLE ... RENAME COLUMN ... TO ...
  • Parse ALTER TABLE ... DROP COLUMNS ...
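
As a rough illustration of the parsed plans listed above, the following Scala sketch models statements that carry multi-part identifiers. It is a simplified stand-in, not the code in this PR: the `ParsedAlterTableSketch` object, the `parts` helper, the shortened class names, and every field list other than `NewColumn`'s are assumptions, and a plain `String` replaces Spark's `DataType` so the snippet runs without Spark on the classpath.

```scala
object ParsedAlterTableSketch {
  // Same shape as the NewColumn class reported by SparkQA, except that
  // dataType is a String here instead of Spark's DataType (assumption).
  final case class NewColumn(name: Seq[String], dataType: String, comment: Option[String])

  // Hypothetical marker trait for plans produced directly by the parser.
  sealed trait ParsedStatement

  // Field lists below are illustrative guesses, not the PR's definitions.
  final case class AddColumns(tableName: Seq[String], columns: Seq[NewColumn])
      extends ParsedStatement
  final case class RenameColumn(tableName: Seq[String], column: Seq[String], newName: String)
      extends ParsedStatement
  final case class DropColumns(tableName: Seq[String], columns: Seq[Seq[String]])
      extends ParsedStatement

  // Hypothetical helper: split a dotted name into its parts (ignores quoting).
  def parts(name: String): Seq[String] = name.split('.').toList

  // What a parser might produce for:
  //   ALTER TABLE cat.db.tbl ADD COLUMN point.z bigint
  val example: ParsedStatement =
    AddColumns(parts("cat.db.tbl"), Seq(NewColumn(parts("point.z"), "bigint", None)))
}
```

Keeping table and column names as `Seq[String]` lets the parser accept names with any number of parts and leaves it to a later resolution step to decide what each part means.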

How was this patch tested?

  • Added new tests in Catalyst's DDLParserSuite
  • Moved converted plan tests from SQL DDLParserSuite to PlanResolutionSuite
  • Existing tests for regressions

@rdblue rdblue changed the title from "[SPARK-27857] Move ALTER TABLE parsing into Catalyst" to "[SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst" May 27, 2019
@rdblue rdblue force-pushed the SPARK-27857-add-alter-table-statements-in-catalyst branch from 708fbdb to 36a2bcd Compare May 27, 2019 22:18
@SparkQA

SparkQA commented May 27, 2019

Test build #105848 has finished for PR 24723 at commit 708fbdb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class NewColumn(name: Seq[String], dataType: DataType, comment: Option[String])
  • case class AlterTableAddColumnsStatement(
  • case class AlterTableAlterColumnStatement(
  • case class AlterTableRenameColumnStatement(
  • case class AlterTableDropColumnsStatement(
  • case class AlterTableSetPropertiesStatement(
  • case class AlterTableUnsetPropertiesStatement(
  • case class AlterTableSetLocationStatement(
  • case class AlterViewSetPropertiesStatement(
  • case class AlterViewUnsetPropertiesStatement(

@SparkQA

SparkQA commented May 27, 2019

Test build #105849 has finished for PR 24723 at commit 36a2bcd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class NewColumn(name: Seq[String], dataType: DataType, comment: Option[String])
  • case class AlterTableAddColumnsStatement(
  • case class AlterTableAlterColumnStatement(
  • case class AlterTableRenameColumnStatement(
  • case class AlterTableDropColumnsStatement(
  • case class AlterTableSetPropertiesStatement(
  • case class AlterTableUnsetPropertiesStatement(
  • case class AlterTableSetLocationStatement(
  • case class AlterViewSetPropertiesStatement(
  • case class AlterViewUnsetPropertiesStatement(

@rdblue
Contributor Author

rdblue commented May 29, 2019

Retest this please.

@SparkQA

SparkQA commented May 29, 2019

Test build #105923 has finished for PR 24723 at commit 36a2bcd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class NewColumn(name: Seq[String], dataType: DataType, comment: Option[String])
  • case class AlterTableAddColumnsStatement(
  • case class AlterTableAlterColumnStatement(
  • case class AlterTableRenameColumnStatement(
  • case class AlterTableDropColumnsStatement(
  • case class AlterTableSetPropertiesStatement(
  • case class AlterTableUnsetPropertiesStatement(
  • case class AlterTableSetLocationStatement(
  • case class AlterViewSetPropertiesStatement(
  • case class AlterViewUnsetPropertiesStatement(

@SparkQA

SparkQA commented May 30, 2019

Test build #105934 has finished for PR 24723 at commit a4f56b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Contributor Author

rdblue commented May 30, 2019

@cloud-fan, @mccheah, @dongjoon-hyun, @jzhuge, could you review this PR? It updates SQL parsing for ALTER TABLE like the recent changes for CREATE TABLE and DROP TABLE. The parser is updated as discussed on the dev list.

@rdblue rdblue force-pushed the SPARK-27857-add-alter-table-statements-in-catalyst branch from a4f56b8 to 84be6d2 Compare May 30, 2019 19:41
Member

@jzhuge jzhuge left a comment

LGTM except for a few minor comments.

@SparkQA

SparkQA commented May 30, 2019

Test build #105975 has finished for PR 24723 at commit 84be6d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Have you checked whether other databases or the SQL standard allow the parentheses to be omitted in the ALTER TABLE statement?

Contributor Author

@rdblue rdblue May 31, 2019

Yes. PostgreSQL, MySQL, and Oracle support this for a single column. PostgreSQL doesn't allow multiple columns. MySQL allows multiple columns without parens. And Oracle requires parens to add multiple columns. The parens don't assist parsing, which is why I think it is better to support them optionally.

Contributor Author

Looks like DB2 and SQL Server also allow adding multiple columns without parens.

@SparkQA

SparkQA commented Jun 1, 2019

Test build #106033 has finished for PR 24723 at commit 4902420.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 1, 2019

Test build #106039 has finished for PR 24723 at commit 40c94dd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

We should also add it to ansiNonReserved.

BTW is this a non-reserved keyword? @maropu

Contributor Author

This is not an ANSI reserved keyword, but it is reserved in MySQL, PostgreSQL, and DB2. It is also on the list of potential ANSI reserved keywords.

@cloud-fan
Contributor

LGTM

@gatorsmile
Member

Normally, we check if it is possible to split it into multiple smaller, logical PRs or commits that can be quick to review.

  • Keep refactors in separate PRs
  • Fix one issue/bug per PR.

@ueshin Please help review this PR and ensure these DDLs are well tested, especially when handling nested columns.

@rdblue rdblue force-pushed the SPARK-27857-add-alter-table-statements-in-catalyst branch from f6c9005 to 7e2369e Compare June 3, 2019 16:49
@SparkQA

SparkQA commented Jun 3, 2019

Test build #106116 has finished for PR 24723 at commit 7e2369e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class QualifiedColType(name: Seq[String], dataType: DataType, comment: Option[String])

@SparkQA

SparkQA commented Jun 3, 2019

Test build #106117 has finished for PR 24723 at commit c558499.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


colPosition
-    : FIRST | AFTER identifier
+    : FIRST | AFTER qualifiedName
Member

Do we need to modify this? I'm not sure what happens with something like:

ALTER TABLE tbl ADD a.b.c ... AFTER x.y

Contributor Author

Yes, this is needed because columns are identified by qualifiedName. The parser shouldn't fail if a nested column name is used. Instead, analysis should catch problems.

Contributor Author

I should also add that it is valid to reorder nested fields:

ALTER TABLE tbl ADD point.z bigint AFTER point.y
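
To make the split between parsing and analysis concrete, here is a minimal Scala sketch of a position clause that accepts multi-part names, plus a check that could run later during analysis. Everything in it is hypothetical: the `ColumnPositionSketch` object, the `Position` types, and in particular the rule that `AFTER` must name a sibling of the new column are assumptions made for illustration, not behavior defined by this PR.

```scala
object ColumnPositionSketch {
  // Parsed form of the optional position clause: FIRST | AFTER qualifiedName.
  sealed trait Position
  case object First extends Position
  final case class After(column: Seq[String]) extends Position

  // Hypothetical analysis-time check (assumed rule): the AFTER reference must
  // share the new column's parent. Parsing has already succeeded by this point.
  def checkPosition(newColumn: Seq[String], position: Position): Either[String, Unit] =
    position match {
      case First => Right(())
      case After(ref) if ref.dropRight(1) == newColumn.dropRight(1) => Right(())
      case After(ref) =>
        Left(s"${ref.mkString(".")} is not a sibling of ${newColumn.mkString(".")}")
    }
}
```

Under that assumed rule, `ADD point.z bigint AFTER point.y` passes the check, while `ADD a.b.c ... AFTER x.y` still parses but is rejected afterwards with a message that names both columns.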

override def visitAlterTableColumn(
    ctx: AlterTableColumnContext): LogicalPlan = withOrigin(ctx) {
  if (ctx.colPosition != null) {
    operationNotAllowed("ALTER TABLE table CHANGE COLUMN ... FIRST | AFTER otherCol", ctx)
Member

ALTER COLUMN instead of CHANGE COLUMN?

Contributor Author

Both are supported, but CHANGE was supported first and is used in other places. I used CHANGE for consistency.

Contributor Author

I'll update this to use the same verb that was parsed.
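
One way to do that, sketched in Scala under the assumption that the parsed keyword is available as a string: the `VerbMessageSketch` object and `positionNotAllowed` function are hypothetical names, and how the verb would actually be read from the parse context is not shown here.

```scala
object VerbMessageSketch {
  // `verb` is whichever keyword the user wrote, "ALTER" or "CHANGE".
  // Upper-casing keeps the message stable for case-insensitive input.
  def positionNotAllowed(verb: String): String =
    s"ALTER TABLE table ${verb.toUpperCase} COLUMN ... FIRST | AFTER otherCol"
}
```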

(ALTER | CHANGE) COLUMN? qualifiedName
(TYPE dataType)? (COMMENT comment=STRING)? colPosition? #alterTableColumn
| ALTER TABLE tableIdentifier partitionSpec?
CHANGE COLUMN? identifier colType colPosition? #changeColumn
Member

How will we handle this version of CHANGE COLUMN command?
Should we merge this in the (ALTER | CHANGE) COLUMN command above?

Contributor Author

I think we need to discuss deprecating this form of the command, which is why I haven't updated it.

The problem is that this requires the user to specify too much of a column's metadata when it isn't changing. For example, if I'm updating an int to a long, I also need to specify that it should have the same name (CHANGE a a BIGINT). Similarly, to rename you have to pass the type back in (CHANGE a b INT). This can easily lead to unintended changes that can't be reverted, like widening a type accidentally.

I think that this form of the command should not be supported in v2. We can decide that later because all this is doing is updating the parser to add commands that we need to support.
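
The difference can be sketched in Scala by modeling the `ALTER COLUMN ... [TYPE dataType] [COMMENT comment]` form with optional fields, so that anything the user leaves out is carried over unchanged. The `AlterColumnSketch` object, its `Column` and `AlterColumn` classes, and `applyChange` are hypothetical and use plain strings for types; they are not part of this PR.

```scala
object AlterColumnSketch {
  // Simplified column metadata (assumption: the type is kept as a String).
  final case class Column(name: String, dataType: String, comment: Option[String])

  // Each clause is optional, so a statement only names what should change.
  final case class AlterColumn(
      newType: Option[String] = None,
      newComment: Option[String] = None)

  // Apply only the requested changes; the name is never touched here because
  // renames go through the separate RENAME COLUMN statement.
  def applyChange(col: Column, change: AlterColumn): Column =
    col.copy(
      dataType = change.newType.getOrElse(col.dataType),
      comment = change.newComment.orElse(col.comment))
}
```

With this shape, widening a type cannot rename the column by accident, and updating a comment cannot change the type, because neither piece of metadata has to be restated.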

case AlterTableAddColumnsStatement(AsTableIdentifier(table), newColumns)
    if newColumns.forall(_.name.size == 1) =>
  // only top-level adds are supported using AlterTableAddColumnsCommand
  AlterTableAddColumnsCommand(table, newColumns.map(convertToStructField))
Member

Is AlterTableAlterColumnStatement missing?

Contributor Author

Nothing implements AlterTableAlterColumnStatement. This PR preserves existing behavior and updates the parser; it does not add new behavior.
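
The pattern in the quoted rule, converting a parsed statement only when it matches what the existing command supports and leaving everything else as parsed, can be sketched in Scala as follows. The `ResolutionSketch` object and its classes are simplified stand-ins; the limit of two name parts for the table is an assumption about what `AsTableIdentifier` accepts, and the string-based command is not Spark's `AlterTableAddColumnsCommand`.

```scala
object ResolutionSketch {
  final case class NewColumn(name: Seq[String], dataType: String)

  sealed trait Plan
  // Parsed statement with multi-part table and column names.
  final case class AddColumnsStatement(table: Seq[String], columns: Seq[NewColumn])
      extends Plan
  // Stand-in for the existing command, which only knows top-level columns.
  final case class AddColumnsCommand(table: String, fields: Seq[(String, String)])
      extends Plan

  def resolve(plan: Plan): Plan = plan match {
    // Convert only when the table has at most two name parts (assumption)
    // and every new column is top-level, i.e. has a single-part name.
    case AddColumnsStatement(table, columns)
        if table.nonEmpty && table.size <= 2 && columns.forall(_.name.size == 1) =>
      AddColumnsCommand(table.mkString("."), columns.map(c => (c.name.head, c.dataType)))
    // Anything else is left untouched; no new behavior is added for it.
    case other => other
  }
}
```

A statement that adds `point.z` falls through to the second case and stays a parsed statement, which matches the point that this change preserves existing behavior without implementing the new forms.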

comparePlans(parsed2, expected2)
}

test("alter table: change column name/type/comment") {
Member

Should this test be preserved in PlanResolutionSuite?

Contributor Author

Good catch, looks like I missed this one.

Contributor Author

The parser didn't change for this variant of CHANGE COLUMN, so I'm adding it back. Looks like I just removed it accidentally.

@rdblue
Contributor Author

rdblue commented Jun 4, 2019

Thanks for reviewing this, @ueshin. I've addressed the problems you found. Please have another look.

@SparkQA

SparkQA commented Jun 4, 2019

Test build #106163 has finished for PR 24723 at commit bd7d8c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Member

ueshin commented Jun 5, 2019

LGTM.

@rdblue
Contributor Author

rdblue commented Jun 5, 2019

@dongjoon-hyun, could you have another look and possibly merge? I think the review items have been addressed. Thank you!

@rdblue
Contributor Author

rdblue commented Jun 5, 2019

Thanks for reviewing this, @ueshin!

@gatorsmile
Member

cc @yeshengm This PR will affect your parser changes. I will merge this PR first and you can address the issues in your PR.

@gatorsmile
Member

Thanks! Merged to master.

@gatorsmile gatorsmile closed this in 5d6758c Jun 5, 2019
@rdblue
Contributor Author

rdblue commented Jun 5, 2019

Thank you for merging this, @gatorsmile!

mccheah pushed a commit to palantir/spark that referenced this pull request Jun 6, 2019
This moves parsing logic for `ALTER TABLE` into Catalyst and adds parsed logical plans for alter table changes that use multi-part identifiers. This PR is similar to SPARK-27108, PR apache#24029, that created parsed logical plans for create and CTAS.

* Create parsed logical plans
* Move parsing logic into Catalyst's AstBuilder
* Convert to DataSource plans in DataSourceResolution
* Parse `ALTER TABLE ... SET LOCATION ...` separately from the partition variant
* Parse `ALTER TABLE ... ALTER COLUMN ... [TYPE dataType] [COMMENT comment]` [as discussed on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Syntax-for-table-DDL-td25197.html#a25270)
* Parse `ALTER TABLE ... RENAME COLUMN ... TO ...`
* Parse `ALTER TABLE ... DROP COLUMNS ...`

* Added new tests in Catalyst's `DDLParserSuite`
* Moved converted plan tests from SQL `DDLParserSuite` to `PlanResolutionSuite`
* Existing tests for regressions

Closes apache#24723 from rdblue/SPARK-27857-add-alter-table-statements-in-catalyst.

Authored-by: Ryan Blue
Signed-off-by: gatorsmile
emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019
@rdblue rdblue deleted the SPARK-27857-add-alter-table-statements-in-catalyst branch June 21, 2019 18:42