Skip to content

Conversation

@gatorsmile
Copy link
Member

What changes were proposed in this pull request?

This PR is to provide a native support for DDL DROP VIEW and DROP TABLE. The PR includes native parsing and native analysis.

Based on the HIVE DDL document for [DROP_VIEW_WEB_LINK](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-
DropView), DROP VIEW is defined as,
Syntax:

DROP VIEW [IF EXISTS] [db_name.]view_name;
  • to remove metadata for the specified view.
  • illegal to use DROP TABLE on a view.
  • illegal to use DROP VIEW on a table.
  • this command only works in HiveContext. In SQLContext, we will get an exception.

This PR also handles DROP TABLE.
Syntax:

DROP TABLE [IF EXISTS] table_name [PURGE];
  • Previously, the DROP TABLE command only can drop Hive tables in HiveContext. Now, after this PR, this command also can drop temporary table, external table, external data source table in SQLContext.
  • In HiveContext, we will not issue an exception if the to-be-dropped table does not exist and users did not specify IF EXISTS. Instead, we just log an error message. If IF EXISTS is specified, we will not issue any error message/exception.
  • In SQLContext, we will issue an exception if the to-be-dropped table does not exist, unless IF EXISTS is specified.
  • Data will not be deleted if the tables are external, unless table type is managed_table.

How was this patch tested?

For verifying command parsing, added test cases in spark/sql/hive/HiveDDLCommandSuite.scala
For verifying command analysis, added test cases in spark/sql/hive/execution/HiveDDLSuite.scala

gatorsmile and others added 30 commits November 13, 2015 14:50
@gatorsmile
Copy link
Member Author

Sure, will do it.

@SparkQA
Copy link

SparkQA commented Apr 8, 2016

Test build #55392 has finished for PR 12146 at commit f00cbea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

@yhuai Actually, when we drop the external table, we do delete the data.

If we do not want to delete the data, we need to change the following value from true to false, which is the @param deleteData (i.e., deletes the underlying data along with metadata)
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L345

Update: Sorry, my case is wrong.

create external table $tabName(c1 int COMMENT 'abc', c2 string)
stored as parquet
location '$tmpDir'
as select 1, '3'

Although we use the EXTERNAL keyword, the table is still managed_table. The above statement could be wrong. As you pointed out in another PR, the data in an external table should not be deleted.
https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2239-L2251

@SparkQA
Copy link

SparkQA commented Apr 9, 2016

Test build #55424 has finished for PR 12146 at commit f280ec4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 9, 2016

Test build #55431 has finished for PR 12146 at commit 6c71124.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Apr 9, 2016

That deleteData flag is confusing. If you look at the metastore server's code, you will see that there is another flag called isExternal generated from the table type. For the query

create external table $tabName(c1 int COMMENT 'abc', c2 string)
stored as parquet
location '$tmpDir'
as select 1, '3'

I am not sure I understand what you mean by Although we use the EXTERNAL keyword, the table is still managed_table. The above statement could be wrong. Why it is still a managed_table?

@gatorsmile
Copy link
Member Author

It sounds like Hive ignores and overwrites the type we provided when we attempt to create the table and converts it to Managed. I will do more investigation today. Just added a test case for data source tables, which is EXTERNAL.

You can see I did the table type verification in both test cases.

|stored as parquet
|location '$tmpDir'
|as select 1, '3'
""".stripMargin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an external table, table type needs to be EXTERNAL_TABLE and its table properties should have an entry EXTERNAL set to TRUE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there is a bug in HiveClientImpl.toHiveTable. I will fix that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, for this test case, I am not sure Hive allows you to provide a column list in a CTAS table. Also, I am not sure if Hive allows you to provide EXTERNAL keyword for a table generated by CTAS query. Can you double check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing this with me!

hive> create external table tab1(c1 int, c2 string) stored as parquet as select 1, 3;
FAILED: SemanticException [Error 10065]: CREATE TABLE AS SELECT command cannot specify the list of columns for the target table
hive> create external table tab1 stored as parquet as select 1, 3;
FAILED: SemanticException [Error 10070]: CREATE-TABLE-AS-SELECT cannot create external table

Obviously, Spark SQL is different from Hive. Should we change the behavior to match Hive SQL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL in the test case is processed by Spark SQL. It is not a native Hive SQL command.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have been supporting it for a long time (maybe by accident). I have created https://issues.apache.org/jira/browse/SPARK-14507 for deciding if we should keep it or drop the support of it.

For this test, can we change to a real managed table (drop the external keyword from the statement)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do it. Thanks!

BTW, it does not work even if without the external keyword. I will also remove the column list.

hive> create table tab11(c1 int, c2 string) stored as parquet as select 1, 3;
FAILED: SemanticException [Error 10065]: CREATE TABLE AS SELECT command cannot specify the list of columns for the target table

@SparkQA
Copy link

SparkQA commented Apr 9, 2016

Test build #55432 has finished for PR 12146 at commit 2f45c0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Apr 9, 2016
…g Parsing

#### What changes were proposed in this pull request?
"Not good to slightly ignore all the un-supported options/clauses. We should either support it or throw an exception." A comment from yhuai in another PR #12146

- Can `Explain` be an exception? The `Formatted` clause is used in `HiveCompatibilitySuite`.
- Two unsupported clauses in `Drop Table` are handled in a separate PR: #12146

#### How was this patch tested?
Test cases are added to verify all the cases.

Author: gatorsmile <[email protected]>

Closes #12255 from gatorsmile/warningToException.
catalog.dropTable(TableIdentifier("unknown_table", Some("db2")), ignoreIfNotExists = false)
}
// If the table does not exist, we do not issue an exception. Instead, we output an error
// message to console when ignoreIfNotExists is set to false. This is to make it consistent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we only have a log message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is not completely consistent. let me rewrite it. Thanks!

@gatorsmile
Copy link
Member Author

Sure, will do it soon. Really thank you for your review!

@yhuai
Copy link
Contributor

yhuai commented Apr 9, 2016

Also update the description if you get a chance?

@gatorsmile
Copy link
Member Author

Sure, will do it. Thanks!

@SparkQA
Copy link

SparkQA commented Apr 10, 2016

Test build #55451 has finished for PR 12146 at commit 40861a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Apr 10, 2016

LGTM. We can remove isViewSupported in a follow-up.

@asfgit asfgit closed this in dfce966 Apr 10, 2016
@gatorsmile gatorsmile deleted the dropView branch April 10, 2016 14:43
asfgit pushed a commit that referenced this pull request Apr 11, 2016
…iew and Drop Table

#### What changes were proposed in this pull request?
This PR is to address the comment: #12146 (diff). It removes the function `isViewSupported` from `SessionCatalog`. After the removal, we still can capture the user errors if users try to drop a table using `DROP VIEW`.

#### How was this patch tested?
Modified the existing test cases

Author: gatorsmile <[email protected]>

Closes #12284 from gatorsmile/followupDropTable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants