
Conversation

@gatorsmile (Member)

What changes were proposed in this pull request?

This PR is to implement the following four Database-related DDL commands:

  • CREATE DATABASE|SCHEMA [IF NOT EXISTS] database_name
  • DROP DATABASE [IF EXISTS] database_name [RESTRICT|CASCADE]
  • DESCRIBE DATABASE [EXTENDED] db_name
  • ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...)

Another PR will be submitted to handle the unsupported commands. Among the Database-related DDL commands, we will throw an exception for ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role.
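The four commands above can be modeled as simple command case classes that the parser produces and the execution layer runs. The following is a self-contained sketch with illustrative field names, not the PR's exact signatures (only DescribeDatabase is confirmed as a class name in the build output below):

```scala
// Illustrative stand-ins (assumptions) for the four DDL commands; the real
// Spark classes carry more context and extend RunnableCommand.
case class CreateDatabase(
    name: String,
    ifNotExists: Boolean,
    path: Option[String],
    comment: Option[String],
    props: Map[String, String])

// DROP DATABASE [IF EXISTS] name [RESTRICT|CASCADE]
case class DropDatabase(name: String, ifExists: Boolean, cascade: Boolean)

// DESCRIBE DATABASE [EXTENDED] name
case class DescribeDatabase(name: String, extended: Boolean)

// ALTER (DATABASE|SCHEMA) name SET DBPROPERTIES (...)
case class AlterDatabaseProperties(name: String, props: Map[String, String])

// Example: CREATE DATABASE IF NOT EXISTS financials
val create = CreateDatabase(
  name = "financials",
  ifNotExists = true,
  path = None,
  comment = None,
  props = Map.empty)
```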

cc @yhuai @andrewor14 @rxin Could you review the changes? Is it in the right direction? Thanks!

How was this patch tested?

Added a few test cases in command/DDLSuite.scala for testing DDL command execution in SQLContext. Since HiveContext also shares the same implementation, the existing test cases in hive also verify the correctness of these commands.

gatorsmile and others added 30 commits November 13, 2015 14:50
// todo: what is the default path in SessionCatalog?
def getDefaultPath: String = ""

def getDefaultDBExtension: String = ".db"
Contributor

what is this?

Member Author

That is from Hive. http://www.reedbushey.com/99Programming%20Hive.pdf

The database directory is created under a top-level directory specified by the property hive.metastore.warehouse.dir, which we discussed in “Local Mode Configuration” on page 24 and “Distributed and Pseudodistributed Mode Configuration” on page 26. Assuming you are using the default value for this property, /user/hive/warehouse, when the financials database is created, Hive will create the directory /user/hive/warehouse/financials.db. Note the .db extension.

I am not sure if we need to follow it. Let me know whether we should add an external configuration parameter, or whether we do not need it at all. Thanks!

Contributor

either way I don't think we want this; just embed it in getDefaultDBPath. Otherwise we'll end up with too many random methods
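The suggestion above, folding the ".db" extension and default path logic into a single method, could look like the following sketch. The method name comes from the discussion; the warehouse-directory parameter is an assumption for illustration:

```scala
// Build the default database directory under the warehouse dir, mirroring the
// Hive behavior quoted above (note the ".db" extension).
def getDefaultDBPath(dbName: String, warehouseDir: String): String =
  s"${warehouseDir.stripSuffix("/")}/$dbName.db"

val path = getDefaultDBPath("financials", "/user/hive/warehouse")
// path == "/user/hive/warehouse/financials.db"
```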


SparkQA commented Mar 28, 2016

Test build #54348 has finished for PR 12009 at commit f4c33e2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("Create/Drop/Alter/Describe Database - basic") {
Contributor

can you split these into separate tests?

Member Author

Sure, will do it. Thanks!

@andrewor14 (Contributor)

Looks great.

* {{{
* CREATE DATABASE|SCHEMA [IF NOT EXISTS] database_name
* }}}
*/
Contributor

Also, why did you move them to this file? These are DDLs, so they belong in ddl.scala.

Member

Agreed. We should keep them in ddl.scala.

Member Author

Sure, will move them back. Thanks!

Contributor

rxin commented Mar 29, 2016

I just merged #12015

Can you update to use the new ANTLR4 parser instead? We are going to remove the ANTLR3 one in the next day or two. Thanks.

@gatorsmile (Member Author)

Sure, will do the changes. Thanks!

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala

SparkQA commented Mar 29, 2016

Test build #54461 has finished for PR 12009 at commit 16c829e.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class DescribeDatabase(


// CREATE DATABASE [IF NOT EXISTS] database_name [COMMENT database_comment]
// [LOCATION path] [WITH DBPROPERTIES (key1=val1, key2=val2, ...)];
case Token("TOK_CREATEDATABASE", Token(databaseName, Nil) :: args) =>
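The AST matching in the snippet above can be reproduced with a minimal stand-in Token class (the real one lives in Spark's old ANTLR3 HiveQl parser; this sketch only illustrates the pattern):

```scala
// Minimal stand-in for the parser's AST node type (an assumption, not the
// real Spark class).
case class Token(name: String, children: List[Token])

// Extract the database name from a TOK_CREATEDATABASE node; the remaining
// args would carry optional IF NOT EXISTS / COMMENT / LOCATION / DBPROPERTIES.
def parseCreateDatabase(node: Token): Option[String] = node match {
  case Token("TOK_CREATEDATABASE", Token(databaseName, Nil) :: args) =>
    Some(databaseName)
  case _ => None
}

val ast = Token("TOK_CREATEDATABASE", List(Token("financials", Nil)))
// parseCreateDatabase(ast) == Some("financials")
```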
Contributor

changes in this file aren't really necessary since the file will be deleted anyway, but for now it's OK to keep


SparkQA commented Mar 30, 2016

Test build #54468 has finished for PR 12009 at commit f22ef90.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

override def run(sqlContext: SQLContext): Seq[Row] = {
  val dbMetadata: CatalogDatabase = sqlContext.sessionState.catalog.getDatabase(databaseName)
  val result =
    Row("Database Name", dbMetadata.name) ::
Contributor

this can probably just be Name. The user ran DESCRIBE DATABASE so it's pretty obvious

@andrewor14 (Contributor)

LGTM, will merge this once tests pass.

@andrewor14 (Contributor)

Oh good timing. Merging into master. Thanks for your work @gatorsmile!

@asfgit asfgit closed this in b66b97c Mar 30, 2016
@gatorsmile (Member Author)

Thank you for your reviews!

props),
ifNotExists)
Seq.empty[Row]
}
Contributor

Do we need to create the underlying dir in this command?

Member Author

True. I only did it for the default DB path and forgot the regular case :( Let me submit a follow-up PR for it. Thanks for catching it!

Member Author

@yhuai I tried it in spark-sql. If the directory is not created, Hive will do it for us. I am wondering whether we should still create the directory in Spark?

However, this PR has an issue when users specify the location in the Create Database command. The generated path should be path/databaseName.db instead of path. Will fix it soon.
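The fix described above, appending "databaseName.db" to a user-specified LOCATION instead of using the path verbatim, could be sketched as follows. The function name and parameters are illustrative assumptions, not the PR's actual code:

```scala
// Resolve the database directory: use the user-supplied LOCATION if present,
// otherwise fall back to the warehouse dir, and in both cases append the
// "<name>.db" suffix rather than taking the base path verbatim.
def databasePath(dbName: String, location: Option[String], warehouseDir: String): String = {
  val base = location.getOrElse(warehouseDir)
  s"${base.stripSuffix("/")}/$dbName.db"
}
```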

Contributor

Yea. Let's create the directory if it is not created.

Member Author

I see. Will do it in #12081. Thanks!

Member Author

@yhuai I did try it. Actually, the code is done... However, if we create the directory before issuing the Hive client API createDatabase, we will get the following error message from Hive:

Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Database db3 already exists;

Feel free to let me know what I should do next. Thanks!
