
Conversation

@gatorsmile (Member)

What changes were proposed in this pull request?

This PR is to implement the following four Database-related DDL commands:

  • CREATE DATABASE|SCHEMA [IF NOT EXISTS] database_name
  • DROP DATABASE [IF EXISTS] database_name [RESTRICT|CASCADE]
  • DESCRIBE DATABASE [EXTENDED] db_name
  • ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...)

Another PR will be submitted to handle the unsupported commands. Among the Database-related DDL commands, we will throw an exception for ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role.
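The four commands above can be modeled as simple command case classes that the parser produces and the execution layer runs. The following is a self-contained sketch with illustrative field names, not the PR's exact signatures (only DescribeDatabase is confirmed as a class name in the build output below):

```scala
// Illustrative stand-ins (assumptions) for the four DDL commands; the real
// Spark classes carry more context and extend RunnableCommand.
case class CreateDatabase(
    name: String,
    ifNotExists: Boolean,
    path: Option[String],
    comment: Option[String],
    props: Map[String, String])

// DROP DATABASE [IF EXISTS] name [RESTRICT|CASCADE]
case class DropDatabase(name: String, ifExists: Boolean, cascade: Boolean)

// DESCRIBE DATABASE [EXTENDED] name
case class DescribeDatabase(name: String, extended: Boolean)

// ALTER (DATABASE|SCHEMA) name SET DBPROPERTIES (...)
case class AlterDatabaseProperties(name: String, props: Map[String, String])

// Example: CREATE DATABASE IF NOT EXISTS financials
val create = CreateDatabase(
  name = "financials",
  ifNotExists = true,
  path = None,
  comment = None,
  props = Map.empty)
```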

cc @yhuai @andrewor14 @rxin Could you review the changes? Is it in the right direction? Thanks!

How was this patch tested?

Added a few test cases in command/DDLSuite.scala for testing DDL command execution in SQLContext. Since HiveContext also shares the same implementation, the existing test cases in hive also verify the correctness of these commands.

gatorsmile and others added 30 commits November 13, 2015 14:50
// todo: what is the default path in SessionCatalog?
def getDefaultPath: String = ""

def getDefaultDBExtension: String = ".db"
Contributor

what is this?

Member Author

That is from Hive. http://www.reedbushey.com/99Programming%20Hive.pdf

The database directory is created under a top-level directory specified by the property hive.metastore.warehouse.dir, which we discussed in “Local Mode Configuration” on page 24 and “Distributed and Pseudodistributed Mode Configuration” on page 26. Assuming you are using the default value for this property, /user/hive/warehouse, when the financials database is created, Hive will create the directory /user/hive/warehouse/financials.db. Note the .db extension.

I am not sure if we need to follow it. Let me know whether we should add an external configuration parameter, or whether we do not need it at all. Thanks!

Contributor

either way I don't think we want this; just embed it in getDefaultDBPath. Otherwise we'll end up with too many random methods
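The suggestion above, folding the ".db" extension and default path logic into a single method, could look like the following sketch. The method name comes from the discussion; the warehouse-directory parameter is an assumption for illustration:

```scala
// Build the default database directory under the warehouse dir, mirroring the
// Hive behavior quoted above (note the ".db" extension).
def getDefaultDBPath(dbName: String, warehouseDir: String): String =
  s"${warehouseDir.stripSuffix("/")}/$dbName.db"

val path = getDefaultDBPath("financials", "/user/hive/warehouse")
// path == "/user/hive/warehouse/financials.db"
```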


SparkQA commented Mar 28, 2016

Test build #54348 has finished for PR 12009 at commit f4c33e2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("Create/Drop/Alter/Describe Database - basic") {
Contributor

can you split these into separate tests?

Member Author

Sure, will do it. Thanks!

@andrewor14 (Contributor)

Looks great.

* {{{
* CREATE DATABASE|SCHEMA [IF NOT EXISTS] database_name
* }}}
*/
Contributor

Also, why did you move them to this file? These are DDLs, so they belong in ddl.scala.

Member

Agreed. We should keep them in ddl.scala.

Member Author

Sure, will move them back. Thanks!

Contributor

rxin commented Mar 29, 2016

I just merged #12015

Can you update to use the new ANTLR4 parser instead? We are going to remove the ANTLR3 one in the next day or two. Thanks.

@gatorsmile (Member Author)

Sure, will do the changes. Thanks!

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala

SparkQA commented Mar 29, 2016

Test build #54461 has finished for PR 12009 at commit 16c829e.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class DescribeDatabase(


// CREATE DATABASE [IF NOT EXISTS] database_name [COMMENT database_comment]
// [LOCATION path] [WITH DBPROPERTIES (key1=val1, key2=val2, ...)];
case Token("TOK_CREATEDATABASE", Token(databaseName, Nil) :: args) =>
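The AST matching in the snippet above can be reproduced with a minimal stand-in Token class (the real one lives in Spark's old ANTLR3 HiveQl parser; this sketch only illustrates the pattern):

```scala
// Minimal stand-in for the parser's AST node type (an assumption, not the
// real Spark class).
case class Token(name: String, children: List[Token])

// Extract the database name from a TOK_CREATEDATABASE node; the remaining
// args would carry optional IF NOT EXISTS / COMMENT / LOCATION / DBPROPERTIES.
def parseCreateDatabase(node: Token): Option[String] = node match {
  case Token("TOK_CREATEDATABASE", Token(databaseName, Nil) :: args) =>
    Some(databaseName)
  case _ => None
}

val ast = Token("TOK_CREATEDATABASE", List(Token("financials", Nil)))
// parseCreateDatabase(ast) == Some("financials")
```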
Contributor

changes in this file aren't really necessary since the file will be deleted anyway, but for now it's OK to keep


SparkQA commented Mar 30, 2016

Test build #54468 has finished for PR 12009 at commit f22ef90.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

override def run(sqlContext: SQLContext): Seq[Row] = {
  val dbMetadata: CatalogDatabase = sqlContext.sessionState.catalog.getDatabase(databaseName)
  val result =
    Row("Database Name", dbMetadata.name) ::
Contributor

this can probably just be Name. The user ran DESCRIBE DATABASE so it's pretty obvious

@andrewor14 (Contributor)

LGTM, will merge this once tests pass.

@andrewor14 (Contributor)

Oh good timing. Merging into master. Thanks for your work @gatorsmile!

@asfgit asfgit closed this in b66b97c Mar 30, 2016
@gatorsmile (Member Author)

Thank you for your reviews!

props),
ifNotExists)
Seq.empty[Row]
}
Contributor

Do we need to create the underlying dir in this command?

Member Author

True. I only did it for the default DB path and forgot the regular case :( Let me submit a follow-up PR for it. Thanks for catching it!

Member Author

@yhuai I tried it in spark-sql. If the directory is not created, Hive will do it for us. I am wondering whether we should still create the directory in Spark?

However, this PR has an issue when users specify the location in the Create Database command. The generated path should be path/databaseName.db instead of path. Will fix it soon.
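The fix described above, appending "databaseName.db" to a user-specified LOCATION instead of using the path verbatim, could be sketched as follows. The function name and parameters are illustrative assumptions, not the PR's actual code:

```scala
// Resolve the database directory: use the user-supplied LOCATION if present,
// otherwise fall back to the warehouse dir, and in both cases append the
// "<name>.db" suffix rather than taking the base path verbatim.
def databasePath(dbName: String, location: Option[String], warehouseDir: String): String = {
  val base = location.getOrElse(warehouseDir)
  s"${base.stripSuffix("/")}/$dbName.db"
}
```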

Contributor

Yea. Let's create the directory if it is not created.

Member Author

I see. Will do it in #12081. Thanks!

Member Author

@yhuai I did try it. Actually, the code is done... However, if we create the directory before issuing the Hive client API createDatabase, we will get the following error message from Hive:

Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Database db3 already exists;

Feel free to let me know what I should do next. Thanks!
