[SPARK-14124] [SQL] Implement Database-related DDL Commands #12009
Conversation
    // todo: what is the default path in SessionCatalog?
    def getDefaultPath: String = ""

    def getDefaultDBExtension: String = ".db"
what is this?
That is from Hive. http://www.reedbushey.com/99Programming%20Hive.pdf
The database directory is created under a top-level directory specified by the property hive.metastore.warehouse.dir, which we discussed in “Local Mode Configuration” on page 24 and “Distributed and Pseudodistributed Mode Configuration” on page 26. Assuming you are using the default value for this property, /user/hive/warehouse, when the financials database is created, Hive will create the directory /user/hive/warehouse/financials.db. Note the .db extension.
I am not sure whether we need to follow it. Let me know if we should add an external configuration parameter, or if we do not need it at all. Thanks!
either way I don't think we want this; just embed it in getDefaultDBPath. Otherwise we'll end up with too many random methods
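For context, a minimal sketch of what embedding the extension directly in getDefaultDBPath could look like, assuming the warehouse directory comes from a configuration value such as hive.metastore.warehouse.dir; the helper and parameter names here are illustrative, not the final API:

```scala
import org.apache.hadoop.fs.Path

// Illustrative only: builds "<warehousePath>/<dbName>.db" the way Hive does,
// with the ".db" suffix inlined instead of exposed as a separate method.
def getDefaultDBPath(dbName: String, warehousePath: String): String = {
  new Path(new Path(warehousePath), dbName.toLowerCase + ".db").toString
}
```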
Test build #54348 has finished for PR 12009 at commit
      }
    }

    test("Create/Drop/Alter/Describe Database - basic") {
can you split these into separate tests?
Sure, will do it. Thanks!
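A rough sketch of what the split might look like, assuming a ScalaTest-style suite with a sqlContext in scope and a listDatabases helper on the session catalog; the test names and bodies are placeholders rather than the actual test cases:

```scala
// Placeholder bodies; the real assertions live in DDLSuite.
test("Create Database") {
  sqlContext.sql("CREATE DATABASE IF NOT EXISTS db_create")
  assert(sqlContext.sessionState.catalog.listDatabases().contains("db_create"))
}

test("Drop Database") {
  sqlContext.sql("CREATE DATABASE IF NOT EXISTS db_drop")
  sqlContext.sql("DROP DATABASE IF EXISTS db_drop")
  assert(!sqlContext.sessionState.catalog.listDatabases().contains("db_drop"))
}
```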
Looks great.
     * {{{
     *   CREATE DATABASE|SCHEMA [IF NOT EXISTS] database_name
     * }}}
     */
also, why did you move them to this file? These are DDLs so they belong to ddl.scala
Agreed. We should keep them in ddl.scala.
Sure, will move them back. Thanks!
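For reference, a minimal sketch of the shape such a command could take once it lives in ddl.scala. The package locations, the run(sqlContext) signature of RunnableCommand, and the getDefaultDBPath helper (suggested earlier in this review) are assumptions based on the snippets quoted here, not the final code:

```scala
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.catalyst.catalog.CatalogDatabase
import org.apache.spark.sql.execution.command.RunnableCommand

// Illustrative CREATE DATABASE command kept in ddl.scala alongside the other DDLs.
case class CreateDatabase(
    databaseName: String,
    ifNotExists: Boolean,
    path: Option[String],
    comment: Option[String],
    props: Map[String, String])
  extends RunnableCommand {

  override def run(sqlContext: SQLContext): Seq[Row] = {
    val catalog = sqlContext.sessionState.catalog
    catalog.createDatabase(
      CatalogDatabase(
        databaseName,
        comment.getOrElse(""),
        path.getOrElse(catalog.getDefaultDBPath(databaseName)),
        props),
      ifNotExists)
    Seq.empty[Row]
  }
}
```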
I just merged #12015. Can you update to use the new ANTLR4 parser instead? We are going to remove the ANTLR3 one in the next day or two. Thanks.
Sure, will do the changes. Thanks!
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
Test build #54461 has finished for PR 12009 at commit
    // CREATE DATABASE [IF NOT EXISTS] database_name [COMMENT database_comment]
    //   [LOCATION path] [WITH DBPROPERTIES (key1=val1, key2=val2, ...)];
    case Token("TOK_CREATEDATABASE", Token(databaseName, Nil) :: args) =>
changes in this file aren't really necessary since the file will be deleted anyway, but for now it's OK to keep
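To make the quoted fragment easier to follow, here is a self-contained, simplified illustration of the kind of token matching the old ANTLR3-based path did for the optional clauses. The Token class below is a local stand-in, not Spark's real AST node type, and the clause token names are illustrative:

```scala
// Simplified stand-in for the parser's AST node; this is NOT Spark's real Token type.
case class Token(text: String, children: Seq[Token] = Nil)

// Walk the remaining children of TOK_CREATEDATABASE and pick out the optional
// IF NOT EXISTS / COMMENT / LOCATION clauses (clause token names are illustrative).
def parseCreateDatabaseArgs(args: Seq[Token]): (Boolean, Option[String], Option[String]) = {
  var ifNotExists = false
  var comment: Option[String] = None
  var location: Option[String] = None
  args.foreach {
    case Token("TOK_IFNOTEXISTS", _) => ifNotExists = true
    case Token("TOK_DATABASECOMMENT", Seq(Token(c, _))) => comment = Some(c)
    case Token("TOK_DATABASELOCATION", Seq(Token(p, _))) => location = Some(p)
    case _ => // ignore other clauses (e.g. DBPROPERTIES) in this sketch
  }
  (ifNotExists, comment, location)
}
```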
Test build #54468 has finished for PR 12009 at commit
    override def run(sqlContext: SQLContext): Seq[Row] = {
      val dbMetadata: CatalogDatabase = sqlContext.sessionState.catalog.getDatabase(databaseName)
      val result =
        Row("Database Name", dbMetadata.name) ::
this can probably just be Name. The user ran DESCRIBE DATABASE so it's pretty obvious
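A quick sketch of the revised result construction with the shortened labels, purely to illustrate the suggestion; the field names follow the snippet quoted above, and the extra rows shown are assumptions rather than the final output:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.catalog.CatalogDatabase

// Shorter labels: the user just ran DESCRIBE DATABASE, so "Database" is implied.
def describeRows(dbMetadata: CatalogDatabase): Seq[Row] =
  Row("Name", dbMetadata.name) ::
  Row("Description", dbMetadata.description) ::
  Row("Location", dbMetadata.locationUri) :: Nil
```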
LGTM, will merge this once tests pass.
Oh good timing. Merging into master. Thanks for your work @gatorsmile!
Thank you for your reviews!
          props),
        ifNotExists)
      Seq.empty[Row]
    }
Do we need to create the underlying dir in this command?
True. I only did it for the default DB path and forgot to do it in the regular case :( Let me submit a follow-up PR for it. Thanks for catching it!
@yhuai I tried it in spark-sql. If the directory is not created, Hive will do it for us. I am wondering whether we should still create the directory in Spark?
However, this PR has an issue when users specify the location in the CREATE DATABASE command: the generated path should be path/databaseName.db instead of path. Will fix it soon.
Yea. Let's create the directory if it is not created.
I see. Will do it in #12081. Thanks!
@yhuai I did try it. Actually, the code is done... However, if we create the directory before issuing the Hive client API createDatabase, we get the following error message from Hive:
Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Database db3 already exists;
Just feel free to let me know what I should do next. Thanks!
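For the follow-up, a minimal sketch of how the directory could be created with the Hadoop FileSystem API, assuming a Hadoop Configuration is at hand. As noted above, the Hive client creates the directory itself, so this would only apply where Spark manages the path; the helper name is illustrative:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Create the database directory (e.g. <location>/<dbName>.db) if it does not exist yet.
def ensureDatabaseDir(location: String, hadoopConf: Configuration): Unit = {
  val path = new Path(location)
  val fs = path.getFileSystem(hadoopConf)
  if (!fs.exists(path)) {
    fs.mkdirs(path)
  }
}
```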
What changes were proposed in this pull request?
This PR is to implement the following four Database-related DDL commands:
- CREATE DATABASE|SCHEMA [IF NOT EXISTS] database_name
- DROP DATABASE [IF EXISTS] database_name [RESTRICT|CASCADE]
- DESCRIBE DATABASE [EXTENDED] db_name
- ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...)

Another PR will be submitted to handle the unsupported commands. In the Database-related DDL commands, we will issue an error exception for ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role.

cc @yhuai @andrewor14 @rxin Could you review the changes? Is it in the right direction? Thanks!
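As a usage illustration (not part of the patch itself), the four commands could be exercised from a SQLContext roughly like this; the database name, comment, and property values are placeholders:

```scala
// Placeholder database name, comment, and properties; for illustration only.
sqlContext.sql("CREATE DATABASE IF NOT EXISTS demo_db COMMENT 'demo' WITH DBPROPERTIES ('owner'='spark')")
sqlContext.sql("DESCRIBE DATABASE EXTENDED demo_db").show(false)
sqlContext.sql("ALTER DATABASE demo_db SET DBPROPERTIES ('purpose'='testing')")
sqlContext.sql("DROP DATABASE IF EXISTS demo_db CASCADE")
```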
How was this patch tested?
Added a few test cases in command/DDLSuite.scala for testing DDL command execution in SQLContext. Since HiveContext shares the same implementation, the existing test cases in hive also verify the correctness of these commands.
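A sketch of the kind of check such a DDLSuite test might perform, assuming access to the session catalog from the suite; the database name, comment, and assertions are illustrative rather than the actual test bodies:

```scala
// Illustrative DDLSuite-style test: create a database and verify its catalog metadata.
test("create database with comment") {
  val catalog = sqlContext.sessionState.catalog
  sqlContext.sql("CREATE DATABASE IF NOT EXISTS db_meta COMMENT 'metadata check'")
  val db = catalog.getDatabase("db_meta")
  assert(db.name == "db_meta")
  assert(db.description == "metadata check")
  sqlContext.sql("DROP DATABASE IF EXISTS db_meta CASCADE")
}
```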