-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-17338][SQL] add global temp view #14897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @cloud-fan .
Then, is there no way to drop a global temporary view when there exists a local temporary view?
e.g. If I created the followings,
- default.view1 (permanent view in
defaultschema) - view1 (local temporary view).
And, suddenly, a global view is created by another session.
3. view1 (global temporary view)
In my session, is there any way to drop 3 (global view view1) without dropping 2 (my local view 2)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add DROP GLOBAL VIEW like CREATE GLOBAL TEMPORARY VIEW?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see document here: https://github.com/apache/spark/pull/14897/files#diff-cfec39cf8accabd227bd325f0a0a5f30R49
yea, there is no way to drop a global temp view if a temp view with same name exists. We can have some discussion about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Right. Up to now, all happens in the same session. So, temporary view is created after permanent view always. It works like STACK.
However, this is a global view made by other session. So, it does not work like STACK. Even in the same session, if we execute 'create temporary view' and 'create global view', we must delete the local temporary view first. So, it seems more serious to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is a design choice, never mind.
I think it's okay if this PR can provide DROP GLOBAL VIEW together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a bit risky to provide user the ability to drop a global view implicitly.
I think all global view modification commands should be done explicitly, including create, drop, alter..
|
Test build #64719 has finished for PR 14897 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the TempViewManager used by local views?
|
Maybe we can create a few test cases for verifying the behaviors of global temporary views. Below is an example. We did it for spark/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala Lines 1082 to 1112 in a117afa
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want separate command for alter global view.
It is not safe for a user to be able to alter a global view directly.
|
How about another API spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala Lines 491 to 498 in f7c9ff5
|
|
We might need an update on all the comments regarding the local temporary views in A few external DDL/APIs might need an update too. For example, More test cases are required for each external change, especially the resolution ordering and accessibility issues. |
|
When comparing between permanent views and the global temporary views, I am trying to list the major differences:
Is my understanding right? |
|
@srinathshankar also want to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not part of this patch, but why is it ok for an alter view to work on a table ?
2b97fba to
df6f5e5
Compare
|
Test build #65662 has finished for PR 14897 at commit
|
df6f5e5 to
b532e98
Compare
|
Test build #65665 has finished for PR 14897 at commit
|
b532e98 to
c360dbc
Compare
|
Test build #65668 has finished for PR 14897 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid renaming createLocalTempView?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not needing LOCAL key word?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this too restrictive? If the user ask directly for GLOBAL_TEMP_VIEW_DATABASE, may be we should allow it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the name is very specific "_global_temp", it is unlikly that it may conflict with user table's name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe change the name to "_global_temporary_views"
c360dbc to
2f05b08
Compare
|
Test build #65717 has finished for PR 14897 at commit
|
| Traceback (most recent call last): | ||
| ... | ||
| AnalysisException: ... | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this bad case will end up in the python doc. Can we move this test to the tests.py file (it is fine to do it in a follow-up pr)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the method doc in dropTempView, is it bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be pretty confusing because this test case will appear in the python doc (and users will see an example that throws exceptions).
| true | ||
| } else { | ||
| false | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is reason that failing to rename has two behavior (when source does not exist, we return false. But, when destination already exists, we thrown an error)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behaviour comes from the previous temp view: If the source doesn't exist, try persisted table/view, if source exists but destination already exists, throw an error.
| if (dbName == globalTempViewManager.database) { | ||
| throw new AnalysisException( | ||
| s"${globalTempViewManager.database} is a system preserved database, " + | ||
| "you cannot use it as current database.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it will be useful to let users know how to access temp views under this namespace.
| if (db == globalTempViewManager.database) { | ||
| globalTempViewManager.get(table).map { viewDef => | ||
| SubqueryAlias(relationAlias, viewDef, Some(name)) | ||
| }.getOrElse(throw new NoSuchTableException(db, table)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the else branch, we are just doing SubqueryAlias(relationAlias, tempTables(table), Option(name)). Later we should also make this branch and that branch consistent (by throwing NoSuchTableException).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will never hit NoSuchTableException in the else branch, because we have !tempTables.contains(table) in the else if branch.
I don't like this style and prefer the new atomic way which is used for the global temp view, so that we don't need to wrap the whole method in synchronized block. But I wanna minimal the changes in this PR and leave this code unchanged. We can improve it in follow-ups.
| */ | ||
| @throws[AnalysisException] | ||
| def createGlobalTempView(viewName: String): Unit = withPlan { | ||
| createTempViewCommand(viewName, replace = false, global = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behavior of createTempView("a.b")?
|
|
||
| { | ||
| // Set the Hive metastore warehouse path to the one we use | ||
| val tempConf = new SQLConf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this block does not use any val defined in this class, it is fine to move it.
|
|
||
| { | ||
| // Set the Hive metastore warehouse path to the one we use | ||
| val tempConf = new SQLConf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you double check?
| checkAnswer(spark.table(s"$globalTempDB.src"), Row(1, "a")) | ||
|
|
||
| // Use qualified name to rename a global temp view. | ||
| sql(s"ALTER VIEW $globalTempDB.src RENAME TO src2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do sql(s"ALTER VIEW $globalTempDB.src RENAME TO $globalTempDB.src2") since we always require operating global temp views using qualified identifiers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ALTER VIEW/TABLE ... RENAME TO ..., we forbid users to specify database for the destination table name, because:
- If the table/view belongs to a database, then it doesn't make sense to specify database for destination table name, because we can't change the database of the table/view by
RENAME TO - if the table/view has no database(local temp view), then we can't specify database for destination table name.
Global temp view also follows these rules so we should not allow ALTER VIEW $globalTempDB.src RENAME TO $globalTempDB.src2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see. Thanks for the explanation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed?
| val globalTempDB = sparkContext.conf.get(GLOBAL_TEMP_DATABASE).toLowerCase | ||
| if (externalCatalog.databaseExists(globalTempDB)) { | ||
| throw new SparkException( | ||
| s"$globalTempDB is a system preserved database, please rename your existing database " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also mention this conf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the offline discussion, we decided to hide this internal config to users, did I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh no. I think it is fine to hide that conf. But, if the user hit this exception, seems it is better to let them know there is another workaround (renaming the existing db may not be easy).
aff5091 to
fb96f1c
Compare
|
Test build #66602 has finished for PR 14897 at commit
|
|
Test build #66603 has finished for PR 14897 at commit
|
|
LGTM. Let's make a small change according to #14897 (comment) and we can merge this pr. Thanks! |
|
Test build #66620 has finished for PR 14897 at commit
|
|
thanks for the review, merging to master! |
| * @param viewName the name of the view to be dropped. | ||
| * @since 2.1.0 | ||
| */ | ||
| def dropGlobalTempView(viewName: String): Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to explain the return type
| * @param viewName the name of the view to be dropped. | ||
| * @since 2.0.0 | ||
| */ | ||
| def dropTempView(viewName: String): Unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this returning unit when the global one return boolean? we should make them conssitent, i.e. this one should return boolean too.
also make sure we document the return value.
| * preserved database `_global_temp`, and we must use the qualified name to refer a global temp | ||
| * view, e.g. `SELECT * FROM _global_temp.view1`. | ||
| * | ||
| * @throws TempTableAlreadyExistsException if the view name already exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this to AnalysisException
TempTableAlreadyExistsException is an internal exception
| viewName: String, | ||
| replace: Boolean, | ||
| global: Boolean): CreateViewCommand = { | ||
| val viewType = if (global) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shorten this to
val viewType = if (global) GlobalTempView else LocalTempView
|
@cloud-fan this looks good. I took a look at the API changes. Can you address the comments in a follow-up PR? |
## What changes were proposed in this pull request? address post hoc review comments for #14897 ## How was this patch tested? N/A Author: Wenchen Fan <[email protected]> Closes #15424 from cloud-fan/global-temp-view.
## What changes were proposed in this pull request? SQLConf is session-scoped and mutable. However, we do have the requirement for a static SQL conf, which is global and immutable, e.g. the `schemaStringThreshold` in `HiveExternalCatalog`, the flag to enable/disable hive support, the global temp view database in #14897. Actually we've already implemented static SQL conf implicitly via `SparkConf`, this PR just make it explicit and expose it to users, so that they can see the config value via SQL command or `SparkSession.conf`, and forbid users to set/unset static SQL conf. ## How was this patch tested? new tests in SQLConfSuite Author: Wenchen Fan <[email protected]> Closes #15295 from cloud-fan/global-conf.
## What changes were proposed in this pull request? Global temporary view is a cross-session temporary view, which means it's shared among all sessions. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database `global_temp`(configurable via SparkConf), and we must use the qualified name to refer a global temp view, e.g. SELECT * FROM global_temp.view1. changes for `SessionCatalog`: 1. add a new field `gloabalTempViews: GlobalTempViewManager`, to access the shared global temp views, and the global temp db name. 2. `createDatabase` will fail if users wanna create `global_temp`, which is system preserved. 3. `setCurrentDatabase` will fail if users wanna set `global_temp`, which is system preserved. 4. add `createGlobalTempView`, which is used in `CreateViewCommand` to create global temp views. 5. add `dropGlobalTempView`, which is used in `CatalogImpl` to drop global temp view. 6. add `alterTempViewDefinition`, which is used in `AlterViewAsCommand` to update the view definition for local/global temp views. 7. `renameTable`/`dropTable`/`isTemporaryTable`/`lookupRelation`/`getTempViewOrPermanentTableMetadata`/`refreshTable` will handle global temp views. changes for SQL commands: 1. `CreateViewCommand`/`AlterViewAsCommand` is updated to support global temp views 2. `ShowTablesCommand` outputs a new column `database`, which is used to distinguish global and local temp views. 3. other commands can also handle global temp views if they call `SessionCatalog` APIs which accepts global temp views, e.g. `DropTableCommand`, `AlterTableRenameCommand`, `ShowColumnsCommand`, etc. changes for other public API 1. add a new method `dropGlobalTempView` in `Catalog` 2. `Catalog.findTable` can find global temp view 3. add a new method `createGlobalTempView` in `Dataset` ## How was this patch tested? new tests in `SQLViewSuite` Author: Wenchen Fan <[email protected]> Closes apache#14897 from cloud-fan/global-temp-view.
## What changes were proposed in this pull request? address post hoc review comments for apache#14897 ## How was this patch tested? N/A Author: Wenchen Fan <[email protected]> Closes apache#15424 from cloud-fan/global-temp-view.
## What changes were proposed in this pull request? SQLConf is session-scoped and mutable. However, we do have the requirement for a static SQL conf, which is global and immutable, e.g. the `schemaStringThreshold` in `HiveExternalCatalog`, the flag to enable/disable hive support, the global temp view database in apache#14897. Actually we've already implemented static SQL conf implicitly via `SparkConf`, this PR just make it explicit and expose it to users, so that they can see the config value via SQL command or `SparkSession.conf`, and forbid users to set/unset static SQL conf. ## How was this patch tested? new tests in SQLConfSuite Author: Wenchen Fan <[email protected]> Closes apache#15295 from cloud-fan/global-conf.
|
Hi there, I am a bit puzzled by the golbal_temp database. Why do we need to provide a qualified name to access the view? We can't even use the "USE global_temp" command. This breaks lots of tools that expect to switch to a db with the USE command. Anyone know if there is a work around that issue? Any plan to write a patch? Thanks |
|
|
|
Hi cloud-fan, |
|
global temp view is designed to be accessed by fully qualified name, for some name conflicting concerns. I'm afraid you have to let your tool know this :) |
|
@cloud-fan thanks for pointing that out. Those closed-source tools might need some time to be updated. Nothing I can do on that front :( |
|
You can fork the project and remove the check, but then you need to check all the places that read current db, and think about what if the current db is |
What changes were proposed in this pull request?
Global temporary view is a cross-session temporary view, which means it's shared among all sessions. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database
global_temp(configurable via SparkConf), and we must use the qualified name to refer a global temp view, e.g. SELECT * FROM global_temp.view1.changes for
SessionCatalog:gloabalTempViews: GlobalTempViewManager, to access the shared global temp views, and the global temp db name.createDatabasewill fail if users wanna createglobal_temp, which is system preserved.setCurrentDatabasewill fail if users wanna setglobal_temp, which is system preserved.createGlobalTempView, which is used inCreateViewCommandto create global temp views.dropGlobalTempView, which is used inCatalogImplto drop global temp view.alterTempViewDefinition, which is used inAlterViewAsCommandto update the view definition for local/global temp views.renameTable/dropTable/isTemporaryTable/lookupRelation/getTempViewOrPermanentTableMetadata/refreshTablewill handle global temp views.changes for SQL commands:
CreateViewCommand/AlterViewAsCommandis updated to support global temp viewsShowTablesCommandoutputs a new columndatabase, which is used to distinguish global and local temp views.SessionCatalogAPIs which accepts global temp views, e.g.DropTableCommand,AlterTableRenameCommand,ShowColumnsCommand, etc.changes for other public API
dropGlobalTempViewinCatalogCatalog.findTablecan find global temp viewcreateGlobalTempViewinDatasetHow was this patch tested?
new tests in
SQLViewSuite