Skip to content

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Aug 31, 2016

What changes were proposed in this pull request?

Global temporary view is a cross-session temporary view, which means it's shared among all sessions. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database global_temp(configurable via SparkConf), and we must use the qualified name to refer a global temp view, e.g. SELECT * FROM global_temp.view1.

changes for SessionCatalog:

  1. add a new field gloabalTempViews: GlobalTempViewManager, to access the shared global temp views, and the global temp db name.
  2. createDatabase will fail if users wanna create global_temp, which is system preserved.
  3. setCurrentDatabase will fail if users wanna set global_temp, which is system preserved.
  4. add createGlobalTempView, which is used in CreateViewCommand to create global temp views.
  5. add dropGlobalTempView, which is used in CatalogImpl to drop global temp view.
  6. add alterTempViewDefinition, which is used in AlterViewAsCommand to update the view definition for local/global temp views.
  7. renameTable/dropTable/isTemporaryTable/lookupRelation/getTempViewOrPermanentTableMetadata/refreshTable will handle global temp views.

changes for SQL commands:

  1. CreateViewCommand/AlterViewAsCommand is updated to support global temp views
  2. ShowTablesCommand outputs a new column database, which is used to distinguish global and local temp views.
  3. other commands can also handle global temp views if they call SessionCatalog APIs which accepts global temp views, e.g. DropTableCommand, AlterTableRenameCommand, ShowColumnsCommand, etc.

changes for other public API

  1. add a new method dropGlobalTempView in Catalog
  2. Catalog.findTable can find global temp view
  3. add a new method createGlobalTempView in Dataset

How was this patch tested?

new tests in SQLViewSuite

@cloud-fan
Copy link
Contributor Author

cc @yhuai @liancheng @clockfly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @cloud-fan .

Then, is there no way to drop a global temporary view when there exists a local temporary view?

e.g. If I created the followings,

  1. default.view1 (permanent view in default schema)
  2. view1 (local temporary view).

And, suddenly, a global view is created by another session.
3. view1 (global temporary view)

In my session, is there any way to drop 3 (global view view1) without dropping 2 (my local view 2)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add DROP GLOBAL VIEW like CREATE GLOBAL TEMPORARY VIEW?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see document here: https://github.com/apache/spark/pull/14897/files#diff-cfec39cf8accabd227bd325f0a0a5f30R49

yea, there is no way to drop a global temp view if a temp view with same name exists. We can have some discussion about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Right. Up to now, all happens in the same session. So, temporary view is created after permanent view always. It works like STACK.

However, this is a global view made by other session. So, it does not work like STACK. Even in the same session, if we execute 'create temporary view' and 'create global view', we must delete the local temporary view first. So, it seems more serious to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is a design choice, never mind.
I think it's okay if this PR can provide DROP GLOBAL VIEW together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a bit risky to provide user the ability to drop a global view implicitly.

I think all global view modification commands should be done explicitly, including create, drop, alter..

@SparkQA
Copy link

SparkQA commented Aug 31, 2016

Test build #64719 has finished for PR 14897 at commit 2b97fba.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class TempViewManager
    • case class DropViewCommand(name: TableIdentifier, ifExists: Boolean) extends RunnableCommand

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the TempViewManager used by local views?

@gatorsmile
Copy link
Member

Maybe we can create a few test cases for verifying the behaviors of global temporary views. Below is an example. We did it for current_database

test("current_database with multiple sessions") {
sql("create database a")
sql("use a")
val s2 = newSession()
s2.sql("create database b")
s2.sql("use b")
assert(sql("select current_database()").first() === Row("a"))
assert(s2.sql("select current_database()").first() === Row("b"))
try {
sql("create table test_a(key INT, value STRING)")
s2.sql("create table test_b(key INT, value STRING)")
sql("select * from test_a")
intercept[AnalysisException] {
sql("select * from test_b")
}
sql("select * from b.test_b")
s2.sql("select * from test_b")
intercept[AnalysisException] {
s2.sql("select * from test_a")
}
s2.sql("select * from a.test_a")
} finally {
sql("DROP TABLE IF EXISTS test_a")
s2.sql("DROP TABLE IF EXISTS test_b")
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want separate command for alter global view.

It is not safe for a user to be able to alter a global view directly.

@gatorsmile
Copy link
Member

gatorsmile commented Sep 1, 2016

How about another API refreshTable for global temporary views? We allow users to refresh it for local temporary views.

def refreshTable(name: TableIdentifier): Unit = {
// Go through temporary tables and invalidate them.
// If the database is defined, this is definitely not a temp table.
// If the database is not defined, there is a good chance this is a temp table.
if (name.database.isEmpty) {
tempTables.get(formatTableName(name.table)).foreach(_.refresh())
}
}

@gatorsmile
Copy link
Member

gatorsmile commented Sep 1, 2016

We might need an update on all the comments regarding the local temporary views in SessionCatalog. You can get them by searching the usage of tempTables.

A few external DDL/APIs might need an update too. For example, SHOW TABLES.

More test cases are required for each external change, especially the resolution ordering and accessibility issues.

@gatorsmile
Copy link
Member

When comparing between permanent views and the global temporary views, I am trying to list the major differences:

  • the scope of permanent views is cross-application; the scope of global temp views is cross-session but only accessible in the same application.
  • the definition of permanent views is permanently recorded in the external catalog; the definition of global temp views is lost after the application stops.

Is my understanding right?

@yhuai
Copy link
Contributor

yhuai commented Sep 1, 2016

@srinathshankar also want to review this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of this patch, but why is it ok for an alter view to work on a table ?

@SparkQA
Copy link

SparkQA commented Sep 20, 2016

Test build #65662 has finished for PR 14897 at commit df6f5e5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GlobalTempViewManager

@cloud-fan cloud-fan changed the title [SPARK-17338][SQL] add global temp view [SPARK-17338][SQL][WIP] add global temp view Sep 20, 2016
@SparkQA
Copy link

SparkQA commented Sep 20, 2016

Test build #65665 has finished for PR 14897 at commit b532e98.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GlobalTempViewManager

@SparkQA
Copy link

SparkQA commented Sep 20, 2016

Test build #65668 has finished for PR 14897 at commit c360dbc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GlobalTempViewManager

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid renaming createLocalTempView?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not needing LOCAL key word?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this too restrictive? If the user ask directly for GLOBAL_TEMP_VIEW_DATABASE, may be we should allow it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the name is very specific "_global_temp", it is unlikly that it may conflict with user table's name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change the name to "_global_temporary_views"

@cloud-fan cloud-fan changed the title [SPARK-17338][SQL][WIP] add global temp view [SPARK-17338][SQL] add global temp view Sep 21, 2016
@SparkQA
Copy link

SparkQA commented Sep 21, 2016

Test build #65717 has finished for PR 14897 at commit 2f05b08.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GlobalTempViewManager
    • case class CreateTempViewUsing(

Traceback (most recent call last):
...
AnalysisException: ...
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this bad case will end up in the python doc. Can we move this test to the tests.py file (it is fine to do it in a follow-up pr)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the method doc in dropTempView, is it bad?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be pretty confusing because this test case will appear in the python doc (and users will see an example that throws exceptions).

true
} else {
false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is reason that failing to rename has two behavior (when source does not exist, we return false. But, when destination already exists, we thrown an error)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behaviour comes from the previous temp view: If the source doesn't exist, try persisted table/view, if source exists but destination already exists, throw an error.

if (dbName == globalTempViewManager.database) {
throw new AnalysisException(
s"${globalTempViewManager.database} is a system preserved database, " +
"you cannot use it as current database.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it will be useful to let users know how to access temp views under this namespace.

if (db == globalTempViewManager.database) {
globalTempViewManager.get(table).map { viewDef =>
SubqueryAlias(relationAlias, viewDef, Some(name))
}.getOrElse(throw new NoSuchTableException(db, table))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the else branch, we are just doing SubqueryAlias(relationAlias, tempTables(table), Option(name)). Later we should also make this branch and that branch consistent (by throwing NoSuchTableException).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will never hit NoSuchTableException in the else branch, because we have !tempTables.contains(table) in the else if branch.

I don't like this style and prefer the new atomic way which is used for the global temp view, so that we don't need to wrap the whole method in synchronized block. But I wanna minimal the changes in this PR and leave this code unchanged. We can improve it in follow-ups.

*/
@throws[AnalysisException]
def createGlobalTempView(viewName: String): Unit = withPlan {
createTempViewCommand(viewName, replace = false, global = true)
Copy link
Contributor

@yhuai yhuai Oct 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior of createTempView("a.b")?


{
// Set the Hive metastore warehouse path to the one we use
val tempConf = new SQLConf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this block does not use any val defined in this class, it is fine to move it.


{
// Set the Hive metastore warehouse path to the one we use
val tempConf = new SQLConf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check?

checkAnswer(spark.table(s"$globalTempDB.src"), Row(1, "a"))

// Use qualified name to rename a global temp view.
sql(s"ALTER VIEW $globalTempDB.src RENAME TO src2")
Copy link
Contributor

@yhuai yhuai Oct 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do sql(s"ALTER VIEW $globalTempDB.src RENAME TO $globalTempDB.src2") since we always require operating global temp views using qualified identifiers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ALTER VIEW/TABLE ... RENAME TO ..., we forbid users to specify database for the destination table name, because:

  1. If the table/view belongs to a database, then it doesn't make sense to specify database for destination table name, because we can't change the database of the table/view by RENAME TO
  2. if the table/view has no database(local temp view), then we can't specify database for destination table name.

Global temp view also follows these rules so we should not allow ALTER VIEW $globalTempDB.src RENAME TO $globalTempDB.src2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see. Thanks for the explanation!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed?

val globalTempDB = sparkContext.conf.get(GLOBAL_TEMP_DATABASE).toLowerCase
if (externalCatalog.databaseExists(globalTempDB)) {
throw new SparkException(
s"$globalTempDB is a system preserved database, please rename your existing database " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also mention this conf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the offline discussion, we decided to hide this internal config to users, did I miss something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no. I think it is fine to hide that conf. But, if the user hit this exception, seems it is better to let them know there is another workaround (renaming the existing db may not be easy).

@SparkQA
Copy link

SparkQA commented Oct 9, 2016

Test build #66602 has finished for PR 14897 at commit aff5091.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 9, 2016

Test build #66603 has finished for PR 14897 at commit fb96f1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Oct 10, 2016

LGTM. Let's make a small change according to #14897 (comment) and we can merge this pr. Thanks!

@SparkQA
Copy link

SparkQA commented Oct 10, 2016

Test build #66620 has finished for PR 14897 at commit 29e292a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

thanks for the review, merging to master!

* @param viewName the name of the view to be dropped.
* @since 2.1.0
*/
def dropGlobalTempView(viewName: String): Boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to explain the return type

* @param viewName the name of the view to be dropped.
* @since 2.0.0
*/
def dropTempView(viewName: String): Unit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this returning unit when the global one return boolean? we should make them conssitent, i.e. this one should return boolean too.

also make sure we document the return value.

* preserved database `_global_temp`, and we must use the qualified name to refer a global temp
* view, e.g. `SELECT * FROM _global_temp.view1`.
*
* @throws TempTableAlreadyExistsException if the view name already exists
Copy link
Contributor

@rxin rxin Oct 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to AnalysisException

TempTableAlreadyExistsException is an internal exception

viewName: String,
replace: Boolean,
global: Boolean): CreateViewCommand = {
val viewType = if (global) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shorten this to

val viewType = if (global) GlobalTempView else LocalTempView

@rxin
Copy link
Contributor

rxin commented Oct 10, 2016

@cloud-fan this looks good. I took a look at the API changes. Can you address the comments in a follow-up PR?

asfgit pushed a commit that referenced this pull request Oct 11, 2016
## What changes were proposed in this pull request?

address post hoc review comments for #14897

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes #15424 from cloud-fan/global-temp-view.
asfgit pushed a commit that referenced this pull request Oct 12, 2016
## What changes were proposed in this pull request?

SQLConf is session-scoped and mutable. However, we do have the requirement for a static SQL conf, which is global and immutable, e.g. the `schemaStringThreshold` in `HiveExternalCatalog`, the flag to enable/disable hive support, the global temp view database in #14897.

Actually we've already implemented static SQL conf implicitly via `SparkConf`, this PR just make it explicit and expose it to users, so that they can see the config value via SQL command or `SparkSession.conf`, and forbid users to set/unset static SQL conf.

## How was this patch tested?

new tests in SQLConfSuite

Author: Wenchen Fan <[email protected]>

Closes #15295 from cloud-fan/global-conf.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

Global temporary view is a cross-session temporary view, which means it's shared among all sessions. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database `global_temp`(configurable via SparkConf), and we must use the qualified name to refer a global temp view, e.g. SELECT * FROM global_temp.view1.

changes for `SessionCatalog`:

1. add a new field `gloabalTempViews: GlobalTempViewManager`, to access the shared global temp views, and the global temp db name.
2. `createDatabase` will fail if users wanna create `global_temp`, which is system preserved.
3. `setCurrentDatabase` will fail if users wanna set `global_temp`, which is system preserved.
4. add `createGlobalTempView`, which is used in `CreateViewCommand` to create global temp views.
5. add `dropGlobalTempView`, which is used in `CatalogImpl` to drop global temp view.
6. add `alterTempViewDefinition`, which is used in `AlterViewAsCommand` to update the view definition for local/global temp views.
7. `renameTable`/`dropTable`/`isTemporaryTable`/`lookupRelation`/`getTempViewOrPermanentTableMetadata`/`refreshTable` will handle global temp views.

changes for SQL commands:

1. `CreateViewCommand`/`AlterViewAsCommand` is updated to support global temp views
2. `ShowTablesCommand` outputs a new column `database`, which is used to distinguish global and local temp views.
3. other commands can also handle global temp views if they call `SessionCatalog` APIs which accepts global temp views, e.g. `DropTableCommand`, `AlterTableRenameCommand`, `ShowColumnsCommand`, etc.

changes for other public API

1. add a new method `dropGlobalTempView` in `Catalog`
2. `Catalog.findTable` can find global temp view
3. add a new method `createGlobalTempView` in `Dataset`

## How was this patch tested?

new tests in `SQLViewSuite`

Author: Wenchen Fan <[email protected]>

Closes apache#14897 from cloud-fan/global-temp-view.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

address post hoc review comments for apache#14897

## How was this patch tested?

N/A

Author: Wenchen Fan <[email protected]>

Closes apache#15424 from cloud-fan/global-temp-view.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

SQLConf is session-scoped and mutable. However, we do have the requirement for a static SQL conf, which is global and immutable, e.g. the `schemaStringThreshold` in `HiveExternalCatalog`, the flag to enable/disable hive support, the global temp view database in apache#14897.

Actually we've already implemented static SQL conf implicitly via `SparkConf`, this PR just make it explicit and expose it to users, so that they can see the config value via SQL command or `SparkSession.conf`, and forbid users to set/unset static SQL conf.

## How was this patch tested?

new tests in SQLConfSuite

Author: Wenchen Fan <[email protected]>

Closes apache#15295 from cloud-fan/global-conf.
@ChristopheSchmitz
Copy link

Hi there,

I am a bit puzzled by the golbal_temp database. Why do we need to provide a qualified name to access the view? We can't even use the "USE global_temp" command. This breaks lots of tools that expect to switch to a db with the USE command.

Anyone know if there is a work around that issue? Any plan to write a patch?

Thanks
Christophe

@cloud-fan
Copy link
Contributor Author

global_temp is a reserved, virtual database, and we don't include it in SHOW DATABASE, why do you wanna USE global_temp?

@ChristopheSchmitz
Copy link

ChristopheSchmitz commented Mar 7, 2017

Hi cloud-fan,
I am using global_temp because I have some view created with createGlobalTempView() that I would need to access using some tools for which I am configuring the database to use as "global_temp". As "USE global_temp" is failing, the tool fails.
I am using createGlobalTempView as I want to share the view across all sessions, as described in http://spark.apache.org/docs/2.1.0/sql-programming-guide.html#global-temporary-view
Any help would be greatly appreciated :)

@cloud-fan
Copy link
Contributor Author

global temp view is designed to be accessed by fully qualified name, for some name conflicting concerns. I'm afraid you have to let your tool know this :)

@ChristopheSchmitz
Copy link

@cloud-fan thanks for pointing that out. Those closed-source tools might need some time to be updated. Nothing I can do on that front :(
As this fully qualified name constraints is just for preventing conflicts, do you see any issue if I clone the project, and remove the check that prevents from running "USE global_temp". Maybe the modification is more involved than that. What do you think?

@cloud-fan
Copy link
Contributor Author

You can fork the project and remove the check, but then you need to check all the places that read current db, and think about what if the current db is global_temp, which may be a very tedious work...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.