Member

@gatorsmile gatorsmile commented Nov 4, 2016

What changes were proposed in this pull request?

Based on the discussion in SPARK-18209, it doesn't really make sense to create permanent views based on temporary views or temporary UDFs.

To disallow this and raise an exception, this PR needs to detect whether a temporary view/UDF is being used when defining a permanent view. The work can be split into two sub-tasks:

Task 1: detecting a temporary view from the query plan of view definition.
When finding an unresolved temporary view, Analyzer replaces it by a SubqueryAlias with the corresponding logical plan, which is stored in an in-memory HashMap. After replacement, it is impossible to detect whether the SubqueryAlias is added/generated from a temporary view. Thus, to detect the usage of a temporary view in view definition, this PR traverses the unresolved logical plan and uses the name of an UnresolvedRelation to detect whether it is a (global) temporary view.
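
The traversal in Task 1 can be sketched with a toy plan tree. This is a minimal, self-contained illustration and not Spark's actual classes: `Plan`, `Relation`, `Project`, `Join`, and `referencedTempViews` are hypothetical names, and a plain set of names stands in for the catalog lookup.

```scala
// Toy stand-in for an unresolved logical plan; these are not Spark's real classes.
sealed trait Plan {
  def children: Seq[Plan]
  // Pre-order traversal applying a partial function to every node that matches.
  def collect[B](pf: PartialFunction[Plan, B]): List[B] =
    pf.lift(this).toList ++ children.flatMap(_.collect(pf))
}
final case class Relation(name: String) extends Plan {
  def children: Seq[Plan] = Nil
}
final case class Project(columns: Seq[String], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
}
final case class Join(left: Plan, right: Plan) extends Plan {
  def children: Seq[Plan] = Seq(left, right)
}

object TempViewCheck {
  // Names of relations in the plan that the (hypothetical) catalog knows as temporary views.
  def referencedTempViews(plan: Plan, tempViews: Set[String]): List[String] =
    plan.collect { case Relation(name) if tempViews.contains(name) => name }
}
```

Because the check runs before name resolution, the relation names are still visible; a non-empty result would be the point at which to raise the exception.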

Task 2: detecting a temporary UDF from the query plan of view definition.
Detecting usage of a temporary UDF in the view definition is not straightforward.

First, the analyzed plan represents functions in several different forms. More importantly, some classes (e.g., HiveGenericUDF) are not accessible from CreateViewCommand, which is part of sql/core. Thus, we use the unanalyzed plan child of CreateViewCommand to detect the usage of a temporary UDF. Because the plan has already been successfully analyzed, we can assume the functions have been defined/registered.

Second, Spark functions come in four forms: Spark built-in functions, built-in Hive functions, permanent UDFs, and temporary UDFs. There is no direct way to determine whether a function is temporary. Thus, we introduced a function isTemporaryFunction in SessionCatalog, which contains the detailed logic for making that determination.
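
The decision logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `FunctionName` and `ToyCatalog` are hypothetical, the three sets stand in for the real registries, and all names in them are assumed to be stored in lower case.

```scala
// Hypothetical stand-ins; only the method name isTemporaryFunction comes from the PR.
final case class FunctionName(funcName: String, database: Option[String] = None)

final class ToyCatalog(
    builtin: Set[String],      // Spark built-in functions
    hiveFallback: Set[String], // Hive functions, treated as non-temporary
    registered: Set[String]) { // everything currently in the session registry

  def isTemporaryFunction(name: FunctionName): Boolean = {
    val key = name.funcName.toLowerCase
    name.database.isEmpty &&       // must have no database name
      registered.contains(key) &&  // must exist; unknown functions yield false
      !builtin.contains(key) &&
      !hiveFallback.contains(key)
  }
}
```

With `registered = Set("upper", "percentile", "my_udf")`, `builtin = Set("upper")`, and `hiveFallback = Set("percentile")`, only `my_udf` without a database qualifier is reported as temporary.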

How was this patch tested?

Added test cases.

* Returns whether it is a temporary function.
*/
def isTempFunction(name: FunctionIdentifier): Boolean = {
// copied from HiveSessionCatalog
Contributor

I'd add a note in HiveSessionCatalog saying not to forget to update this place. Otherwise the two will become inconsistent.

Member Author

Will do. Thanks!

child: LogicalPlan,
view: Option[TableIdentifier])
view: Option[TableIdentifier])(
val isGeneratedByTempTable: java.lang.Boolean = false)
Contributor

is there a way to do it without introducing this?

Member Author

@gatorsmile gatorsmile Nov 4, 2016

Just updated the PR description. This might be the cleanest way. The reason is explained below.

When finding an unresolved temporary view, Analyzer replaces it by a SubqueryAlias with the corresponding logical plan, which is stored in an in-memory HashMap. After replacement, it is impossible to detect whether the SubqueryAlias is added/generated from a temporary view. Thus, to detect the usage of a temporary view in view definition, we added an extra flag isGeneratedByTempTable into SubqueryAlias. The flag is added into the curried arguments. Via this extra flag, we can easily detect the usage of temporary view from a logical plan traversal.

Also cc @cloud-fan and @liancheng. Do you have a better solution?
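
For readers unfamiliar with the Scala detail behind "curried arguments": parameters in a second parameter list of a case class are not part of the generated equals, hashCode, or pattern-match extractor, which is why such a flag would not disturb existing plan comparisons. A minimal sketch, using a hypothetical `Alias` class rather than the real SubqueryAlias:

```scala
// Hypothetical class; only the first parameter list participates in equality and unapply.
final case class Alias(name: String)(val fromTempView: Boolean = false)

object CurriedFlagDemo {
  val plain: Alias = Alias("v")()
  val flagged: Alias = Alias("v")(fromTempView = true)

  // Pattern matching sees only the first parameter list; the flag is read as a field.
  def describe(a: Alias): String = a match {
    case x @ Alias(n) if x.fromTempView => s"$n (temporary)"
    case Alias(n) => n
  }
}
```

Here `CurriedFlagDemo.plain == CurriedFlagDemo.flagged` holds even though the flags differ.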

Member Author

Maybe we can do it from the unanalyzed plan, like what this PR did for temp functions.

case s: UnresolvedRelation
if sparkSession.sessionState.catalog.isTemporaryTable(s.tableIdentifier) =>
throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
s"referencing a temp view ${s.tableIdentifier}. " +
Contributor

temp -> temporary

case e: UnresolvedFunction
if sparkSession.sessionState.catalog.isTemporaryFunction(e.name) =>
throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
s"referencing a temp function `${e.name}`. " +
Contributor

temp -> temporary

if sparkSession.sessionState.catalog.isTemporaryTable(s.tableIdentifier) =>
throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
s"referencing a temp view ${s.tableIdentifier}. " +
originalText.map(sql => s"""SQL: "$sql".""").getOrElse(""))
Contributor

do we really need to show the entire sql? it's basically what the user just typed in

if sparkSession.sessionState.catalog.isTemporaryFunction(e.name) =>
throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
s"referencing a temp function `${e.name}`. " +
originalText.map(sql => s"""SQL: "$sql".""").getOrElse(""))
Contributor

same here do we really need to show the entire sql? it's basically what the user just typed in

}

// When creating a permanent view, not allowed to reference temporary objects.
if (!isTemporary) {
Contributor

also i'd move the check into a function, so it is more obvious what's going on with the main workflow.

/**
* Returns whether it is a temporary function.
*/
def isTemporaryFunction(name: FunctionIdentifier): Boolean = {
Contributor

add a unit test for this function?

Contributor

@rxin rxin Nov 4, 2016

also what's the behavior if the function doesn't exist? make sure you test it in the unit test.

Member Author

Will resolve this tomorrow.

Member Author

Like isTemporaryTable, we return false when the function/table does not exist.

Contributor

Yea, please document it.

// When creating a permanent view, not allowed to reference temporary objects.
if (!isTemporary) {
child.collect {
// Disallow creating permanent views based on temporary views.
Contributor

i'd also copy paste what you put in the pr description in here, on why you are traversing the unresolved plan.

SparkQA commented Nov 4, 2016

Test build #68107 has finished for PR 15764 at commit 695110f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 4, 2016

Test build #68112 has finished for PR 15764 at commit 7100a8f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
}

test("create a permanent/temp view using a hive function") {
Contributor

while it's good to have separate test cases, i find these three separate cases a bit too verbose. why not just put hive, built-in, and permanent into a single query?

Member Author

Sure, will do it.

SparkQA commented Nov 4, 2016

Test build #68116 has finished for PR 15764 at commit 86e7f9d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 4, 2016

Test build #68143 has finished for PR 15764 at commit a4df82b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

rxin commented Nov 4, 2016

Looks good but I didn't read super carefully.

cc @hvanhovell and @cloud-fan

SparkQA commented Nov 4, 2016

Test build #68161 has finished for PR 15764 at commit 1c3899f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Returns false when the function is permanent
assert(externalCatalog.listFunctions("db2", "*").toSet == Set("func1"))
assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("func1", Some("db2"))))
assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("db2.func1")))
Contributor

it's not a permanent function right? it's a function called db2.func1 which doesn't exist

Member Author

The related code in lookupFunction looks confusing.

Here, you are right, functionRegistry does not have such a function. Let me remove it. Thanks!

// without a database name, and is neither a built-in function nor a Hive function
name.database.isEmpty &&
functionRegistry.functionExists(name.funcName) &&
!FunctionRegistry.builtin.functionExists(name.funcName) &&
Contributor

toLowerCase?

Member Author

@gatorsmile gatorsmile Nov 6, 2016

Our built-in function registry uses SimpleFunctionRegistry, which is backed by a hash map with case-insensitive string keys.

Thus, there is no need to add toLowerCase. However, we should add a test case so that any future change in this behavior is caught.
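
To illustrate the point about case-insensitive keys, here is a small sketch of a registry that normalizes names on every access. `CaseInsensitiveRegistry` is a hypothetical class written for this example and is not SimpleFunctionRegistry's actual implementation.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical registry: keys are normalized internally, so callers need no toLowerCase.
final class CaseInsensitiveRegistry {
  private val entries = mutable.Map.empty[String, String]
  private def normalize(name: String): String = name.toLowerCase(Locale.ROOT)

  def register(name: String, description: String): Unit =
    entries.update(normalize(name), description)

  def functionExists(name: String): Boolean =
    entries.contains(normalize(name))
}
```

A regression test in the spirit of the comment above would register a name in one casing and look it up in another.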

* Permanent views are not allowed to reference temp objects, including temp function and views
*/
private def verifyTemporaryObjectsNotExists(sparkSession: SparkSession): Unit = {
if (!isTemporary) {
Contributor

is it possible the child is resolved? e.g. by DataFrame API

Contributor

nvm, permanent views can only be created through the SQL API

name.database.isEmpty &&
functionRegistry.functionExists(name.funcName) &&
!FunctionRegistry.builtin.functionExists(name.funcName) &&
!hiveFunctions.contains(name.funcName.toLowerCase)
Contributor

I thought HiveSessionCatalog is used in the context of Hive and SessionCatalog has nothing to do with Hive. So if we remove this from here and override isTemporaryFunction in HiveSessionCatalog, this would look cleaner. Feels like I am missing something obvious here.

Contributor

That is true, but we are working towards getting rid of HiveSessionCatalog (including the 3 fallback functions), so in practice this will make no difference soon.

SparkQA commented Nov 6, 2016

Test build #68247 has finished for PR 15764 at commit fec0066.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

rxin commented Nov 8, 2016

Thanks - merging in master/branch-2.1.

asfgit pushed a commit that referenced this pull request Nov 8, 2016
…ry views or UDFs

Author: gatorsmile <[email protected]>

Closes #15764 from gatorsmile/blockTempFromPermViewCreation.

(cherry picked from commit 1da64e1)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 1da64e1 Nov 8, 2016
ghost pushed a commit to dbtsai/spark that referenced this pull request Nov 29, 2016
…detection after implementing it natively

## What changes were proposed in this pull request?

In apache#15764 we added a mechanism to detect if a function is temporary or not. Hive functions are treated as non-temporary. Of the three hive functions, now "percentile" has been implemented natively, and "hash" has been removed. So we should update the list.

## How was this patch tested?

Unit tests.

Author: Shuai Lin <[email protected]>

Closes apache#16049 from lins05/update-temp-function-detect-hive-list.
asfgit pushed a commit that referenced this pull request Nov 29, 2016
…detection after implementing it natively

Author: Shuai Lin <[email protected]>

Closes #16049 from lins05/update-temp-function-detect-hive-list.

(cherry picked from commit e64a204)
Signed-off-by: gatorsmile <[email protected]>
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
cloud-fan pushed a commit that referenced this pull request Jul 6, 2021
…ry views or UDFs

### What changes were proposed in this pull request?
PR #15764 disabled creating permanent views based on temporary views or UDFs.  But AlterViewCommand didn't block temporary objects.

### Why are the changes needed?
More robust view canonicalization.

### Does this PR introduce _any_ user-facing change?
Yes, now if you alter a permanent view based on temporary views or UDFs, the operation will fail.

### How was this patch tested?
Add new unit tests.

Closes #33204 from jerqi/alter_view.

Authored-by: RoryQi <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Jul 6, 2021
…ry views or UDFs

Closes #33204 from jerqi/alter_view.

Authored-by: RoryQi <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e0c6b2e)
Signed-off-by: Wenchen Fan <[email protected]>