[SPARK-18217] [SQL] Disallow creating permanent views based on temporary views or UDFs #15764
@@ -923,6 +923,24 @@ class SessionCatalog(

```scala
    }
  }

  /**
   * Returns whether it is a temporary function. If not existed, returns false.
   */
  def isTemporaryFunction(name: FunctionIdentifier): Boolean = {
    // copied from HiveSessionCatalog
```
Contributor: I'd update HiveSessionCatalog to say "don't forget to update this place". Otherwise it will be inconsistent.

Member (author): Will do. Thanks!
```scala
    val hiveFunctions = Seq(
      "hash",
      "histogram_numeric",
      "percentile")

    // A temporary function is a function that has been registered in functionRegistry
    // without a database name, and is neither a built-in function nor a Hive function
    name.database.isEmpty &&
      functionRegistry.functionExists(name.funcName) &&
      !FunctionRegistry.builtin.functionExists(name.funcName) &&
```
Member (author): Our built-in function registry is using … Thus, no need to add …
```scala
      !hiveFunctions.contains(name.funcName.toLowerCase)
```
Contributor: I thought …

Contributor: It is true, but we are working towards getting rid of HiveSessionCatalog (including getting rid of the 3 fallback functions), so in practice this will make no difference soon.
```scala
  }

  protected def failFunctionLookup(name: String): Nothing = {
    throw new NoSuchFunctionException(db = currentDb, func = name)
  }
```
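The predicate above combines four conditions: no database qualifier, present in the session registry, not a built-in, and not one of the three Hive fallback names. The following is a self-contained toy model of that logic, not Spark's API: `FuncId`, `ToyCatalog`, and `createTempFunction` here are hypothetical stand-ins for `FunctionIdentifier`, `SessionCatalog`, and its function registry. It assumes, as the exclusion list in the diff suggests, that the Hive fallback functions can end up in the registry without being user-created temporary functions, and it lower-cases names to mimic a case-insensitive registry.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's FunctionIdentifier (not the real class).
final case class FuncId(funcName: String, database: Option[String] = None)

// Hypothetical toy catalog; names are lower-cased to mimic a case-insensitive registry.
final class ToyCatalog(builtin: Set[String]) {
  // Session registry: starts out containing the built-ins.
  private val registered = mutable.Set.empty[String]
  registered ++= builtin.map(_.toLowerCase)

  // Names copied from the hiveFunctions list in the diff above.
  private val hiveFunctions = Seq("hash", "histogram_numeric", "percentile")

  // Registers a function in the session registry without a database name.
  def createTempFunction(name: String): Unit = registered += name.toLowerCase

  private def isBuiltin(name: String): Boolean = builtin.exists(_.equalsIgnoreCase(name))

  // Mirrors the four conditions of isTemporaryFunction; unknown names yield false.
  def isTemporaryFunction(name: FuncId): Boolean =
    name.database.isEmpty &&
      registered.contains(name.funcName.toLowerCase) &&
      !isBuiltin(name.funcName) &&
      !hiveFunctions.contains(name.funcName.toLowerCase)
}
```

As in the PR, a name that does not exist simply returns false rather than throwing, and any identifier that carries a database is never reported as temporary.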
@@ -919,6 +919,34 @@ class SessionCatalogSuite extends SparkFunSuite {

```scala
      catalog.lookupFunction(FunctionIdentifier("temp1"), arguments) === Literal(arguments.length))
  }

  test("isTemporaryFunction") {
    val externalCatalog = newBasicCatalog()
    val sessionCatalog = new SessionCatalog(externalCatalog)

    // Returns false when the function does not exist
    assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("temp1")))

    val tempFunc1 = (e: Seq[Expression]) => e.head
    val info1 = new ExpressionInfo("tempFunc1", "temp1")
    sessionCatalog.createTempFunction("temp1", info1, tempFunc1, ignoreIfExists = false)

    // Returns true when the function is temporary
    assert(sessionCatalog.isTemporaryFunction(FunctionIdentifier("temp1")))

    // Returns false when the function is permanent
    assert(externalCatalog.listFunctions("db2", "*").toSet == Set("func1"))
    assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("func1", Some("db2"))))
    assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("db2.func1")))
```
Contributor: It's not a permanent function, right? It's a function called …

Member (author): The related code in … Here, you are right, …
```scala
    sessionCatalog.setCurrentDatabase("db2")
    assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("func1")))

    // Returns false when the function is built-in or hive
    assert(FunctionRegistry.builtin.functionExists("sum"))
    assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("sum")))
    assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("histogram_numeric")))
    assert(!sessionCatalog.isTemporaryFunction(FunctionIdentifier("percentile")))
  }

  test("drop function") {
    val externalCatalog = newBasicCatalog()
    val sessionCatalog = new SessionCatalog(externalCatalog)
```
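The review exchange about `FunctionIdentifier("db2.func1")` in the test above appears to turn on the fact that the one-argument constructor does not parse dots: it produces an identifier with no database whose name is literally `db2.func1`. That assertion therefore returns false because no such function exists, not because a permanent function is detected. A minimal sketch of the distinction, where `FuncId` and `parseQualified` are hypothetical helpers rather than Spark's classes or parser:

```scala
// Hypothetical stand-in for Spark's FunctionIdentifier; the one-argument
// form stores the whole string as the function name, with no database.
final case class FuncId(funcName: String, database: Option[String] = None)

// Hypothetical helper that splits "db.func" the way a SQL parser would
// treat a qualified name; it is not a Spark API.
def parseQualified(text: String): FuncId = text.split('.') match {
  case Array(fn)     => FuncId(fn)
  case Array(db, fn) => FuncId(fn, Some(db))
  case _             => throw new IllegalArgumentException(s"Unsupported name: $text")
}
```

Only the parsed form carries `Some("db2")`; the directly constructed `FuncId("db2.func1")` names an unrelated, unregistered function.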
@@ -19,14 +19,14 @@ package org.apache.spark.sql.execution.command

```diff
 import scala.util.control.NonFatal

-import org.apache.spark.sql.{AnalysisException, Dataset, Row, SparkSession}
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
 import org.apache.spark.sql.catalyst.{SQLBuilder, TableIdentifier}
 import org.apache.spark.sql.catalyst.analysis.{UnresolvedFunction, UnresolvedRelation}
 import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
 import org.apache.spark.sql.catalyst.expressions.Alias
 import org.apache.spark.sql.catalyst.plans.QueryPlan
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
 import org.apache.spark.sql.execution.datasources.{DataSource, LogicalRelation}
-import org.apache.spark.sql.types.{MetadataBuilder, StructType}
+import org.apache.spark.sql.types.MetadataBuilder


 /**
```
@@ -131,6 +131,10 @@ case class CreateViewCommand(

```scala
        s"specified by CREATE VIEW (num: `${userSpecifiedColumns.length}`).")
    }

    // When creating a permanent view, not allowed to reference temporary objects.
    // This should be called after `qe.assertAnalyzed()` (i.e., `child` can be resolved)
    verifyTemporaryObjectsNotExists(sparkSession)

    val aliasedPlan = if (userSpecifiedColumns.isEmpty) {
      analyzedPlan
    } else {
```
@@ -172,6 +176,34 @@ case class CreateViewCommand(

```scala
    Seq.empty[Row]
  }

  /**
   * Permanent views are not allowed to reference temp objects, including temp function and views
   */
  private def verifyTemporaryObjectsNotExists(sparkSession: SparkSession): Unit = {
    if (!isTemporary) {
      // This func traverses the unresolved plan `child`. Below are the reasons:
```
Contributor: Is it possible the …

Contributor: Never mind, we can only create a permanent view through the SQL API.
```scala
      // 1) Analyzer replaces unresolved temporary views by a SubqueryAlias with the corresponding
      // logical plan. After replacement, it is impossible to detect whether the SubqueryAlias is
      // added/generated from a temporary view.
      // 2) The temp functions are represented by multiple classes. Most are inaccessible from this
      // package (e.g., HiveGenericUDF).
      child.collect {
        // Disallow creating permanent views based on temporary views.
        case s: UnresolvedRelation
            if sparkSession.sessionState.catalog.isTemporaryTable(s.tableIdentifier) =>
          throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
            s"referencing a temporary view ${s.tableIdentifier}")
        case other if !other.resolved => other.expressions.flatMap(_.collect {
          // Disallow creating permanent views based on temporary UDFs.
          case e: UnresolvedFunction
              if sparkSession.sessionState.catalog.isTemporaryFunction(e.name) =>
            throw new AnalysisException(s"Not allowed to create a permanent view $name by " +
              s"referencing a temporary function `${e.name}`")
        })
      }
    }
  }

  /**
   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
   * SQL based on the analyzed plan, and also creates the proper schema for the view.
```
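The comment in `verifyTemporaryObjectsNotExists` explains why the check walks the unresolved `child` plan: once the analyzer substitutes a temporary view's plan, the view name can no longer be recovered. The sketch below models that walk on toy trees. `Plan`, `Expr`, `UnresolvedRel`, `UnresolvedFunc`, `Column`, `Project`, and `verifyNoTempObjects` are hypothetical simplifications, not Catalyst classes. It omits the `resolved` guard, matches names exactly, and throws `IllegalArgumentException` where Spark throws `AnalysisException`.

```scala
// Toy expression tree (hypothetical, not Catalyst's Expression).
sealed trait Expr { def children: Seq[Expr] }
final case class Column(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}
final case class UnresolvedFunc(name: String, args: Seq[Expr]) extends Expr {
  def children: Seq[Expr] = args
}

// Toy logical plan (hypothetical, not Catalyst's LogicalPlan).
sealed trait Plan {
  def children: Seq[Plan]
  def expressions: Seq[Expr]
}
final case class UnresolvedRel(table: String) extends Plan {
  def children: Seq[Plan] = Nil
  def expressions: Seq[Expr] = Nil
}
final case class Project(projectList: Seq[Expr], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def expressions: Seq[Expr] = projectList
}

// Pre-order traversals, similar in spirit to TreeNode.collect.
def allNodes(p: Plan): Seq[Plan] = p +: p.children.flatMap(allNodes)
def allExprs(e: Expr): Seq[Expr] = e +: e.children.flatMap(allExprs)

// Rejects a permanent view whose unresolved plan names a temp view or temp function.
def verifyNoTempObjects(
    viewName: String,
    isTemporary: Boolean,
    child: Plan,
    tempViews: Set[String],
    tempFunctions: Set[String]): Unit = {
  if (!isTemporary) {
    allNodes(child).foreach {
      case UnresolvedRel(t) if tempViews.contains(t) =>
        throw new IllegalArgumentException(
          s"Not allowed to create a permanent view $viewName by referencing a temporary view $t")
      case node =>
        node.expressions.flatMap(allExprs).foreach {
          case UnresolvedFunc(f, _) if tempFunctions.contains(f) =>
            throw new IllegalArgumentException(
              s"Not allowed to create a permanent view $viewName by referencing a temporary function `$f`")
          case _ => ()
        }
    }
  }
}
```

The sketch uses `foreach` to make explicit that the traversal runs only for its side effect of throwing; the diff achieves the same with `collect`. Temporary views skip the check entirely, matching the `if (!isTemporary)` guard.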
Add a unit test for this function?

Also, what's the behavior if the function doesn't exist? Make sure you test it in the unit test.

Will resolve this tomorrow.

Like `isTemporaryTable`, we return false when the function/table does not exist.

Yea, please document it.