[SPARK-25301][SQL] When a view uses an UDF from a non default database, Spark analyser throws AnalysisException #22307

vinodkc · 2018-09-01T05:32:42Z

What changes were proposed in this pull request?

When a Hive view uses a UDF from a non-default database, the Spark analyser throws an AnalysisException.

Steps to reproduce this issue

Step 1: Run the following statements in Hive

CREATE TABLE emp AS SELECT 'user' AS name, 'address' AS address;
CREATE DATABASE d100;
-- Note: udf100 is created in database d100
CREATE FUNCTION d100.udf100 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';
CREATE VIEW d100.v100 AS SELECT d100.udf100(name) FROM default.emp;
-- Querying view d100.v100 in Hive returns the correct result
SELECT * FROM d100.v100;

Step 2: Run the following statement in spark-shell

spark.sql("SELECT * FROM d100.v100").show

This throws:

org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'

This happens because, when parsing the view's SQL text
'select `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser does not split the database name from the UDF name, so the Spark function registry tries to load a UDF named 'd100.udf100' from the 'default' database.
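To make the failure mode concrete, here is a minimal, self-contained Scala sketch of how a function name without an explicit database ends up being looked up in the current database. `LookupDemo`, `FuncId` and `ToyCatalog` are hypothetical stand-ins for Spark's `FunctionIdentifier` and session catalog, not Spark's actual code; the only assumption carried over is that a name with no database falls back to the current one.

```scala
object LookupDemo {
  // Hypothetical stand-in for Spark's FunctionIdentifier: a name plus an optional database.
  final case class FuncId(name: String, db: Option[String])

  // Toy catalog of permanent functions keyed by (database, function name).
  // This is an illustration only, not Spark's SessionCatalog.
  final class ToyCatalog(functions: Set[(String, String)], currentDb: String) {
    // A name without an explicit database falls back to the current database.
    def resolve(id: FuncId): Either[String, (String, String)] = {
      val key = (id.db.getOrElse(currentDb), id.name)
      if (functions.contains(key)) Right(key)
      else Left(s"Undefined function: '${id.name}' in database '${key._1}'")
    }
  }

  // Only d100.udf100 exists; the current database is "default".
  val catalog = new ToyCatalog(Set(("d100", "udf100")), currentDb = "default")
}
```

In this toy model, `FuncId("d100.udf100", None)` (roughly what the backquoted view text yields) is looked up in `default` and not found, while `FuncId("udf100", Some("d100"))` resolves.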

To solve this issue, extract the actual database name before creating the 'FunctionIdentifier', then create the FunctionIdentifier from that database name and the function name.
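The proposed fix can be sketched in isolation as follows. This is a simplified illustration modeled on the `AstBuilder` change quoted in the review (split a single name part that contains a dot into database and function name); `ProposedFix`, `FuncId` and `toFuncId` are hypothetical names, and names containing more than one dot are deliberately left unsplit here.

```scala
object ProposedFix {
  // Hypothetical stand-in for Spark's FunctionIdentifier.
  final case class FuncId(name: String, db: Option[String])

  // `parts` are the identifier parts the parser produced, with backquotes already stripped.
  def toFuncId(parts: Seq[String]): FuncId = parts match {
    case Seq(db, fn) => FuncId(fn, Some(db))
    case Seq(fn) =>
      // The proposed change: a single part such as "d100.udf100" is split on '.'.
      fn.split('.').toSeq match {
        case Seq(db, name) => FuncId(name, Some(db))
        case _             => FuncId(fn, None)
      }
    case other =>
      throw new IllegalArgumentException(s"Unsupported function name: ${other.mkString(".")}")
  }
}
```

The same rule also rewrites a single backquoted name that legitimately contains a dot (for example a hypothetical function named `my.upper`, which would become `upper` in database `my`); that is the compatibility concern raised later in the review.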

How was this patch tested?

Added 1 unit test

maropu · 2018-09-01T06:41:57Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

+          val functionClass =
+            classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
+          sql(s"CREATE FUNCTION $db.$functionNameUpper AS '$functionClass'")
+          val ds = sql(s"SELECT `$db.$functionNameUpper`(`$table`.`c1`)  FROM `$db`.`$table`")


I think this is the expected behaviour of backquotes; ANTLR parses inputs like:

`testdb.f1` => funcName: testdb.f1 dbName: default

testdb.f1 => funcName: f1 dbName: testdb
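The backquote rule above can be illustrated with a small splitter. This is a hypothetical, simplified Scala sketch (`NameParts.split` is an invented name and only approximates Spark's ANTLR grammar): a dot inside backquotes stays part of the identifier, a dot outside backquotes separates parts, and a doubled backquote inside a quoted part stands for a literal backquote.

```scala
object NameParts {
  // Toy approximation of qualified-name parsing; not Spark's actual grammar.
  def split(name: String): Seq[String] = {
    val parts = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < name.length) {
      val c = name.charAt(i)
      if (c == '`') {
        if (inQuotes && i + 1 < name.length && name.charAt(i + 1) == '`') {
          current.append('`') // "``" inside a quoted part is an escaped backquote
          i += 1
        } else {
          inQuotes = !inQuotes // opening or closing backquote
        }
      } else if (c == '.' && !inQuotes) {
        parts += current.toString // an unquoted dot ends the current part
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    require(!inQuotes, s"Unclosed backquote in: $name")
    parts += current.toString
    parts.toList
  }
}
```

Under this model, `` `testdb.f1` `` yields one part, `testdb.f1`, while both `testdb.f1` and `` `testdb`.`f1` `` yield the two parts `testdb` and `f1`.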

SparkQA · 2018-09-01T07:39:07Z

Test build #95573 has finished for PR 22307 at commit 60cc1c9.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-09-03T01:29:46Z

@vinodkc, do you have the JAR for /usr/udf/masking.jar? Want to reproduce and check.

HyukjinKwon · 2018-09-03T01:31:24Z

The problem here looks like an inconsistency between Hive and Spark. Since Spark claims Hive compatibility, it looks like we should either explain the difference or fix it.

vinodkc · 2018-09-03T04:59:55Z

@HyukjinKwon, the issue can also be reproduced without a custom JAR, using
create function d100.udf100 as 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';
I've updated the PR description.

HyukjinKwon · 2018-09-04T07:27:28Z

The problem seems to be the following:

org.apache.hadoop.hive.ql.metadata.Table.getViewExpandedText()

is used to build the view, and its output is parsed again by the Spark SQL parser. The Hive API above returns:

SELECT `d100.udf100`(`emp`.`name`) FROM `default`.`emp`

The root cause is that, in the text above, `d100.udf100` is recognised by Spark as a single identifier. Spark therefore looks for a function called d100.udf100 in the default database, whereas Hive looks for the function udf100 in d100.
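One way to see why this text is ambiguous for Spark: if each name part were quoted separately, the database and function name would presumably survive a round trip through a backquote-aware parser. The sketch below is an illustration only; `Quoting`, `quotePart`, `quoteQualified` and `quoteAsOnePart` are hypothetical helpers, not Hive or Spark APIs, and an embedded backquote is escaped by doubling it.

```scala
object Quoting {
  // Quotes one identifier part, doubling any embedded backquote.
  def quotePart(part: String): String = "`" + part.replace("`", "``") + "`"

  // Quotes each part separately, e.g. Seq("d100", "udf100") -> `d100`.`udf100`
  def quoteQualified(parts: Seq[String]): String = parts.map(quotePart).mkString(".")

  // Mimics the shape of the text in this report: the whole qualified name quoted as one part.
  def quoteAsOnePart(parts: Seq[String]): String = quotePart(parts.mkString("."))
}
```

`quoteAsOnePart` produces `` `d100.udf100` ``, which a backquote-aware parser reads as one identifier, while `quoteQualified` produces `` `d100`.`udf100` ``, which keeps the two parts distinct.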

If Hive's behaviour is correct, we need a strong justification to fix Spark's behaviour, or document the differences. If not, Hive should fix this.

HyukjinKwon · 2018-09-04T07:45:02Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

    ctx.identifier().asScala.map(_.getText) match {
      case Seq(db, fn) => FunctionIdentifier(fn, Option(db))
-      case Seq(fn) => FunctionIdentifier(fn, None)
+      case Seq(fn) => fn.split('.').toSeq match {


At the very least, this breaks users' apps. Even if this is the correct behaviour in Hive, we should target this for 3.0.0.

HyukjinKwon · 2018-09-06T02:50:48Z

@vinod, see the discussion in #18142. Shall we close this? cc @cloud-fan as well.

vinodkc · 2018-09-06T06:03:55Z

@HyukjinKwon , I'll close this PR

fix issue with non default udf in hive view

60cc1c9

maropu reviewed Sep 1, 2018


HyukjinKwon reviewed Sep 4, 2018


vinodkc closed this Sep 6, 2018

vinodkc deleted the br_fix_view_with_udf_issue branch May 25, 2021 07:58


4 participants