[SPARK-25301][SQL] When a view uses an UDF from a non default database, Spark analyser throws AnalysisException #22307
Conversation
```scala
val functionClass =
  classOf[org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper].getCanonicalName
sql(s"CREATE FUNCTION $db.$functionNameUpper AS '$functionClass'")
val ds = sql(s"SELECT `$db.$functionNameUpper`(`$table`.`c1`) FROM `$db`.`$table`")
```
I think this is the expected behaviour of backquotes; ANTLR parses inputs like:
- `testdb`.`f1` => funcName: f1, dbName: testdb
- `testdb.f1` => funcName: testdb.f1, dbName: default
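To see the difference, here is a minimal sketch (assuming Spark 2.x with a local SparkSession; `parseFunctionIdentifier` is part of the catalyst ParserInterface):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
val parser = spark.sessionState.sqlParser

// Quoting each part separately keeps the database/function split:
// function f1 in database testdb.
println(parser.parseFunctionIdentifier("`testdb`.`f1`"))

// Quoting the whole string yields a single function name ("testdb.f1")
// with no database, so it is resolved against the default database.
println(parser.parseFunctionIdentifier("`testdb.f1`"))
```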
Test build #95573 has finished for PR 22307 at commit ...
@vinodkc, do you have the JAR for ...?
The problem here looks like an inconsistency between Hive and Spark. Since Spark claims Hive compatibility, it looks like we should either explain the difference or fix it.
@HyukjinKwon, even with this ...
The problem here sounds like: ... is used to build the view, which is then run again by the Spark SQL parser. The Hive API above returns: ... The root cause is that the code above ... If Hive's behaviour is correct, we need a strong justification to fix Spark's behaviour, or we should document the differences. If not, Hive should fix this.
```diff
 ctx.identifier().asScala.map(_.getText) match {
   case Seq(db, fn) => FunctionIdentifier(fn, Option(db))
-  case Seq(fn) => FunctionIdentifier(fn, None)
+  case Seq(fn) => fn.split('.').toSeq match {
```
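In other words, the proposed change makes the parser split a single-part function identifier that still contains a dot into database and function parts. A self-contained sketch of that logic (hypothetical helper name, not the exact PR code):

```scala
import org.apache.spark.sql.catalyst.FunctionIdentifier

// Hypothetical helper illustrating the proposed handling of identifier parts.
def toFunctionIdentifier(parts: Seq[String]): FunctionIdentifier = parts match {
  case Seq(db, fn) => FunctionIdentifier(fn, Option(db))
  case Seq(fn) =>
    fn.split('.').toSeq match {
      case Seq(db, name) => FunctionIdentifier(name, Option(db)) // e.g. "d100.udf100"
      case _             => FunctionIdentifier(fn, None)         // plain "udf100"
    }
  case other =>
    throw new IllegalArgumentException(s"Unexpected function identifier parts: $other")
}
```

With this, the backquoted `d100.udf100` in a view's stored SQL would resolve to the function udf100 in database d100 instead of being looked up verbatim in the default database.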
This, at the very least, breaks users' apps ... even if this is the correct behaviour within Hive, we should target this for 3.0.0.
@vinod, see the discussion in #18142. Shall we close this? cc @cloud-fan as well.
@HyukjinKwon, I'll close this PR.
What changes were proposed in this pull request?
When a Hive view uses a UDF from a non-default database, the Spark analyzer throws an AnalysisException.
Steps to simulate this issue:
Step 1: Run the following statements in Hive.
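The original Hive statements are not reproduced in this description. A plausible reconstruction, based on the identifiers used below (d100, udf100, v100, default.emp, name) and the GenericUDFUpper class from the test, and assuming a table default.emp with a string column name already exists:

```sql
-- Hypothetical reconstruction of Step 1 (names taken from the description below).
CREATE DATABASE d100;
CREATE FUNCTION d100.udf100 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';
CREATE VIEW d100.v100 AS SELECT d100.udf100(name) FROM default.emp;
```

Hive stores the expanded view text with every identifier backquoted, and that stored text is what Spark later re-parses.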
Step 2: Run the following statement in spark-shell.
spark.sql("SELECT * FROM d100.v100").show throws an AnalysisException.
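A minimal spark-shell reproduction, assuming the Hive objects from Step 1 exist; the error text in the comment is the typical "Undefined function" message and may differ slightly between Spark versions:

```scala
// Run in spark-shell (Spark built with Hive support).
spark.sql("SELECT * FROM d100.v100").show()
// org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'.
// This function is neither a registered temporary function nor a permanent
// function registered in the database 'default'.
```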
This is because, while parsing the stored SQL text of the view,
'select `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to split the database name from the UDF name, and so the Spark function registry tries to load the UDF 'd100.udf100' from the 'default' database. To solve this, before creating the FunctionIdentifier, resolve the actual database name and then create the FunctionIdentifier from that database name and the function name.
How was this patch tested?
Added 1 unit test