
@HyukjinKwon (Member) commented Sep 6, 2018

What changes were proposed in this pull request?

This took me a while to debug and track down. It looks like we should at least leave a debug log showing which SQL text will be used for a view.

Here's how I got there:

Hive:

CREATE TABLE emp AS SELECT 'user' AS name, 'address' as address;
CREATE DATABASE d100;
CREATE FUNCTION d100.udf100 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper';
CREATE VIEW testview AS SELECT d100.udf100(name) FROM default.emp;

Spark:

scala> sql("SELECT * FROM testview").show()
org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7

Under the hood, this actually makes sense: the view is defined as SELECT d100.udf100(name) FROM default.emp, and its text is read back through the Hive API:

org.apache.hadoop.hive.ql.metadata.Table.getViewExpandedText()

This returns a wrongly qualified SQL string for the view as below:

SELECT `d100.udf100`(`emp`.`name`) FROM `default`.`emp`

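which works fine in Hive but not in Spark. Because the whole qualified name is wrapped in a single pair of backquotes, Spark treats `d100.udf100` as one identifier and looks for a function literally named d100.udf100 in the database default, hence the error above.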
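A minimal Scala sketch of why the quoting matters. It splits a multi-part SQL name on dots that sit outside backquotes. splitMultipartName is a hypothetical helper written for illustration, not Spark's actual parser; it only assumes the usual rule that a dot inside backquotes belongs to the name (and that a doubled backquote stands for a literal one). Under that rule, the original view text names database d100 and function udf100, while Hive's expanded text names a single-part function.

```scala
object MultipartNameDemo {
  // Hypothetical helper (NOT Spark's parser): splits a SQL multi-part name
  // on dots that appear outside backquotes. Inside a quoted part, a doubled
  // backquote is treated as one literal backquote.
  def splitMultipartName(name: String): Seq[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < name.length) {
      val c = name.charAt(i)
      if (c == '`') {
        if (inQuotes && i + 1 < name.length && name.charAt(i + 1) == '`') {
          current.append('`') // escaped backquote inside a quoted part
          i += 1
        } else {
          inQuotes = !inQuotes // opens or closes a quoted part
        }
      } else if (c == '.' && !inQuotes) {
        parts += current.toString // an unquoted dot separates two parts
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    parts += current.toString
    parts.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Original view text: two parts, database d100 and function udf100.
    println(splitMultipartName("d100.udf100").mkString("[", ", ", "]"))     // [d100, udf100]
    // Hive's expanded view text: the whole name quoted as ONE part.
    println(splitMultipartName("`d100.udf100`").mkString("[", ", ", "]"))   // [d100.udf100]
    // Table name in the same text: each part quoted separately.
    println(splitMultipartName("`default`.`emp`").mkString("[", ", ", "]")) // [default, emp]
  }
}
```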

How was this patch tested?

Manually:

18/09/06 19:32:48 DEBUG HiveSessionCatalog: 'SELECT `d100.udf100`(`emp`.`name`) FROM `default`.`emp`' will be used for the view(testview).

@HyukjinKwon (Member, Author)

cc @cloud-fan

@SparkQA commented Sep 6, 2018

Test build #95754 has finished for PR 22351 at commit 207d8df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment

+1, LGTM.

BTW, if you don't mind, could you update the PR description?

- This returns a fully qualified SQL string for the view as below:
+ This returns a wrongly qualified SQL string for the view as below:

The reason is the inconsistency in how the names are quoted:

default.emp -> `default`.`emp`
d100.udf100 -> `d100.udf100`
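To make the inconsistency concrete, here is a minimal Scala sketch that quotes each part of a qualified name separately, the way the table name is already quoted. quoteIdentifier and quoteQualifiedName are hypothetical helpers for illustration (not Hive or Spark APIs), assuming a backquote inside a name is escaped by doubling it. Quoted per part, the function name comes out as `d100`.`udf100`, which Spark would read as function udf100 in database d100.

```scala
object QualifiedNameQuoting {
  // Hypothetical helper: wraps a single name part in backquotes,
  // doubling any backquote that appears inside the part.
  def quoteIdentifier(part: String): String =
    "`" + part.replace("`", "``") + "`"

  // Hypothetical helper: quotes each part on its own, then joins the
  // quoted parts with unquoted dots.
  def quoteQualifiedName(parts: Seq[String]): String =
    parts.map(quoteIdentifier).mkString(".")

  def main(args: Array[String]): Unit = {
    // Consistent: database and object name quoted separately.
    println(quoteQualifiedName(Seq("default", "emp")))  // `default`.`emp`
    println(quoteQualifiedName(Seq("d100", "udf100")))  // `d100`.`udf100`
    // Inconsistent: the whole qualified name quoted as a single part.
    println(quoteIdentifier("d100.udf100"))             // `d100.udf100`
  }
}
```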

@HyukjinKwon (Member, Author)

Done, thanks @dongjoon-hyun

@HyukjinKwon (Member, Author)

Merged to master.

@asfgit closed this in 01c3dfa Sep 8, 2018
@cloud-fan (Contributor)

I'm surprised Hive changes the view text set by Spark. Is it a problem for views? cc @gatorsmile @jiangxb1987 @hvanhovell

@jiangxb1987 (Contributor)

This actually reads a view created by Hive, so I don't think it is a problem on the view write side.

@jiangxb1987 (Contributor) commented Sep 10, 2018

Just confirmed: if the view is both created and read on the Spark side, no exception is thrown.

scala> spark.sql("CREATE TABLE emp AS SELECT 'user' AS name, 'address' as address")
18/09/10 13:53:23 WARN HiveMetaStore: Location: file:/Users/xingbojiang/workspace/spark/spark-warehouse/emp specified for non-external table:emp
18/09/10 13:53:25 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("CREATE DATABASE d100")
18/09/10 13:53:31 WARN ObjectStore: Failed to get database d100, returning NoSuchObjectException
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("CREATE FUNCTION d100.udf100 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper'")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("CREATE VIEW testview AS SELECT d100.udf100(name) FROM default.emp")
res3: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SELECT * FROM testview").show()
+-----------------+
|d100.udf100(name)|
+-----------------+
|             USER|
+-----------------+

@HyukjinKwon deleted the minor-debug branch October 16, 2018 12:43