[SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UDAFs/UDTFs #28104

huaxingao · 2020-04-02T21:33:12Z

What changes were proposed in this pull request?

Document Spark integration with Hive UDFs/UDAFs/UDTFs

Why are the changes needed?

To make SQL Reference complete

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Manually build and check

huaxingao · 2020-04-02T21:50:37Z

cc @maropu @gatorsmile

SparkQA · 2020-04-02T21:51:46Z

Test build #120738 has finished for PR 28104 at commit 6cf0798.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-04-03T01:39:39Z

docs/sql-ref-functions-udf-hive.md

+
+<pre><code>
+// Register a Hive UDF and use it in Spark SQL
+// Scala


We probably need an ADD JAR statement for the Hive UDF below.

maropu · 2020-04-03T01:46:04Z

docs/sql-ref-functions-udf-hive.md

+// GenericUDTFCount2 outputs the number of rows seen, twice.
+// The function source code can be found at:
+// https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF
+sql(s"ADD JAR ${hiveContext.getHiveFile("TestUDTF.jar").getCanonicalPath}")


hiveContext.getHiveFile is a helper meant for our own tests, and users cannot easily understand this example, so I think we had better not use it in the documentation.

maropu · 2020-04-03T01:46:35Z

also cc: @viirya @srowen

SparkQA · 2020-04-03T06:47:05Z

Test build #120753 has finished for PR 28104 at commit d777990.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-04-03T07:05:01Z

Test build #120752 has finished for PR 28104 at commit 67f9bf7.

This patch fails due to an unknown error code, -9.
This patch does not merge cleanly.
This patch adds no public classes.

maropu · 2020-04-03T07:22:46Z

retest this please

SparkQA · 2020-04-03T07:41:16Z

Test build #120757 has finished for PR 28104 at commit d777990.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2020-04-06T13:57:48Z

@viirya @maropu OK with you?

maropu · 2020-04-06T14:07:41Z

@huaxingao It seems some of the review comments have not been addressed yet, e.g., https://github.com/apache/spark/pull/28104/files#r402692479

…AFs/UDTFs

SparkQA · 2020-04-06T21:40:26Z

Test build #120887 has finished for PR 28104 at commit 52269f2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-04-07T00:30:25Z

Test build #120886 has finished for PR 28104 at commit 2638e01.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

maropu · 2020-04-07T04:05:08Z

@huaxingao I brushed up the doc based on your PR. Could you check this? huaxingao#2

Fix

SparkQA · 2020-04-09T06:02:35Z

Test build #121004 has finished for PR 28104 at commit 207cae8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-04-09T06:20:28Z

docs/sql-ref-functions-udf-hive.md

+// e.g., `spark.sql("ADD JAR yourHiveUDF.jar")`.
+spark.sql("CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs'")
+
+spark.sql("SELECT * FROM hiveUDFTestTable").show()


Oops, my bad. nit: hiveUDFTestTable -> t.
By the way, is there any reason to write this doc in Scala? Could we follow the SQL format here, too?

OK. Will convert to SQL

yea, thanks!
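
For reference, the SQL form of the snippet discussed above might look roughly like the sketch below. It assumes a Spark session with Hive support enabled. The function name, class name, and table name `t` come from the review thread; the jar path and the column name `value` are placeholders invented for illustration and do not appear in the PR.

```sql
-- Hypothetical jar path: replace it with the jar that contains your Hive UDF.
-- This statement can be omitted when the class is already on the classpath.
ADD JAR /tmp/yourHiveUDF.jar;

-- Register the Hive UDF under a session-scoped name.
CREATE TEMPORARY FUNCTION testUDF
    AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs';

-- `value` is an assumed numeric column of table `t`.
SELECT testUDF(value) FROM t;
```

A temporary function is visible only in the session that created it, which keeps the example free of side effects on the metastore.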

SparkQA · 2020-04-09T08:05:41Z

Test build #121018 has finished for PR 28104 at commit 946e417.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu

LGTM. Thanks, @huajianmao ! cc: @srowen

srowen · 2020-04-09T18:28:18Z

Merged to master/3.0

…AFs/UDTFs

### What changes were proposed in this pull request?
Document Spark integration with Hive UDFs/UDAFs/UDTFs

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png">

### How was this patch tested?
Manually build and check

Closes #28104 from huaxingao/hive-udfs.

Lead-authored-by: Huaxin Gao <[email protected]>
Co-authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 61f903f)
Signed-off-by: Sean Owen <[email protected]>

huaxingao · 2020-04-09T19:05:24Z

Thank you everyone!

…AFs/UDTFs

### What changes were proposed in this pull request?
Document Spark integration with Hive UDFs/UDAFs/UDTFs

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png">

### How was this patch tested?
Manually build and check

Closes apache#28104 from huaxingao/hive-udfs.

Lead-authored-by: Huaxin Gao <[email protected]>
Co-authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

HyukjinKwon · 2020-04-20T05:26:09Z

docs/sql-ref-functions-udf-hive.md

+    AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode';
+
+SELECT * FROM t;
+  +------+


Quick question: why did we use this format

+---+
|col|
+---+
|  1|
|  2|
|  3|
|  4|
+---+

over the Hive string format (which is produced by the spark-sql script)?

Also, it seems we should comment this output out.

Ah, I see. Actually, no strong reason; it was just for format consistency. Before #28151, we used different, inconsistent formats across the SQL documents, so I introduced a simple rule in #28151 to use the same format everywhere. But if we have a better format for the documents, reformatting looks fine.

Hmm.. I see. I double checked other references such as https://docs.snowflake.com/en/sql-reference/constructs/join.html, https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm, https://www.postgresql.org/docs/10/sql-select.html.

It looks like they don't add two leading spaces, at least(?). I don't have a strong opinion on this yet. Can we at least remove the two leading spaces?

Also, seems like we should comment these output out.

I'm not sure whether to comment out the output or not. In the SQL syntax section, we didn't comment out any of the output. But in the UDAF SQL example, I commented out the output to be consistent with the Scala and Java examples.

Yea, removing the spaces looks fine. I personally think the most important thing is to keep almost the same format across the documents. So, I think we can update each rule in the current format if we have a better one. Anyway, thanks for the check, @HyukjinKwon

Okay, thank you guys. It's not urgent but let's remove the two leading spaces. I think that looks more consistent with other references at least.

maropu reviewed Apr 3, 2020

docs/sql-ref-functions-udf-hive.md (outdated)

maropu reviewed Apr 3, 2020

docs/sql-ref-functions-udf-hive.md (outdated)

maropu reviewed Apr 3, 2020

docs/sql-ref-functions-udf-hive.md (outdated)

maropu reviewed Apr 3, 2020

docs/sql-ref-functions-udf-hive.md (outdated)

maropu reviewed Apr 3, 2020

viirya reviewed Apr 3, 2020

docs/sql-ref-functions-udf-hive.md

viirya reviewed Apr 3, 2020

docs/sql-ref-functions-udf-hive.md (outdated)

huaxingao force-pushed the hive-udfs branch from d777990 to 2638e01 Compare April 6, 2020 21:14

huaxingao added 3 commits April 6, 2020 14:19

[SPARK-31331][SQL][DOCS] Document Spark integration with Hive UDFs/UD…

177a01a

…AFs/UDTFs

address comments

3901cf6

add jar

100cea4

huaxingao force-pushed the hive-udfs branch from 2638e01 to 100cea4 Compare April 6, 2020 21:20

remove extra blanlk line

52269f2

viirya approved these changes Apr 6, 2020

Fix

bc87b37

Merge pull request #2 from maropu/pr28104-followup

207cae8

Fix

maropu reviewed Apr 9, 2020

change scala example to sql example

946e417

maropu approved these changes Apr 9, 2020

srowen closed this in 61f903f Apr 9, 2020

huaxingao deleted the hive-udfs branch April 9, 2020 19:05

HyukjinKwon reviewed Apr 20, 2020
