Skip to content

Conversation

@huaxingao
Copy link
Contributor

@huaxingao huaxingao commented Apr 2, 2020

What changes were proposed in this pull request?

Document Spark integration with Hive UDFs/UDAFs/UDTFs

Why are the changes needed?

To make SQL Reference complete

Does this PR introduce any user-facing change?

Yes
Screen Shot 2020-04-19 at 11 17 49 PM
Screen Shot 2020-04-19 at 11 18 22 PM
Screen Shot 2020-04-19 at 11 18 46 PM

How was this patch tested?

Manually build and check

@huaxingao
Copy link
Contributor Author

cc @maropu @gatorsmile

@SparkQA
Copy link

SparkQA commented Apr 2, 2020

Test build #120738 has finished for PR 28104 at commit 6cf0798.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


<pre><code>
// Register a Hive UDF and use it in Spark SQL
// Scala
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we need ADD JAR for the hive UDF below here.

// GenericUDTFCount2 outputs the number of rows seen, twice.
// The function source code can be found at:
// https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF
sql(s"ADD JAR ${hiveContext.getHiveFile("TestUDTF.jar").getCanonicalPath}")
Copy link
Member

@maropu maropu Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since hiveContext.getHiveFile is a method for our test use and users easily cannot undersntad this example, I think we had not better use it for the document.

@maropu
Copy link
Member

maropu commented Apr 3, 2020

also cc: @viirya @srowen

@SparkQA
Copy link

SparkQA commented Apr 3, 2020

Test build #120753 has finished for PR 28104 at commit d777990.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 3, 2020

Test build #120752 has finished for PR 28104 at commit 67f9bf7.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Apr 3, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Apr 3, 2020

Test build #120757 has finished for PR 28104 at commit d777990.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 6, 2020

@viirya @maropu OK with you?

@maropu
Copy link
Member

maropu commented Apr 6, 2020

@huaxingao It seems all the reviews have been not adressed yet, e.g., https://github.com/apache/spark/pull/28104/files#r402692479

@SparkQA
Copy link

SparkQA commented Apr 6, 2020

Test build #120887 has finished for PR 28104 at commit 52269f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 7, 2020

Test build #120886 has finished for PR 28104 at commit 2638e01.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Apr 7, 2020

@huaxingao I brushed up the doc based on your PR. Could you check this? huaxingao#2

@SparkQA
Copy link

SparkQA commented Apr 9, 2020

Test build #121004 has finished for PR 28104 at commit 207cae8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// e.g., `spark.sql("ADD JAR yourHiveUDF.jar")`.
spark.sql("CREATE TEMPORARY FUNCTION testUDF AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs'")

spark.sql("SELECT * FROM hiveUDFTestTable").show()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, my bad. nit: hiveUDFTestTable -> t.
btw, any reason to write this doc by Scala? Could we follow the SQL format here, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Will convert to SQL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, thanks!

@SparkQA
Copy link

SparkQA commented Apr 9, 2020

Test build #121018 has finished for PR 28104 at commit 946e417.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, @huajianmao ! cc: @srowen

@srowen srowen closed this in 61f903f Apr 9, 2020
@srowen
Copy link
Member

srowen commented Apr 9, 2020

Merged to master/3.0

srowen pushed a commit that referenced this pull request Apr 9, 2020
…AFs/UDTFs

### What changes were proposed in this pull request?
Document Spark integration with Hive UDFs/UDAFs/UDTFs

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png">

### How was this patch tested?
Manually build and check

Closes #28104 from huaxingao/hive-udfs.

Lead-authored-by: Huaxin Gao <[email protected]>
Co-authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 61f903f)
Signed-off-by: Sean Owen <[email protected]>
@huaxingao
Copy link
Contributor Author

Thank you everyone!

@huaxingao huaxingao deleted the hive-udfs branch April 9, 2020 19:05
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…AFs/UDTFs

### What changes were proposed in this pull request?
Document Spark integration with Hive UDFs/UDAFs/UDTFs

### Why are the changes needed?
To make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes
<img width="1031" alt="Screen Shot 2020-04-02 at 2 22 42 PM" src="https://user-images.githubusercontent.com/13592258/78301971-cc7cf080-74ee-11ea-93c8-7d4c75213b47.png">

### How was this patch tested?
Manually build and check

Closes apache#28104 from huaxingao/hive-udfs.

Lead-authored-by: Huaxin Gao <[email protected]>
Co-authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode';

SELECT * FROM t;
+------+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick question. Why did we use:

  +---+
  |col|
  +---+
  |  1|
  |  2|
  |  3|
  |  4|
  +---+

format over the Hive string format (which is produced by spark-sql script)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, seems like we should comment these output out.

Copy link
Member

@maropu maropu Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Actually, no strong reason. Just for format consistency. Before #28151, we used the different & inconsistent formats cross the SQL documents. So, I put the simple rule to use the same format in #28151. But, If we have a better format for the documents, the reformat looks fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. I see. I double checked other references such as https://docs.snowflake.com/en/sql-reference/constructs/join.html, https://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm, https://www.postgresql.org/docs/10/sql-select.html.

Looks they don't add leading two spaces at least(?). I don't have a strong opinion on this yet. Can we at least remove leading two spaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, seems like we should comment these output out.

Not sure to comment out the output or not. In SQL syntax section, we didn't comment out any of the output. But in the UDAF SQL example, I commented out the output to be consistent with the scala and java examples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, removing the spaces looks fine. I personally think the most important thing is just to keep the almost same format over the documents. So, I think we can update each rule in the current format if we have a better one. Anyway, thanks for the check, @HyukjinKwon

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thank you guys. It's not urgent but let's remove the two leading spaces. I think that looks more consistent with other references at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants