[SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf #19517

ueshin · 2017-10-17T14:29:12Z

What changes were proposed in this pull request?

This is a follow-up of #18732.
This PR modifies the GroupedData.apply() method to implicitly convert a pandas UDF into a grouped UDF.
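
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what "convert implicitly" could look like. It is not the PySpark implementation: ToyUdfType, ToyUdf, toy_pandas_udf, to_grouped and ToyGroupedData are hypothetical stand-ins invented for illustration, the constant values are assumptions, and plain lists of dicts stand in for pandas DataFrames.

```python
class ToyUdfType(object):
    # Hypothetical constants; names and values are assumptions for illustration.
    NORMAL_UDF = 0          # row-at-a-time UDFs
    PANDAS_UDF = 1          # scalar vectorized UDFs
    PANDAS_GROUPED_UDF = 2  # grouped vectorized UDFs


class ToyUdf(object):
    """Toy wrapper that pairs a function with a return type and a UDF type."""

    def __init__(self, func, return_type, udf_type):
        self.func = func
        self.return_type = return_type
        self.udf_type = udf_type


def toy_pandas_udf(return_type):
    """Toy decorator: wraps a function as a (non-grouped) vectorized UDF."""
    def decorator(func):
        return ToyUdf(func, return_type, ToyUdfType.PANDAS_UDF)
    return decorator


def to_grouped(udf):
    """Return a grouped copy of a vectorized UDF; the original is left unchanged."""
    if not isinstance(udf, ToyUdf) or udf.udf_type != ToyUdfType.PANDAS_UDF:
        raise ValueError("apply() expects a function wrapped by toy_pandas_udf")
    return ToyUdf(udf.func, udf.return_type, ToyUdfType.PANDAS_GROUPED_UDF)


class ToyGroupedData(object):
    """Toy stand-in for grouped data: a list of dict rows plus a grouping key."""

    def __init__(self, rows, key):
        self.rows = rows
        self.key = key

    def apply(self, udf):
        # The implicit conversion: the caller passes an ordinary vectorized UDF
        # and apply() turns it into a grouped one before running it.
        grouped_udf = to_grouped(udf)
        groups = {}
        for row in self.rows:
            groups.setdefault(row[self.key], []).append(row)
        result = []
        for group_key in sorted(groups):
            # The function receives all rows of one group at once.
            result.extend(grouped_udf.func(groups[group_key]))
        return result


@toy_pandas_udf("id long, v double")
def subtract_mean(rows):
    mean = sum(r["v"] for r in rows) / len(rows)
    return [{"id": r["id"], "v": r["v"] - mean} for r in rows]


rows = [{"id": 1, "v": 1.0}, {"id": 1, "v": 3.0}, {"id": 2, "v": 5.0}]
result = ToyGroupedData(rows, "id").apply(subtract_mean)
```

In this sketch the user never declares the UDF as grouped; the conversion happens inside apply(), which is the behaviour the description above refers to.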

How was this patch tested?

Existing tests.

SparkQA · 2017-10-17T17:44:44Z

Test build #82841 has finished for PR 19517 at commit 7b386c4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-17T18:23:23Z

Test build #82842 has finished for PR 19517 at commit 7e43bb4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-10-20T04:36:50Z

retest this please

gatorsmile · 2017-10-20T04:39:40Z

python/pyspark/sql/functions.py



+class PythonUdfType(object):
+    # row-based UDFs


Nit: Please update all row-based UDFs to row-at-a-time UDFs

Sure, I'll update it.

gatorsmile · 2017-10-20T04:40:58Z

sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala

 import org.apache.spark.sql.types.DataType

+private[spark] object PythonUdfType {
+  // row-based UDFs


The same here.

Sure, I'll update it, too.

gatorsmile · 2017-10-20T04:41:35Z

LGTM.

@ueshin Could you remove [WIP] from the title of this PR?

SparkQA · 2017-10-20T07:05:02Z

Test build #82922 has finished for PR 19517 at commit 59d61a4.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-10-20T07:05:02Z

Test build #82921 has finished for PR 19517 at commit 7e43bb4.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-10-20T07:06:09Z

retest this please

SparkQA · 2017-10-20T09:39:20Z

Test build #82926 has finished for PR 19517 at commit 59d61a4.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-10-20T16:22:04Z

retest this please

SparkQA · 2017-10-20T19:38:21Z

Test build #82936 has finished for PR 19517 at commit 59d61a4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

ueshin added 12 commits October 16, 2017 15:56

Introduce @pandas_grouped_udf decorator for grouped vectorized UDF.

4d2bd95

Use PythonUdfType instead of vectorized and grouped.

f096870

Update an error message.

639af2c

Add a test to use data type string.

10512a6

Restrict the number of arguments for grouped udf to only 1.

789e642

Restrict checking the number of arguments.

122a7bc

Revert "Restrict checking the number of arguments."

fdafb35

This reverts commit 122a7bc.

Address comments.

94d05f4

Add tests for unsupported type.

7332969

Address a comment.

85f250d

Remove @pandas_grouped_udf and convert implicitly.

7b386c4

Update descriptions.

7e43bb4

ueshin mentioned this pull request Oct 17, 2017

[WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf #19505

Closed

ueshin changed the title ~~[SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf~~ [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf Oct 18, 2017

gatorsmile reviewed Oct 20, 2017

ueshin added 2 commits October 20, 2017 13:52

Update descriptions.

dda2131

Update descriptions.

59d61a4

ueshin changed the title ~~[WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf~~ [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf Oct 20, 2017

asfgit closed this in b8624b0 Oct 20, 2017


3 participants