[SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF #25939

d80tb7 · 2019-09-26T08:22:34Z

Follow up from #24981 incorporating some comments from @HyukjinKwon.

Specifically:

- Added CoGroupedData to `__all__` in pyspark/sql/__init__.py so that documentation is generated for it.
- Added pydoc, including an example, for the use case where the user supplies a cogrouping function that takes a key.
- Added the doctest boilerplate to cogroup.py. Note that cogroup.py contains only the apply() function, whose doctests are disabled, as with the other pandas UDFs.
- Restricted the newly exposed RelationalGroupedDataset constructor parameters so that they are accessible only within the sql package.
- Made some minor formatting tweaks.

This was tested by running the appropriate unit tests. I'm unsure how to check that my change will cause the documentation to be generated correctly, but if someone can describe how to do this I'd be happy to check.

sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala

HyukjinKwon · 2019-09-26T08:30:18Z

add to whitelist

python/pyspark/sql/cogroup.py

SparkQA · 2019-09-26T08:34:47Z

Test build #111410 has finished for PR 25939 at commit becf4ac.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-26T08:58:57Z

Test build #111415 has finished for PR 25939 at commit 81e5aed.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-26T18:36:09Z

Test build #111430 has finished for PR 25939 at commit 7405e3a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-26T21:12:53Z

Test build #111450 has finished for PR 25939 at commit 7e02ea3.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon

LGTM otherwise. Thanks for addressing my comments @d80tb7.

python/pyspark/sql/cogroup.py

SparkQA · 2019-09-27T01:08:55Z

Test build #111452 has finished for PR 25939 at commit 84dd277.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-29T21:06:41Z

Test build #111574 has finished for PR 25939 at commit 4cdb2fa.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-29T23:51:12Z

retest this please

SparkQA · 2019-09-30T03:34:59Z

Test build #111579 has finished for PR 25939 at commit 4cdb2fa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-30T13:25:19Z

Merged to master.

This PR adds some extra documentation for the new Cogrouped map Pandas udfs. Specifically:

- Updated the usage guide for the new `COGROUPED_MAP` Pandas udfs added in #24981
- Updated the docstring for pandas_udf to include the COGROUPED_MAP type as suggested by HyukjinKwon in #25939

Closes #26110 from d80tb7/SPARK-29126-cogroup-udf-usage-guide.

Authored-by: Chris Martin <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>

d80tb7 added 4 commits September 24, 2019 17:38

code review fixes

21b0eeb

added doc for cogroup function including key

93af701

added doctest for cogroup

d1a60ce

corrected line wrapping

becf4ac

d80tb7 mentioned this pull request Sep 26, 2019

[SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs #24981

Closed

HyukjinKwon reviewed Sep 26, 2019

sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala (outdated, resolved)

HyukjinKwon reviewed Sep 26, 2019

python/pyspark/sql/cogroup.py (outdated, resolved)

HyukjinKwon reviewed Sep 26, 2019

python/pyspark/sql/cogroup.py (resolved)

HyukjinKwon changed the title ~~[SPARK-27463][PYTHON] Tidy Up~~ [SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF Sep 26, 2019

fixed python line length

81e5aed

fix pycodestyle failure

7405e3a

improved cogroup doc

7e02ea3

fix sphinx error

84dd277

dongjoon-hyun added PYSPARK SQL labels Sep 26, 2019

HyukjinKwon approved these changes Sep 27, 2019

python/pyspark/sql/cogroup.py (outdated, resolved)

python/pyspark/sql/cogroup.py (outdated, resolved)

python/pyspark/sql/cogroup.py (outdated, resolved)

removed unnecessary stuff from doctest

4cdb2fa

HyukjinKwon closed this in 76791b8 Sep 30, 2019

d80tb7 mentioned this pull request Oct 14, 2019

[SPARK-29126][PYSPARK][DOC] Pandas Cogroup udf usage guide #26110

Closed

4 participants