@d80tb7 d80tb7 commented Sep 26, 2019

Follow up from #24981 incorporating some comments from @HyukjinKwon.

Specifically:

  • Added CoGroupedData to __all__ in pyspark/sql/__init__.py so that its documentation is generated.
  • Added pydoc, including an example, for the case where the user supplies a cogrouping function that takes a key.
  • Added the doctest boilerplate to cogroup.py. Note that cogroup.py contains only the apply() function, whose doctests are disabled, as with the other pandas UDFs.
  • Restricted access to the newly exposed RelationalGroupedDataset constructor parameters to the sql package.
  • Made some minor formatting tweaks.

This was tested by running the appropriate unit tests. I'm unsure how to check that my change will cause the documentation to be generated correctly, but if someone can describe how to do this, I'd be happy to check.

@HyukjinKwon
Member

add to whitelist

HyukjinKwon changed the title from "[SPARK-27463][PYTHON] Tidy Up" to "[SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF" on Sep 26, 2019

SparkQA commented Sep 26, 2019

Test build #111410 has finished for PR 25939 at commit becf4ac.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 26, 2019

Test build #111415 has finished for PR 25939 at commit 81e5aed.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 26, 2019

Test build #111430 has finished for PR 25939 at commit 7405e3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 26, 2019

Test build #111450 has finished for PR 25939 at commit 7e02ea3.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM otherwise. Thanks for addressing my comments @d80tb7.

SparkQA commented Sep 27, 2019

Test build #111452 has finished for PR 25939 at commit 84dd277.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 29, 2019

Test build #111574 has finished for PR 25939 at commit 4cdb2fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

SparkQA commented Sep 30, 2019

Test build #111579 has finished for PR 25939 at commit 4cdb2fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

HyukjinKwon pushed a commit that referenced this pull request Oct 31, 2019
This PR adds some extra documentation for the new cogrouped map pandas UDFs. Specifically:

- Updated the usage guide for the new `COGROUPED_MAP` Pandas udfs added in #24981
- Updated the docstring for pandas_udf to include the COGROUPED_MAP type as suggested by HyukjinKwon in #25939
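
For readers unfamiliar with the calling convention that the `COGROUPED_MAP` documentation covers, the following plain-Python sketch emulates it without Spark or pandas. It is an illustration under stated assumptions, not the PySpark implementation: `cogroup_apply` and `count_per_side` are hypothetical names, lists of dicts stand in for pandas DataFrames, and the sketch assumes the key is passed as a tuple of grouping values and that a key present on only one side yields an empty group for the other.

```python
from collections import defaultdict


def cogroup_apply(left_rows, right_rows, key_col, func):
    """Hypothetical helper: call func(key, left_group, right_group) once per key."""
    left_groups = defaultdict(list)
    right_groups = defaultdict(list)
    for row in left_rows:
        left_groups[(row[key_col],)].append(row)
    for row in right_rows:
        right_groups[(row[key_col],)].append(row)

    out = []
    # Visit every key seen on either side; a side with no rows gets an empty list.
    for key in sorted(set(left_groups) | set(right_groups)):
        out.extend(func(key, left_groups.get(key, []), right_groups.get(key, [])))
    return out


def count_per_side(key, left, right):
    # key is a tuple of grouping values (assumption stated in the lead-in).
    return [{"id": key[0], "n_left": len(left), "n_right": len(right)}]


left = [{"id": 1, "v": 10}, {"id": 1, "v": 11}, {"id": 2, "v": 20}]
right = [{"id": 1, "w": "a"}, {"id": 3, "w": "c"}]
result = cogroup_apply(left, right, "id", count_per_side)
```

Here `result` holds one output row per distinct `id`, with the row counts from each side; a cogrouping function that does not need the key would simply omit the first parameter.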

Closes #26110 from d80tb7/SPARK-29126-cogroup-udf-usage-guide.

Authored-by: Chris Martin <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>