[SPARK-16931][PYTHON] PySpark APIS for bucketBy and sortBy #14517

GregBowyer · 2016-08-06T01:00:43Z

What changes were proposed in this pull request?

API access to allow PySpark to use bucketBy and sortBy on DataFrames.

MLnick · 2016-08-08T14:27:30Z

python/pyspark/sql/readwriter.py

        self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
        return self

+    @since(2.0)


`@since` should be 2.1, since I don't think this will go into branch-2.0.
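For context, the diff above follows the existing `partitionBy` pattern: the Python writer forwards the call to the JVM writer, stores the result in `self._jwrite`, and returns `self` so calls can be chained. Below is a minimal sketch of how `bucketBy` and `sortBy` wrappers might follow the same pattern. It is not the code from this PR: `FakeJvmWriter` and `WriterSketch` are hypothetical stand-ins so the example runs without Spark, and the argument order is an assumption modeled on the Scala `DataFrameWriter` methods of the same names.

```python
class FakeJvmWriter:
    """Hypothetical stand-in for the py4j-backed JVM DataFrameWriter."""

    def __init__(self, calls=()):
        self.calls = list(calls)

    def _record(self, *call):
        # Return a new writer, mimicking how the wrapper reassigns self._jwrite.
        return FakeJvmWriter(self.calls + [call])

    def bucketBy(self, num_buckets, col, cols):
        return self._record("bucketBy", num_buckets, col, tuple(cols))

    def sortBy(self, col, cols):
        return self._record("sortBy", col, tuple(cols))


class WriterSketch:
    """Hypothetical Python-side writer; only the delegation pattern is shown."""

    def __init__(self, jwrite):
        self._jwrite = jwrite

    # In the real module this is where a decorator such as @since(2.1) would go.
    def bucketBy(self, numBuckets, col, *cols):
        if not isinstance(numBuckets, int) or numBuckets <= 0:
            raise ValueError("numBuckets must be a positive int")
        self._jwrite = self._jwrite.bucketBy(numBuckets, col, cols)
        return self

    def sortBy(self, col, *cols):
        self._jwrite = self._jwrite.sortBy(col, cols)
        return self


# Chained calls work because each wrapper method returns self.
writer = WriterSketch(FakeJvmWriter()).bucketBy(4, "user_id").sortBy("ts", "event")
print(writer._jwrite.calls)
```

The column names `user_id`, `ts`, and `event` are placeholders chosen for the illustration.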

MLnick · 2016-08-08T14:29:48Z

ok to test

SparkQA · 2016-08-08T14:33:53Z

Test build #63364 has finished for PR 14517 at commit 47d9ef7.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

GregBowyer · 2016-08-09T04:30:34Z

Amended the commit with the style changes from MLnick. Can someone call the "ok to test", please?

SparkQA · 2016-08-09T05:05:17Z

Test build #63411 has finished for PR 14517 at commit 0bc078a.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-09T05:09:12Z

Test build #63414 has finished for PR 14517 at commit 96df186.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-09T07:28:34Z

Test build #63422 has finished for PR 14517 at commit 31c43e6.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-14T03:10:08Z

Test build #63738 has finished for PR 14517 at commit ce9f9c0.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-14T05:48:32Z

Test build #63739 has finished for PR 14517 at commit 68cf597.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

holdenk · 2016-08-16T17:45:00Z

python/pyspark/sql/readwriter.py

pep8 is saying this line is too long (over 100 chars), so it's failing the style tests. You can also run the style tests locally with ./dev/lint-python for a faster turnaround than Jenkins :)

Thanks for the note, I was getting annoyed at not knowing where to find the tools for such things.

Sure thing - I'm always happy to help people get up to speed with contributing to PySpark so feel free to reach out to me if you get stuck with something similar.
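To illustrate the line-length failure discussed in this thread, here is a minimal sketch of a check that flags lines longer than 100 characters. It only approximates the rule quoted in the review comment; it is not what ./dev/lint-python actually runs, and `find_long_lines` is a hypothetical helper name.

```python
MAX_LINE_LENGTH = 100  # limit quoted in the review comment


def find_long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) pairs for lines longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Line 2 is 101 characters (flagged); line 3 is exactly 100 (allowed).
sample = "short line\n" + "x" * 101 + "\n" + "y" * 100
print(find_long_lines(sample))  # [(2, 101)]
```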

SparkQA · 2016-08-19T20:19:32Z

Test build #64097 has finished for PR 14517 at commit 78a7b63.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-19T21:38:50Z

Test build #64106 has finished for PR 14517 at commit 8e8ce75.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-20T01:10:01Z

Test build #64113 has finished for PR 14517 at commit dfef36b.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-22T19:41:40Z

Test build #64227 has finished for PR 14517 at commit 2e8c191.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-22T20:27:08Z

Test build #64229 has finished for PR 14517 at commit 8eb8e71.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-08-22T22:38:14Z

Test build #64240 has finished for PR 14517 at commit dbb50ad.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

GregBowyer · 2016-08-26T19:05:35Z

What thoughts do people have about merging in?

davies · 2016-09-02T22:27:05Z

python/pyspark/sql/readwriter.py

    except py4j.protocol.Py4JError:
        spark = SparkSession(sc)

+    seed = int(time() * 1000)


It's better to have a deterministic test; testing with Parquet should be enough.
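A minimal sketch of the concern, under stated assumptions: the thread does not show what the time-based seed was used for, so the example assumes it drove a random choice of file format. `pick_format` and the format list are hypothetical, not taken from the PR.

```python
import random

FIXED_SEED = 42  # arbitrary constant chosen for this illustration


def pick_format(seed, formats=("parquet", "json", "orc")):
    """Choose a file format from a seeded generator (hypothetical helper)."""
    return random.Random(seed).choice(formats)


# The same seed always yields the same choice, so a failure can be reproduced.
# A seed derived from time() changes on every run and loses that property.
first = pick_format(FIXED_SEED)
second = pick_format(FIXED_SEED)
print(first == second)  # True

# The simpler option suggested in the review: drop the randomness entirely.
TEST_FORMAT = "parquet"
```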

@GregBowyer ping

I have been really busy with work of late, but I will try to sort this out today

@GregBowyer ping

@GregBowyer Any progress on this? :)

@GregBowyer ping. Let me propose to close this after a week.

cc @zero323, would you be interested in taking this over? I was thinking of taking it over myself if no one else does, since it looks quite close to being merged.

@HyukjinKwon By all means. I prepared a bunch of tests (7d911c647f21ada7fb429fd7c1c5f15934ff8847) and extended the code provided by @GregBowyer a bit (72c04a3f196da5223ebb44725aa88cffa81036e4). I think we can skip the low-level tests (direct access to the files), which are already present in the Scala test base.

@zero323, Good to know. Then, please go ahead if you are ready :).

MLnick reviewed Aug 8, 2016

GregBowyer force-pushed the pyspark-bucketing branch from 47d9ef7 to 0bc078a Compare August 9, 2016 04:28

GregBowyer force-pushed the pyspark-bucketing branch from 0bc078a to 96df186 Compare August 9, 2016 04:39

GregBowyer force-pushed the pyspark-bucketing branch from 96df186 to 31c43e6 Compare August 9, 2016 06:59

GregBowyer force-pushed the pyspark-bucketing branch from 31c43e6 to ce9f9c0 Compare August 14, 2016 02:43

holdenk reviewed Aug 16, 2016

GregBowyer added 2 commits August 22, 2016 14:55

[SPARK-16931][PYTHON] PySpark APIS for bucketBy and sortBy

f49b9a2

[TEST][PYTHON] Make dataframe tests run in independent dir

dbb50ad

GregBowyer force-pushed the pyspark-bucketing branch from 8eb8e71 to dbb50ad Compare August 22, 2016 21:55

davies reviewed Sep 2, 2016

HyukjinKwon mentioned this pull request Feb 15, 2017

[BUILD] Close stale PRs #16937

Closed

asfgit closed this in ed338f7 Feb 17, 2017
