@GregBowyer

What changes were proposed in this pull request?

API access to allow PySpark to use bucketBy and sortBy when writing DataFrames.
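As a rough mental model of what the two calls ask the writer to do, the sketch below assigns each row to one of `num_buckets` buckets by hashing the bucket columns, then sorts the rows inside each bucket by the sort columns. It is a toy that assumes rows are plain dicts: `bucket_and_sort` is a hypothetical helper, and `zlib.crc32` stands in for the hash Spark actually uses (Murmur3), so these bucket ids will not match the files Spark writes. In PySpark 2.3 and later, the real writer chain has the shape `df.write.bucketBy(4, "id").sortBy("v").saveAsTable("t")`, and bucketed output is only supported through `saveAsTable`.

```python
import zlib
from collections import defaultdict


def bucket_and_sort(rows, num_buckets, bucket_cols, sort_cols):
    """Toy model of bucketBy + sortBy (hypothetical helper, not Spark code).

    Rows whose bucket_cols values are equal always hash to the same bucket;
    each bucket is then sorted by sort_cols. zlib.crc32 is a stand-in hash,
    chosen because it is stable across processes (unlike Python's hash()).
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    buckets = defaultdict(list)
    for row in rows:
        # Serialize the bucket key deterministically, then hash it.
        key = repr(tuple(row[c] for c in bucket_cols)).encode("utf-8")
        buckets[zlib.crc32(key) % num_buckets].append(row)
    # Sort rows within each bucket by the sort columns.
    return {
        b: sorted(rs, key=lambda r: tuple(r[c] for c in sort_cols))
        for b, rs in buckets.items()
    }
```

Rows that share an `id` always land in the same bucket, which is what can let a later join or aggregation on `id` avoid a shuffle.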

self._jwrite = self._jwrite.partitionBy(_to_seq(self._spark._sc, cols))
return self

@since(2.0)
Contributor

`@since` should be 2.1, since I don't think this will go into branch-2.0.
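The diff follows PySpark's builder pattern: each writer method replaces `self._jwrite` with the result of the corresponding JVM call and returns `self`, so calls can be chained. A minimal sketch of that pattern, tagged with the version the reviewer asks for, is below. Everything in it is hypothetical: `FakeJavaWriter` stands in for the py4j-wrapped JVM writer (its method signatures are made up, not the Scala API), `WriterSketch` is not PySpark's `DataFrameWriter`, and `since` is a simplified stand-in for PySpark's decorator of the same name.

```python
def since(version):
    """Simplified stand-in for PySpark's @since: append a versionadded
    note to the docstring so the API docs show when a method appeared."""
    def deco(f):
        f.__doc__ = (f.__doc__ or "").rstrip() + "\n\n.. versionadded:: %s" % version
        return f
    return deco


class FakeJavaWriter:
    """Hypothetical stand-in for the py4j-wrapped JVM writer.
    Each call returns a new writer that records the calls made so far."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def bucketBy(self, num_buckets, cols):
        return FakeJavaWriter(self.calls + (("bucketBy", num_buckets, tuple(cols)),))

    def sortBy(self, cols):
        return FakeJavaWriter(self.calls + (("sortBy", tuple(cols)),))


class WriterSketch:
    """Hypothetical Python-side writer mirroring the diff's pattern:
    swap in the JVM call's result, then return self for chaining."""

    def __init__(self, jwrite):
        self._jwrite = jwrite

    @since("2.1")
    def bucketBy(self, numBuckets, *cols):
        """Buckets the output by the given columns."""
        self._jwrite = self._jwrite.bucketBy(numBuckets, list(cols))
        return self

    @since("2.1")
    def sortBy(self, *cols):
        """Sorts the output in each bucket by the given columns."""
        self._jwrite = self._jwrite.sortBy(list(cols))
        return self
```

A chain such as `WriterSketch(FakeJavaWriter()).bucketBy(4, "id").sortBy("ts")` returns the same writer object, and its `_jwrite` records both calls in order.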

@MLnick
Contributor

MLnick commented Aug 8, 2016

ok to test

@SparkQA

SparkQA commented Aug 8, 2016

Test build #63364 has finished for PR 14517 at commit 47d9ef7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@GregBowyer
Author

Amended the commit with the style changes from @MLnick. Can someone trigger "ok to test", please?

@SparkQA

SparkQA commented Aug 9, 2016

Test build #63411 has finished for PR 14517 at commit 0bc078a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 9, 2016

Test build #63414 has finished for PR 14517 at commit 96df186.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 9, 2016

Test build #63422 has finished for PR 14517 at commit 31c43e6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2016

Test build #63738 has finished for PR 14517 at commit ce9f9c0.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2016

Test build #63739 has finished for PR 14517 at commit 68cf597.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

pep8 says this line is too long (over 100 characters), so it's failing the style tests. You can also run the style tests locally with ./dev/lint-python for a faster turnaround than Jenkins :)
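For a quick check while editing, something like the helper below flags over-long lines. It is an illustrative stand-in for pep8's E501 ("line too long") check, assuming the 100-character limit mentioned above; `long_lines` is a hypothetical name, and it ignores `# noqa` markers and any other pep8 configuration, so ./dev/lint-python remains the authoritative check.

```python
def long_lines(source, max_len=100):
    """Return (line_number, length) for each line longer than max_len.

    Rough stand-in for pep8's E501 check: it only measures raw line length
    and does not honor noqa comments or other pep8 settings.
    """
    return [
        (lineno, len(line))
        for lineno, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_len
    ]
```

Running it over a file's contents, e.g. `long_lines(open("readwriter.py").read())`, lists the lines to wrap before pushing.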

Author

Thanks for the note; I was getting annoyed at not knowing where to find the tools for this kind of thing.

Contributor

Sure thing! I'm always happy to help people get up to speed with contributing to PySpark, so feel free to reach out to me if you get stuck on something similar.

@SparkQA

SparkQA commented Aug 19, 2016

Test build #64097 has finished for PR 14517 at commit 78a7b63.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 19, 2016

Test build #64106 has finished for PR 14517 at commit 8e8ce75.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 20, 2016

Test build #64113 has finished for PR 14517 at commit dfef36b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 22, 2016

Test build #64227 has finished for PR 14517 at commit 2e8c191.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 22, 2016

Test build #64229 has finished for PR 14517 at commit 8eb8e71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 22, 2016

Test build #64240 has finished for PR 14517 at commit dbb50ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@GregBowyer
Author

What do people think about merging this in?

except py4j.protocol.Py4JError:
    spark = SparkSession(sc)

seed = int(time() * 1000)
Contributor

It's better to have a deterministic test; testing with Parquet should be enough.
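Seeding from `int(time() * 1000)` means a failing run cannot be reproduced. A minimal sketch of the deterministic alternative, assuming the test only needs pseudo-random input data: fix the seed and draw from a local `random.Random` instance, so other code using the global generator cannot shift the sequence. `SEED` and `make_rows` are hypothetical names, not part of the PR.

```python
import random

SEED = 42  # hypothetical fixed seed; any constant works


def make_rows(n, seed=SEED):
    """Generate n (key, value) pairs reproducibly.

    A local Random instance keeps the sequence independent of the
    module-level generator's state.
    """
    rng = random.Random(seed)
    return [(rng.randint(0, 9), rng.random()) for _ in range(n)]
```

Two calls with the same seed return identical rows, so a failure reported by Jenkins can be replayed locally.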

Author

I have been really busy with work lately, but I will try to sort this out today.

Member

@GregBowyer Any progress on this? :)

Member

@GregBowyer, ping. I propose closing this after a week.

Member

HyukjinKwon commented Feb 27, 2017

cc @zero323, would you be interested in taking this over? I was thinking of taking it over myself if no one else does, since it looks quite close to being merged.

Member

@HyukjinKwon By all means. I prepared a bunch of tests (7d911c647f21ada7fb429fd7c1c5f15934ff8847) and slightly extended the code provided by @GregBowyer (72c04a3f196da5223ebb44725aa88cffa81036e4). I think we can skip the low-level tests (direct access to the files), which are already present in the Scala test base.

Member

@zero323, good to know. Please go ahead if you are ready :)

7 participants