Skip to content

Conversation

@foxik
Copy link
Contributor

@foxik foxik commented Feb 25, 2015

The samples should always be sorted in ascending order, because bisect.bisect_left is used on it. The reverse order of the result is already achieved in rangePartitioner by reversing the found index.

The current implementation also work, but always uses only two partitions -- the first one and the last one (because the bisect_left return returns either "beginning" or "end" for a descending sequence).

The samples should always be sorted in ascending order, because
bisect.bisect_left is used on it. The reverse order of the result
is already achieved in rangePartitioner by reversing the found index.
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@JoshRosen
Copy link
Contributor

Could you add a regression test for this issue? It looks like you have one in the JIRA ticket, so adding one hopefully should not be much work. Take a look at python/pyspark/tests.py for examples of this. Make sure to add a comment referencing the JIRA number.

@foxik
Copy link
Contributor Author

foxik commented Mar 5, 2015

I added the regression test. It also tests that sortByKey returns sorted sequence and tests also ascending sequence, which are not strictly necessary for SPARK-5969, but I added them anyway.

@JoshRosen
Copy link
Contributor

Jenkins, this is ok to test.

@SparkQA
Copy link

SparkQA commented Mar 5, 2015

Test build #28315 has started for PR 4761 at commit bc2647f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 5, 2015

Test build #28315 has finished for PR 4761 at commit bc2647f.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28315/
Test FAILed.

@foxik foxik force-pushed the fix-descending-sort branch from bc2647f to 95896b5 Compare March 6, 2015 08:37
@foxik
Copy link
Contributor Author

foxik commented Mar 6, 2015

I have amended the regression test commit to pass lint-python.

@SparkQA
Copy link

SparkQA commented Mar 6, 2015

Test build #28334 has started for PR 4761 at commit 95896b5.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 6, 2015

Test build #28334 has finished for PR 4761 at commit 95896b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28334/
Test PASSed.

@JoshRosen
Copy link
Contributor

@davies, does this look good to you? Sorry for letting this patch fall off my radar (slowly getting caught up on a backlog of reviews). If things look good, I can fix the merge conflict (which is probably just a conflict in tests) and get this committed.

@davies
Copy link
Contributor

davies commented Apr 8, 2015

LGTM

@JoshRosen
Copy link
Contributor

Alright, merging this into master (1.4.0) now. Thanks!

@asfgit asfgit closed this in 0375134 Apr 10, 2015
@JoshRosen
Copy link
Contributor

Should this be backported anywhere?

@davies
Copy link
Contributor

davies commented Apr 10, 2015

This is a bug since the beginning (0.8), could we back port it for all 1.0+ branches?

asfgit pushed a commit that referenced this pull request Apr 10, 2015
The samples should always be sorted in ascending order, because bisect.bisect_left is used on it. The reverse order of the result is already achieved in rangePartitioner by reversing the found index.

The current implementation also work, but always uses only two partitions -- the first one and the last one (because the bisect_left return returns either "beginning" or "end" for a descending sequence).

Author: Milan Straka <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Josh Rosen <[email protected]>

Closes #4761 from foxik/fix-descending-sort and squashes the following commits:

95896b5 [Milan Straka] Add regression test for SPARK-5969.
5757490 [Milan Straka] Fix descending pyspark.rdd.sortByKey.
asfgit pushed a commit that referenced this pull request Apr 10, 2015
The samples should always be sorted in ascending order, because bisect.bisect_left is used on it. The reverse order of the result is already achieved in rangePartitioner by reversing the found index.

The current implementation also work, but always uses only two partitions -- the first one and the last one (because the bisect_left return returns either "beginning" or "end" for a descending sequence).

Author: Milan Straka <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Josh Rosen <[email protected]>

Closes #4761 from foxik/fix-descending-sort and squashes the following commits:

95896b5 [Milan Straka] Add regression test for SPARK-5969.
5757490 [Milan Straka] Fix descending pyspark.rdd.sortByKey.
@JoshRosen
Copy link
Contributor

I've cherry-picked the fix into branch-1.2 (1.2.3) and branch-1.3 (1.3.2). I'm going to omit the the pre-1.2 backports for now because I hit a test merge conflict and also because I think it's unlikely that we're going to release another 1.1.x version.

@davies
Copy link
Contributor

davies commented Apr 10, 2015

Make sense, thank you!

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Apr 15, 2015
The samples should always be sorted in ascending order, because bisect.bisect_left is used on it. The reverse order of the result is already achieved in rangePartitioner by reversing the found index.

The current implementation also work, but always uses only two partitions -- the first one and the last one (because the bisect_left return returns either "beginning" or "end" for a descending sequence).

Author: Milan Straka <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Josh Rosen <[email protected]>

Closes apache#4761 from foxik/fix-descending-sort and squashes the following commits:

95896b5 [Milan Straka] Add regression test for SPARK-5969.
5757490 [Milan Straka] Fix descending pyspark.rdd.sortByKey.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants