Conversation

@sameeragarwal
Member

What changes were proposed in this pull request?

This patch fixes a few integer overflows in UnsafeSortDataFormat.copyRange() and ShuffleSortDataFormat.copyRange() that seem to be the most likely cause behind a number of TimSort contract violation errors seen in Spark 2.0 and Spark 1.6 while sorting large datasets.
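
For readers unfamiliar with the failure mode, the following is a minimal sketch of the overflow pattern and the widening fix. It is not the actual patch: the class name, method names, and the 8-byte record width are illustrative assumptions; the real change is to the offset arithmetic inside the two copyRange() methods named above.

```java
// Minimal sketch of the overflow (illustrative names, not the Spark sources).
public class CopyRangeOverflowSketch {
  private static final int RECORD_WIDTH_BYTES = 8; // hypothetical per-record width

  // Overflow-prone: pos * RECORD_WIDTH_BYTES is evaluated in 32-bit int arithmetic,
  // so the product wraps around before it is widened for the addition.
  static long byteOffsetBuggy(long baseOffset, int pos) {
    return baseOffset + pos * RECORD_WIDTH_BYTES;
  }

  // Fixed: widen one operand to long so the multiplication happens in 64 bits.
  static long byteOffsetFixed(long baseOffset, int pos) {
    return baseOffset + pos * (long) RECORD_WIDTH_BYTES;
  }

  public static void main(String[] args) {
    int pos = 268_435_456;                         // 2^31 / 8: first index where the int product wraps
    System.out.println(byteOffsetBuggy(0L, pos));  // prints -2147483648 (garbage offset)
    System.out.println(byteOffsetFixed(0L, pos));  // prints 2147483648
  }
}
```

A wrapped offset like this makes copyRange read from or write to the wrong records, which is consistent with the TimSort contract-violation errors described above.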

How was this patch tested?

Added a test in ExternalSorterSuite that instantiates a large array of the form [150000000, 150000001, 150000002, ..., 300000000, 0, 1, 2, ..., 149999999] that triggers a copyRange in TimSort.mergeLo or TimSort.mergeHi. Note that the input dataset should contain at least 268.43 million rows with a certain data distribution for an overflow to occur.
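
The 268.43 million figure lines up with the arithmetic in the earlier sketch: 2^31 / 8 = 268,435,456, the first index at which an int product with the hypothetical 8-byte stride exceeds Integer.MAX_VALUE. The helper below is only a sketch of the shape of that input (a hypothetical buildAdversarialInput, not the actual ExternalSorterSuite code): two sorted runs laid out so that merging them forces TimSort to copy a range reaching past the overflow point.

```java
// Illustrative sketch of the test input shape; not the ExternalSorterSuite code.
// Builds [n/2, n/2+1, ..., n, 0, 1, ..., n/2-1]: two sorted runs whose merge makes
// TimSort.mergeLo or TimSort.mergeHi call copyRange over a very large range.
static long[] buildAdversarialInput(int n) {   // the description uses n = 300_000_000
  long[] data = new long[n + 1];
  int half = n / 2;
  for (int i = 0; i <= half; i++) {
    data[i] = half + i;                        // 150000000, 150000001, ..., 300000000
  }
  for (int i = half + 1; i <= n; i++) {
    data[i] = i - half - 1;                    // 0, 1, ..., 149999999
  }
  return data;
}
```

At n = 300,000,000 this array alone occupies roughly 2.4 GB (300 million 8-byte longs), so reproducing the bug requires a correspondingly large heap or off-heap allocation.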

@sameeragarwal
Member Author

cc @davies @rxin

@rxin
Contributor

rxin commented May 26, 2016

cc @tejasapatil & @sitalkedia this should fix one of the problems you ran into.

@davies
Contributor

davies commented May 26, 2016

LGTM

@sitalkedia
Copy link

This is a nice find @sameeragarwal. Let me test this fix with our failing job to see if that works.

@andrewor14
Contributor

LGTM

@aching

aching commented May 26, 2016

Thanks @sameeragarwal and @rxin!

@SparkQA

SparkQA commented May 26, 2016

Test build #59416 has finished for PR 13336 at commit f722e44.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 26, 2016

Merging in master/2.0/1.6.

asfgit closed this in fe6de16 on May 26, 2016
asfgit pushed a commit that referenced this pull request May 26, 2016
## What changes were proposed in this pull request?

This patch fixes a few integer overflows in `UnsafeSortDataFormat.copyRange()` and `ShuffleSortDataFormat.copyRange()` that seem to be the most likely cause behind a number of `TimSort` contract violation errors seen in Spark 2.0 and Spark 1.6 while sorting large datasets.

## How was this patch tested?

Added a test in `ExternalSorterSuite` that instantiates a large array of the form [150000000, 150000001, 150000002, ..., 300000000, 0, 1, 2, ..., 149999999] that triggers a `copyRange` in `TimSort.mergeLo` or `TimSort.mergeHi`. Note that the input dataset should contain at least 268.43 million rows with a certain data distribution for an overflow to occur.

Author: Sameer Agarwal <[email protected]>

Closes #13336 from sameeragarwal/timsort-bug.

(cherry picked from commit fe6de16)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request May 26, 2016

Author: Sameer Agarwal <[email protected]>

Closes #13336 from sameeragarwal/timsort-bug.

(cherry picked from commit fe6de16)
Signed-off-by: Reynold Xin <[email protected]>
zzcclp pushed a commit to zzcclp/spark that referenced this pull request May 27, 2016

Author: Sameer Agarwal <[email protected]>

Closes apache#13336 from sameeragarwal/timsort-bug.

(cherry picked from commit fe6de16)
Signed-off-by: Reynold Xin <[email protected]>
(cherry picked from commit 0b8bdf7)
@yhuai
Contributor

yhuai commented May 27, 2016

Seems it breaks the 1.6 build?

@yhuai
Contributor

yhuai commented May 27, 2016

Sorry, it has been fixed.

