Conversation

@sameeragarwal
Member

What changes were proposed in this pull request?

This patch fixes a few integer overflows in UnsafeSortDataFormat.copyRange() and ShuffleSortDataFormat.copyRange() that seem to be the most likely cause behind a number of TimSort contract violation errors seen in Spark 2.0 and Spark 1.6 while sorting large datasets.
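
For readers unfamiliar with the failure mode, the following is a minimal sketch of the overflow pattern and the widening fix. It is not the actual patch: the class name, method names, and the 8-byte record width are illustrative assumptions; the real change is to the offset arithmetic inside the two copyRange() methods named above.

```java
// Minimal sketch of the overflow (illustrative names, not the Spark sources).
public class CopyRangeOverflowSketch {
  private static final int RECORD_WIDTH_BYTES = 8; // hypothetical per-record width

  // Overflow-prone: pos * RECORD_WIDTH_BYTES is evaluated in 32-bit int arithmetic,
  // so the product wraps around before it is widened for the addition.
  static long byteOffsetBuggy(long baseOffset, int pos) {
    return baseOffset + pos * RECORD_WIDTH_BYTES;
  }

  // Fixed: widen one operand to long so the multiplication happens in 64 bits.
  static long byteOffsetFixed(long baseOffset, int pos) {
    return baseOffset + pos * (long) RECORD_WIDTH_BYTES;
  }

  public static void main(String[] args) {
    int pos = 268_435_456;                         // 2^31 / 8: first index where the int product wraps
    System.out.println(byteOffsetBuggy(0L, pos));  // prints -2147483648 (garbage offset)
    System.out.println(byteOffsetFixed(0L, pos));  // prints 2147483648
  }
}
```

A wrapped offset like this makes copyRange read from or write to the wrong records, which is consistent with the TimSort contract-violation errors described above.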

How was this patch tested?

Added a test in ExternalSorterSuite that instantiates a large array of the form [150000000, 150000001, 150000002, ..., 300000000, 0, 1, 2, ..., 149999999] that triggers a copyRange in TimSort.mergeLo or TimSort.mergeHi. Note that the input dataset should contain at least 268.43 million rows with a certain data distribution for an overflow to occur.
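
The 268.43 million figure lines up with the arithmetic in the earlier sketch: 2^31 / 8 = 268,435,456, the first index at which an int product with the hypothetical 8-byte stride exceeds Integer.MAX_VALUE. The helper below is only a sketch of the shape of that input (a hypothetical buildAdversarialInput, not the actual ExternalSorterSuite code): two sorted runs laid out so that merging them forces TimSort to copy a range reaching past the overflow point.

```java
// Illustrative sketch of the test input shape; not the ExternalSorterSuite code.
// Builds [n/2, n/2+1, ..., n, 0, 1, ..., n/2-1]: two sorted runs whose merge makes
// TimSort.mergeLo or TimSort.mergeHi call copyRange over a very large range.
static long[] buildAdversarialInput(int n) {   // the description uses n = 300_000_000
  long[] data = new long[n + 1];
  int half = n / 2;
  for (int i = 0; i <= half; i++) {
    data[i] = half + i;                        // 150000000, 150000001, ..., 300000000
  }
  for (int i = half + 1; i <= n; i++) {
    data[i] = i - half - 1;                    // 0, 1, ..., 149999999
  }
  return data;
}
```

At n = 300,000,000 this array alone occupies roughly 2.4 GB (300 million 8-byte longs), so reproducing the bug requires a correspondingly large heap or off-heap allocation.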

@sameeragarwal
Member Author

cc @davies @rxin

@rxin
Contributor

rxin commented May 26, 2016

cc @tejasapatil & @sitalkedia this should fix one of the problems you ran into.

@davies
Contributor

davies commented May 26, 2016

LGTM

@sitalkedia
Copy link

This is a nice find @sameeragarwal. Let me test this fix with our failing job to see if that works.

@andrewor14
Contributor

LGTM

@aching

aching commented May 26, 2016

Thanks @sameeragarwal and @rxin!

@SparkQA

SparkQA commented May 26, 2016

Test build #59416 has finished for PR 13336 at commit f722e44.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 26, 2016

Merging in master/2.0/1.6.

asfgit closed this in fe6de16 on May 26, 2016
asfgit pushed a commit that referenced this pull request May 26, 2016
## What changes were proposed in this pull request?

This patch fixes a few integer overflows in `UnsafeSortDataFormat.copyRange()` and `ShuffleSortDataFormat.copyRange()` that seem to be the most likely cause behind a number of `TimSort` contract violation errors seen in Spark 2.0 and Spark 1.6 while sorting large datasets.

## How was this patch tested?

Added a test in `ExternalSorterSuite` that instantiates a large array of the form [150000000, 150000001, 150000002, ..., 300000000, 0, 1, 2, ..., 149999999] that triggers a `copyRange` in `TimSort.mergeLo` or `TimSort.mergeHi`. Note that the input dataset should contain at least 268.43 million rows with a certain data distribution for an overflow to occur.

Author: Sameer Agarwal <[email protected]>

Closes #13336 from sameeragarwal/timsort-bug.

(cherry picked from commit fe6de16)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request May 26, 2016

Author: Sameer Agarwal <[email protected]>

Closes #13336 from sameeragarwal/timsort-bug.

(cherry picked from commit fe6de16)
Signed-off-by: Reynold Xin <[email protected]>
zzcclp pushed a commit to zzcclp/spark that referenced this pull request May 27, 2016

Author: Sameer Agarwal <[email protected]>

Closes apache#13336 from sameeragarwal/timsort-bug.

(cherry picked from commit fe6de16)
Signed-off-by: Reynold Xin <[email protected]>
(cherry picked from commit 0b8bdf7)
@yhuai
Contributor

yhuai commented May 27, 2016

Seems it breaks the 1.6 build?

@yhuai
Contributor

yhuai commented May 27, 2016

Sorry, it has been fixed.

