
Conversation

@advancedxy
Contributor

There is an int overflow in the ParallelCollectionRDD.slice method, originally reported by SaintBacchus.

```
sc.makeRDD(1 to (Int.MaxValue)).count       // result = 0
sc.makeRDD(1 to (Int.MaxValue - 1)).count   // result = 2147483646 = Int.MaxValue - 1
sc.makeRDD(1 until (Int.MaxValue)).count    // result = 2147483646 = Int.MaxValue - 1
```
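
A minimal REPL sketch of why the first count is 0, assuming the old slice code converted the inclusive range to an exclusive one by stepping one past the end, so the end wraps around:

```
val e = Int.MaxValue + 1   // silently overflows to -2147483648 (Int.MinValue)
val r = 1 until e          // 1 until Int.MinValue is empty for a positive step
r.length                   // 0, which is why the count above is 0
```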

see #2874 for more details.
This PR tries to fix the overflow. However, there's another issue it doesn't address:

```
val largeRange = Int.MinValue to Int.MaxValue
largeRange.length // throws java.lang.IllegalArgumentException: -2147483648 to 2147483647 by 1: seqs cannot contain more than Int.MaxValue elements.
```

So the range we feed to sc.makeRDD cannot contain more than Int.MaxValue elements. This is a limitation of Scala's Range. We may want to support that kind of range eventually, but the fix is beyond this PR.

@srowen @andrewor14 would you mind taking a look at this PR?

Deal with inclusive and exclusive ranges in one case. If the range is inclusive
and the end of the range is (Int.MaxValue or Int.MinValue), we should use
inclusive range instead of exclusive
@AmplabJenkins

Can one of the admins verify this patch?

@advancedxy advancedxy changed the title [Spark-5201] deal with int overflow in the ParallelCollectionRDD.slice method [SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRDD.slice method Jan 12, 2015
Member

Maybe I'm missing it, but why is Int.MinValue a special case here? Also the ({ on the next line is redundant; just one is needed. sign isn't terribly descriptive here either.

Contributor Author

Converting this inclusive range

-2 to Int.MinValue by -1

to an exclusive range gives

-2 until (Int.MinValue - 1) by -1

and Int.MinValue - 1 overflows.
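
A quick REPL sketch of that wrap-around (illustrative, not taken from the patch):

```
val e = Int.MinValue - 1   // silently overflows to 2147483647 (Int.MaxValue)
val r = -2 until e by -1   // empty: with a negative step, start must be greater than end
r.length                   // 0
```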

As for sign, which name would you recommend? How about inclusiveRangeWithIntBoundary?

Member

Ah right, the range can go backwards. Yeah, something like needsInclusiveRange or exceptionalBoundary or something.

Contributor Author

Ok, will change the name.
As for the redundant ({: there is an infix toSeq call afterwards, so I prefer keeping the parentheses. The previous code also used ({.

@andrewor14
Contributor

ok to test

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25419 has started for PR 4002 at commit 651c959.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25419 has finished for PR 4002 at commit 651c959.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25419/
Test PASSed.

Contributor

This need not be specific to Int.MaxValue and Int.MinValue. We can just special-case inclusive ranges in general, and I think the code will be significantly more straightforward to read. I'll explain below.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25499 has started for PR 4002 at commit 96265a1.

  • This patch merges cleanly.

@advancedxy
Contributor Author

@andrewor14 would you mind taking a look again?

```
positions(r.length, numSlices).zipWithIndex.map({ case ((start, end), index) =>
  // the redundant "({" is necessary because of the infix toSeq call below
  // If the range is inclusive, use an inclusive range for the last slice
  if (r.isInclusive && index == numSlices - 1) {
    new Range.Inclusive(r.start + start * r.step, r.end, r.step)
  } else {
    new Range(r.start + start * r.step, r.start + end * r.step, r.step)
  }
}).toSeq.asInstanceOf[Seq[Seq[T]]]  // the toSeq and cast are necessary, otherwise it won't compile due to a type mismatch
```

I updated some docs. There may be some wrongly chosen words as I am not a native speaker; please correct me if you see any.
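
For anyone following along, here is a self-contained sketch of the idea (it mirrors the approach above, not Spark's exact code; SliceSketch, sliceRange and this simplified positions helper are illustrative names):

```
object SliceSketch {
  // Split the index space [0, length) into numSlices contiguous (start, end) intervals.
  def positions(length: Long, numSlices: Int): Iterator[(Int, Int)] =
    (0 until numSlices).iterator.map { i =>
      (((i * length) / numSlices).toInt, (((i + 1) * length) / numSlices).toInt)
    }

  // Slice a Range into sub-Ranges. The last slice stays inclusive when the input
  // is inclusive, so we never compute r.end + r.step, which is what used to overflow.
  def sliceRange(r: Range, numSlices: Int): Seq[Range] =
    positions(r.length, numSlices).zipWithIndex.map { case ((start, end), index) =>
      if (r.isInclusive && index == numSlices - 1) {
        Range.inclusive(r.start + start * r.step, r.end, r.step)
      } else {
        Range(r.start + start * r.step, r.start + end * r.step, r.step)
      }
    }.toSeq

  def main(args: Array[String]): Unit = {
    val slices = sliceRange(1 to Int.MaxValue, 4)
    // Summing the slice lengths as Longs recovers the full element count.
    println(slices.map(_.length.toLong).sum)  // 2147483647
  }
}
```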

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25499 has finished for PR 4002 at commit 96265a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25499/
Test PASSed.

Member

Is it necessary for callers to know this? It seems like purely an implementation detail that could be documented inside the function later. Same for the scaladoc later.
I would say "the sub Ranges in each slice are exclusive"
Otherwise at a glance this looks good.

Contributor Author

Thanks for your comment. I will update the docs if we decide to add them.

Contributor

I agree. I would rather leave this out. I'll delete this when I merge.

@advancedxy
Contributor Author

ping @andrewor14

@advancedxy advancedxy closed this Jan 15, 2015
@advancedxy advancedxy reopened this Jan 15, 2015
@andrewor14
Contributor

Thanks @advancedxy, I'm merging this into master and 1.2.

@asfgit asfgit closed this in e200ac8 Jan 16, 2015
asfgit pushed a commit that referenced this pull request Jan 16, 2015
[SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRDD.slice method

There is an int overflow in the ParallelCollectionRDD.slice method, originally reported by SaintBacchus.
```
sc.makeRDD(1 to (Int.MaxValue)).count       // result = 0
sc.makeRDD(1 to (Int.MaxValue - 1)).count   // result = 2147483646 = Int.MaxValue - 1
sc.makeRDD(1 until (Int.MaxValue)).count    // result = 2147483646 = Int.MaxValue - 1
```
see #2874 for more details.
This PR tries to fix the overflow. However, there's another issue it doesn't address:
```
val largeRange = Int.MinValue to Int.MaxValue
largeRange.length // throws java.lang.IllegalArgumentException: -2147483648 to 2147483647 by 1: seqs cannot contain more than Int.MaxValue elements.
```

So the range we feed to sc.makeRDD cannot contain more than Int.MaxValue elements. This is a limitation of Scala's Range. We may want to support that kind of range eventually, but the fix is beyond this PR.

srowen andrewor14 would you mind taking a look at this PR?

Author: Ye Xianjin <[email protected]>

Closes #4002 from advancedxy/SPARk-5201 and squashes the following commits:

96265a1 [Ye Xianjin] Update slice method comment and some responding docs.
e143d7a [Ye Xianjin] Update inclusive range check for splitting inclusive range.
b3f5577 [Ye Xianjin] We can include the last element in the last slice in general for inclusive range, hence eliminate the need to check Int.MaxValue or Int.MinValue.
7d39b9e [Ye Xianjin] Convert the two cases pattern matching to one case.
651c959 [Ye Xianjin] rename sign to needsInclusiveRange. add some comments
196f8a8 [Ye Xianjin] Add test cases for ranges end with Int.MaxValue or Int.MinValue
e66e60a [Ye Xianjin] Deal with inclusive and exclusive ranges in one case. If the range is inclusive and the end of the range is (Int.MaxValue or Int.MinValue), we should use inclusive range instead of exclusive