[SPARK-25904][CORE] Allocate arrays smaller than Int.MaxValue #22818
Conversation
JVMs can't allocate arrays of length exactly Int.MaxValue, so leave a little extra room.
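The headroom idea can be sketched in plain Java (a minimal illustration, not Spark's actual code; the class and method names here are hypothetical, and the constant value mirrors what `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` is meant to provide):

```java
public class SafeArrayAlloc {
    // The JVM reserves a few header words per array object, so the practical
    // limit is slightly below Integer.MAX_VALUE. This value is an assumption
    // mirroring the intent of Spark's MAX_ROUNDED_ARRAY_LENGTH constant.
    static final int MAX_ROUNDED_ARRAY_LENGTH = Integer.MAX_VALUE - 15;

    // Clamp a requested size to something the JVM can actually allocate.
    static int safeLength(long requested) {
        if (requested < 0) {
            throw new IllegalArgumentException("negative size: " + requested);
        }
        return (int) Math.min(requested, MAX_ROUNDED_ARRAY_LENGTH);
    }

    public static void main(String[] args) {
        // A request for Int.MaxValue elements is clamped below the VM limit.
        System.out.println(safeLength(Integer.MAX_VALUE));
        System.out.println(safeLength(1024));
    }
}
```

Small requests pass through unchanged; only requests at or above the limit are clamped.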
Your grammar there. |
|
Looks good to me. I reran the test that encountered this issue on a secure cluster after deploying a build with this change and now it passes. |
|
Test build #97987 has finished for PR 22818 at commit
|
|
Thanks, would it be also possible to double-check |
|
Actually there are quite a few more uses, even of `Int.MaxValue`:

    > scala -J-Xmx16G
    scala> val x = new scala.collection.mutable.ArrayBuffer[Int](128)
    scala> x.sizeHint(Int.MaxValue)
    java.lang.OutOfMemoryError: Requested array size exceeds VM limit
      at scala.collection.mutable.ArrayBuffer.sizeHint(ArrayBuffer.scala:69)
      ... 30 elided

do you think it's important to tackle them all here? I could also open another jira to do an audit |
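The same VM limit can be reproduced with a plain array allocation (a hedged sketch; the exact OutOfMemoryError message depends on the JVM and configured heap, so the code only checks that the allocation fails):

```java
public class VmLimitDemo {
    // Returns true if allocating a long[] of the given length fails with
    // OutOfMemoryError. Arrays of exactly Int.MaxValue elements are rejected
    // by HotSpot even on a large heap, because the VM reserves a few header
    // words per array object.
    static boolean allocationFails(int n) {
        try {
            long[] a = new long[n];
            return a.length != n; // unreachable when allocation succeeds
        } catch (OutOfMemoryError e) {
            // Typically "Requested array size exceeds VM limit" on a big
            // heap, or "Java heap space" on a small one.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(allocationFails(8));
        System.out.println(allocationFails(Integer.MAX_VALUE));
    }
}
```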
|
Since this PR is not a blocker for 2.4, it is not necessary to close this ASAP. |
|
Test build #98274 has finished for PR 22818 at commit
|
|
@kiszk I've updated this to cover more cases. I didn't cover some of them in mllib-local, as ByteArrayMethods isn't visible there, and it would only very slightly improve an error message, so it didn't seem worth a refactor here. But I don't feel super strongly about it either. |
|
I also changed the issue to SPARK-25904 -- there are some other things related to encryption that it makes more sense to handle under SPARK-25827. |
|
Test build #98371 has finished for PR 22818 at commit
|
|
retest this please |
|
Test build #98435 has finished for PR 22818 at commit
|
|
retest this please |
|
Test build #98451 has finished for PR 22818 at commit
|
|
Test build #98504 has finished for PR 22818 at commit
|
core/src/main/scala/org/apache/spark/internal/config/package.scala
        truncate: Int = 20,
        vertical: Boolean = false): String = {
    -   val numRows = _numRows.max(0).min(Int.MaxValue - 1)
    +   val numRows = _numRows.max(0).min(ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH - 1)
Is the "- 1" really necessary after migrating from Int.MaxValue - 1 to ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH - 1?
I think it is -- we make a Seq potentially one larger than this value here:
    val rows = tmpRows.take(numRows + 1)
(admittedly it's a stretch; you shouldn't be showing 2 billion rows anyway)
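The interplay described here can be sketched in Java (illustrative names only; Spark's actual `showString` is Scala, and the constant value is an assumption mirroring `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`):

```java
public class ShowStringClamp {
    // Assumed value, mirroring the intent of Spark's
    // ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.
    static final int MAX_ROUNDED_ARRAY_LENGTH = Integer.MAX_VALUE - 15;

    // The "- 1" in the diff: the caller later takes numRows + 1 rows (the
    // extra row is how it detects there is more data than was shown), so
    // the clamp must leave room for that one extra element.
    static int clampNumRows(int requested) {
        return Math.min(Math.max(requested, 0), MAX_ROUNDED_ARRAY_LENGTH - 1);
    }

    public static void main(String[] args) {
        int numRows = clampNumRows(Integer.MAX_VALUE);
        // Even at the extreme, numRows + 1 still fits under the safe limit.
        System.out.println((numRows + 1) <= MAX_ROUNDED_ARRAY_LENGTH);
    }
}
```

Without the extra "- 1", a request of exactly MAX_ROUNDED_ARRAY_LENGTH rows would make the later take(numRows + 1) exceed the safe allocation limit by one.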
|
Test build #4413 has finished for PR 22818 at commit
|
|
thanks for the review @attilapiros, fixed the style issue, I think the other one is OK as is. |
|
Test build #98520 has finished for PR 22818 at commit
|
jiangxb1987
left a comment
LGTM
|
LGTM |
|
merged to master |
|
since this is a bug fix, shall we also backport it? |
JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big. This commit changes some defaults & configs to gracefully fall back to something that doesn't require one large array in some cases; in other cases it simply improves an error message for cases which will still fail. Closes apache#22818 from squito/SPARK-25827. Authored-by: Imran Rashid <[email protected]> Signed-off-by: Imran Rashid <[email protected]> (cherry picked from commit 8fbc183)
|
@cloud-fan oh good point, sorry that was an oversight on my part. Since it was clean I just pushed it directly here: 47a668c I'll update jira too |
|
argh, sorry about the mistake, thank you for the fix @cloud-fan |