[SPARK-25904][CORE] Allocate arrays smaller than Int.MaxValue #22818

squito · 2018-10-24T20:46:37Z

JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big. This commit changes some defaults & configs to gracefully fall back to something that doesn't require one large array in some cases; in other cases it simply improves the error message for cases that will still fail.

JVMs don't you allocate arrays of length exactly Int.MaxValue, so leave a little extra room.
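A minimal Scala sketch of the two strategies the description mentions: clamping a requested size, or failing early with a clearer message. `SafeArrayLength`, `clampLength`, and `checkedLength` are hypothetical names, and the value `Int.MaxValue - 15` is an assumption standing in for Spark's `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`. This is an illustration, not the patch itself.

```scala
object SafeArrayLength {
  // Assumed stand-in for ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH:
  // stay a little below Int.MaxValue, since JVMs reject arrays of exactly that length.
  val MaxSafeArrayLength: Int = Int.MaxValue - 15

  // Strategy 1: silently clamp a requested length into [0, MaxSafeArrayLength].
  def clampLength(requested: Long): Int =
    math.max(0L, math.min(requested, MaxSafeArrayLength.toLong)).toInt

  // Strategy 2: fail early with a descriptive message instead of the JVM's
  // "Requested array size exceeds VM limit" OutOfMemoryError.
  def checkedLength(requested: Long, what: String): Int = {
    require(
      requested >= 0 && requested <= MaxSafeArrayLength,
      s"Cannot allocate $what with $requested elements; the limit is $MaxSafeArrayLength")
    requested.toInt
  }
}
```

Clamping suits defaults such as buffer sizes, where a smaller array still works; the explicit check suits cases that cannot succeed anyway, where only the error message improves.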

squito · 2018-10-24T20:46:44Z

@kiszk

vanzin · 2018-10-24T21:01:24Z

> JVMs don't you allocate arrays

You grammar there.

wypoon · 2018-10-25T00:07:49Z

Looks good to me. I reran the test that encountered this issue on a secure cluster after deploying a build with this change and now it passes.

SparkQA · 2018-10-25T00:52:55Z

Test build #97987 has finished for PR 22818 at commit 64b5ed4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2018-10-25T18:03:00Z

Thanks. Would it also be possible to double-check the uses of Integer.MAX_VALUE, if you have not checked them yet?

squito · 2018-10-25T19:41:29Z

Actually there are quite a few more uses, even of Int.MaxValue, which I find suspicious, but for the moment I only wanted to touch the cases I understood better. For example, "spark.sql.sortMergeJoinExec.buffer.in.memory.threshold" is used as the max size for an ArrayBuffer in ExternalAppendOnlyUnsafeRowArray, and I'm pretty sure that will cause the same problems:

```
> scala -J-Xmx16G
scala> val x = new scala.collection.mutable.ArrayBuffer[Int](128)
scala> x.sizeHint(Int.MaxValue)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
  at scala.collection.mutable.ArrayBuffer.sizeHint(ArrayBuffer.scala:69)
  ... 30 elided
```

Do you think it's important to tackle them all here? I could also open another JIRA to do an audit.
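One way the threshold problem above could be avoided, sketched under assumptions: validate an in-memory threshold against a safe maximum up front, and grow the buffer incrementally instead of pre-sizing it to the threshold. `BufferThreshold`, `validateThreshold`, and `fillUntilThreshold` are hypothetical names; this is not how `ExternalAppendOnlyUnsafeRowArray` is implemented, and spilling is left out.

```scala
import scala.collection.mutable.ArrayBuffer

object BufferThreshold {
  // Assumed stand-in for ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.
  val MaxSafeArrayLength: Int = Int.MaxValue - 15

  // Hypothetical check for an "in-memory threshold" style config: reject values
  // that could only be honored by a single oversized array.
  def validateThreshold(threshold: Int): Int = {
    require(threshold > 0 && threshold <= MaxSafeArrayLength,
      s"threshold must be in (0, $MaxSafeArrayLength], got $threshold")
    threshold
  }

  // Buffer items in memory until the validated threshold is reached.
  // Returns the buffer and whether input remains (a caller would spill then).
  def fillUntilThreshold[T](items: Iterator[T], threshold: Int): (ArrayBuffer[T], Boolean) = {
    val limit = validateThreshold(threshold)
    // Start small and let the buffer grow, rather than calling sizeHint(limit).
    val buf = new ArrayBuffer[T](math.min(limit, 128))
    while (items.hasNext && buf.length < limit) buf += items.next()
    (buf, items.hasNext)
  }
}
```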

kiszk · 2018-10-28T11:32:16Z

Since this PR is not a blocker for 2.4, there is no need to close it ASAP.
So I think it would be good to address as many of these issues as possible.

SparkQA · 2018-10-30T21:52:05Z

Test build #98274 has finished for PR 22818 at commit 3d77303.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2018-10-31T19:55:30Z

@kiszk I've updated this to cover more cases. I didn't cover some of them in mllib-local, as ByteArrayMethods isn't visible there, and it would only very slightly improve an error message, so it didn't seem worth a refactor here. But I don't feel super strongly about it either.

squito · 2018-10-31T20:45:47Z

I also changed the issue to SPARK-25904 -- there are some other things related to encryption that it makes more sense to handle under SPARK-25827.

SparkQA · 2018-11-02T01:41:42Z

Test build #98371 has finished for PR 22818 at commit ca3efd8.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2018-11-02T15:53:16Z

failure is https://issues.apache.org/jira/browse/SPARK-25923

kiszk · 2018-11-04T03:15:56Z

retest this please

SparkQA · 2018-11-04T07:05:01Z

Test build #98435 has finished for PR 22818 at commit ca3efd8.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2018-11-04T15:47:45Z

retest this please

SparkQA · 2018-11-04T19:54:37Z

Test build #98451 has finished for PR 22818 at commit ca3efd8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-06T08:05:01Z

Test build #98504 has finished for PR 22818 at commit 361bf02.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

core/src/main/scala/org/apache/spark/internal/config/package.scala

attilapiros · 2018-11-06T09:54:44Z

sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

```diff
       truncate: Int = 20,
       vertical: Boolean = false): String = {
-    val numRows = _numRows.max(0).min(Int.MaxValue - 1)
+    val numRows = _numRows.max(0).min(ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH - 1)
```


Is the "- 1" really necessary after migrating from Int.MaxValue - 1 to ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH - 1?

I think it is -- we make a Seq potentially one larger than this value here:

spark/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala, line 294 in 361bf02:

```scala
val rows = tmpRows.take(numRows + 1)
```

(Admittedly it's a stretch; you shouldn't be showing 2B rows anyway.)
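A small Scala sketch of why the `- 1` matters: the show logic fetches one extra row to detect truncation, so the clamp has to leave room for `numRows + 1`. `ShowRows`, `clampNumRows`, and `takeForShow` are hypothetical names, and `MaxSafeArrayLength = Int.MaxValue - 15` is an assumed stand-in for `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`.

```scala
object ShowRows {
  // Assumed stand-in for ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH.
  val MaxSafeArrayLength: Int = Int.MaxValue - 15

  // Same shape as the clamp in the diff: non-negative, and one below the
  // limit so that numRows + 1 is still an allocatable length.
  def clampNumRows(requested: Int): Int =
    requested.max(0).min(MaxSafeArrayLength - 1)

  // Fetch one row beyond what will be displayed; if it exists, output is truncated.
  def takeForShow[T](rows: Seq[T], requested: Int): (Seq[T], Boolean) = {
    val numRows = clampNumRows(requested)
    val fetched = rows.take(numRows + 1)
    (fetched.take(numRows), fetched.length > numRows)
  }
}
```

Without the `- 1`, `numRows + 1` would exceed the limit by one; with the old `Int.MaxValue` bound and no `- 1`, it would overflow to a negative `Int`.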

SparkQA · 2018-11-06T12:19:42Z

Test build #4413 has finished for PR 22818 at commit 361bf02.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2018-11-06T13:55:15Z

Thanks for the review @attilapiros. I fixed the style issue; I think the other one is OK as is.

SparkQA · 2018-11-06T17:53:09Z

Test build #98520 has finished for PR 22818 at commit f42b1a8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jiangxb1987

LGTM

kiszk · 2018-11-07T06:33:45Z

LGTM

squito · 2018-11-07T12:19:27Z

merged to master

cloud-fan · 2018-11-07T15:38:43Z

Since this is a bug fix, shall we also backport it?

JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big. This commit changes some defaults & configs to gracefully fall back to something that doesn't require one large array in some cases; in other cases it simply improves the error message for cases that will still fail.

Closes apache#22818 from squito/SPARK-25827.

Authored-by: Imran Rashid <[email protected]>
Signed-off-by: Imran Rashid <[email protected]>
(cherry picked from commit 8fbc183)

squito · 2018-11-08T14:52:26Z

@cloud-fan Oh, good point, sorry; that was an oversight on my part. Since it was clean, I just pushed it directly here: 47a668c

I'll update JIRA too.

squito · 2018-11-08T16:32:03Z

Argh, sorry about the mistake. Thank you for the fix, @cloud-fan.

JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big. This commit changes some defaults & configs to gracefully fall back to something that doesn't require one large array in some cases; in other cases it simply improves the error message for cases that will still fail.

Closes apache#22818 from squito/SPARK-25827.

Authored-by: Imran Rashid <[email protected]>
Signed-off-by: Imran Rashid <[email protected]>


[SPARK-25827][CORE] Allocate arrays smaller than Int.MaxValue

64b5ed4

JVMs don't you allocate arrays of length exactly Int.MaxValue, so leave a little extra room.

fix more instances of Int.MaxValue

3d77303

squito changed the title ~~[SPARK-25827][CORE] Allocate arrays smaller than Int.MaxValue~~ [SPARK-25904][CORE] Allocate arrays smaller than Int.MaxValue Oct 31, 2018

Merge branch 'master' into SPARK-25827

ca3efd8

Merge branch 'master' into SPARK-25827

361bf02

attilapiros reviewed Nov 6, 2018


fix style

f42b1a8

cloud-fan approved these changes Nov 7, 2018


jiangxb1987 approved these changes Nov 7, 2018


asfgit closed this in 8fbc183 Nov 7, 2018

squito mentioned this pull request Nov 8, 2018

[SPARK-25904][CORE] Allocate arrays smaller than Int.MaxValue #22983

Closed


8 participants
