Skip to content

Conversation

@squito
Copy link
Contributor

@squito squito commented Oct 24, 2018

JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big. This commit changes some defaults & configs to gracefully fallover to something that doesn't require one large array in some cases; in other cases it simply improves an error message for cases which will still fail.

JVMs don't you allocate arrays of length exactly Int.MaxValue, so leave
a little extra room.
@squito
Copy link
Contributor Author

squito commented Oct 24, 2018

@kiszk

@vanzin
Copy link
Contributor

vanzin commented Oct 24, 2018

JVMs don't you allocate arrays

You grammar there.

@wypoon
Copy link
Contributor

wypoon commented Oct 25, 2018

Looks good to me. I reran the test that encountered this issue on a secure cluster after deploying a build with this change and now it passes.

@SparkQA
Copy link

SparkQA commented Oct 25, 2018

Test build #97987 has finished for PR 22818 at commit 64b5ed4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Oct 25, 2018

Thanks, would it be also possible to double-check Integer.MAX_VALUE if you have not checked yet?

@squito
Copy link
Contributor Author

squito commented Oct 25, 2018

Actually there are quite a few more uses, even of Int.MaxValue, which I find suspicious, but for the moment I only wanted to touch the cases I understood better. For example, "spark.sql.sortMergeJoinExec.buffer.in.memory.threshold" is used as the max size for an ArrayBuffer in ExternalAppendOnlyUnsafeRowArray, and I'm pretty sure that will cause the same problems:

> scala -J-Xmx16G
scala> val x = new scala.collection.mutable.ArrayBuffer[Int](128)
scala> x.sizeHint(Int.MaxValue)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
  at scala.collection.mutable.ArrayBuffer.sizeHint(ArrayBuffer.scala:69)
  ... 30 elided

do you think its important to tackle them all here? I could also open another jira to do an audit

@kiszk
Copy link
Member

kiszk commented Oct 28, 2018

Since this PR is not a blocker for 2.4, it is not necessary to close this ASAP.
Thus, I think that it would be good to address these issues as possible.

@SparkQA
Copy link

SparkQA commented Oct 30, 2018

Test build #98274 has finished for PR 22818 at commit 3d77303.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor Author

squito commented Oct 31, 2018

@kiszk I've updated this to cover more cases. I didn't cover some of them in mllib-local, as ByteArrayMethods isn't visible there, and it would really only very slightly improve an error msg, so didn't seem worth a refactor here. but dont' feel super strongly about it either.

@squito squito changed the title [SPARK-25827][CORE] Allocate arrays smaller than Int.MaxValue [SPARK-25904][CORE] Allocate arrays smaller than Int.MaxValue Oct 31, 2018
@squito
Copy link
Contributor Author

squito commented Oct 31, 2018

I also changed the issue to SPARK-25904 -- there are some other things related to encryption that it makes more sense to handle under SPARK-25827.

@SparkQA
Copy link

SparkQA commented Nov 2, 2018

Test build #98371 has finished for PR 22818 at commit ca3efd8.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor Author

squito commented Nov 2, 2018

@kiszk
Copy link
Member

kiszk commented Nov 4, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Nov 4, 2018

Test build #98435 has finished for PR 22818 at commit ca3efd8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Nov 4, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Nov 4, 2018

Test build #98451 has finished for PR 22818 at commit ca3efd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 6, 2018

Test build #98504 has finished for PR 22818 at commit 361bf02.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

truncate: Int = 20,
vertical: Boolean = false): String = {
val numRows = _numRows.max(0).min(Int.MaxValue - 1)
val numRows = _numRows.max(0).min(ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "- 1" really necessary after migrating form Int.MaxValue - 1 to ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH - 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is -- we make a Seq potentially one larger than this value here:

val rows = tmpRows.take(numRows + 1)

(admittedly its a stretch, you shouldn't be showing 2B rows anyway)

@SparkQA
Copy link

SparkQA commented Nov 6, 2018

Test build #4413 has finished for PR 22818 at commit 361bf02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor Author

squito commented Nov 6, 2018

thanks for the review @attilapiros, fixed the style issue, I think the other one is OK as is.

@SparkQA
Copy link

SparkQA commented Nov 6, 2018

Test build #98520 has finished for PR 22818 at commit f42b1a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kiszk
Copy link
Member

kiszk commented Nov 7, 2018

LGTM

@squito
Copy link
Contributor Author

squito commented Nov 7, 2018

merged to master

@asfgit asfgit closed this in 8fbc183 Nov 7, 2018
@cloud-fan
Copy link
Contributor

since this is a bug fix, shall we also backport it?

squito added a commit to squito/spark that referenced this pull request Nov 8, 2018
JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big.  This commit changes some defaults & configs to gracefully fallover to something that doesn't require one large array in some cases; in other cases it simply improves an error message for cases which will still fail.

Closes apache#22818 from squito/SPARK-25827.

Authored-by: Imran Rashid <[email protected]>
Signed-off-by: Imran Rashid <[email protected]>
(cherry picked from commit 8fbc183)
@squito
Copy link
Contributor Author

squito commented Nov 8, 2018

@cloud-fan oh good point, sorry that was an oversight on my part. Since it was clean I just pushed it directly here: 47a668c

I'll update jira too

@squito
Copy link
Contributor Author

squito commented Nov 8, 2018

argh, sorry about the mistake, thank you for the fix @cloud-fan

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big.  This commit changes some defaults & configs to gracefully fallover to something that doesn't require one large array in some cases; in other cases it simply improves an error message for cases which will still fail.

Closes apache#22818 from squito/SPARK-25827.

Authored-by: Imran Rashid <[email protected]>
Signed-off-by: Imran Rashid <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big.  This commit changes some defaults & configs to gracefully fallover to something that doesn't require one large array in some cases; in other cases it simply improves an error message for cases which will still fail.

Closes apache#22818 from squito/SPARK-25827.

Authored-by: Imran Rashid <[email protected]>
Signed-off-by: Imran Rashid <[email protected]>
(cherry picked from commit 8fbc183)
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
JVMs can't allocate arrays of length exactly Int.MaxValue, so ensure we never try to allocate an array that big.  This commit changes some defaults & configs to gracefully fallover to something that doesn't require one large array in some cases; in other cases it simply improves an error message for cases which will still fail.

Closes apache#22818 from squito/SPARK-25827.

Authored-by: Imran Rashid <[email protected]>
Signed-off-by: Imran Rashid <[email protected]>
(cherry picked from commit 8fbc183)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants