
Conversation

@heary-cao
Contributor

What changes were proposed in this pull request?

PR #20014 introduced SparkOutOfMemoryError to avoid killing the entire executor when an OutOfMemoryError is thrown.
This PR updates code that acquires memory through MemoryConsumer.allocatePage so that, when catching the resulting exception, it catches SparkOutOfMemoryError instead of OutOfMemoryError.
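
As a rough sketch of the pattern (the consumer class below is hypothetical and only assumes the 2018-era MemoryConsumer API; it is not code from this PR):

```java
import java.io.IOException;

import org.apache.spark.memory.MemoryConsumer;
import org.apache.spark.memory.MemoryMode;
import org.apache.spark.memory.SparkOutOfMemoryError;
import org.apache.spark.memory.TaskMemoryManager;
import org.apache.spark.unsafe.memory.MemoryBlock;

// Hypothetical consumer, for illustration only.
class ExampleConsumer extends MemoryConsumer {
  private MemoryBlock page = null;

  ExampleConsumer(TaskMemoryManager manager) {
    super(manager, manager.pageSizeBytes(), MemoryMode.ON_HEAP);
  }

  // allocatePage throws SparkOutOfMemoryError when the task memory manager
  // cannot grant more memory. Catching that narrower type handles the
  // recoverable, Spark-level failure locally, while a genuine JVM
  // OutOfMemoryError still propagates (and may kill the executor).
  boolean tryAcquire(long required) {
    try {
      page = allocatePage(required);
      return true;
    } catch (SparkOutOfMemoryError e) {
      return false; // caller can spill or fall back to a cheaper strategy
    }
  }

  @Override
  public long spill(long size, MemoryConsumer trigger) throws IOException {
    return 0L; // nothing to spill in this toy example
  }
}
```

Since SparkOutOfMemoryError extends OutOfMemoryError, the old catch blocks still matched it; the change is about no longer matching the system OOM as well.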

How was this patch tested?

N/A

@heary-cao
Contributor Author

cc @cloud-fan @kiszk @dongjoon-hyun

@cloud-fan
Contributor

Do you know what the behavior was before? Will we propagate the exception all the way up and kill the executor?

@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

add to whitelist

@cloud-fan
Contributor

good catch! thanks!

@kiszk
Member

kiszk commented Nov 19, 2018

I think that we need to take care of UnsafeExternalSorterSuite.testGetIterator and UnsafeInMemorySorterSuite.freeAfterOOM, too.
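
Illustratively, the expectation change in such suites looks like this (a minimal JUnit sketch with a made-up helper, not the actual suite code):

```java
import static org.junit.Assert.fail;

import org.apache.spark.memory.SparkOutOfMemoryError;
import org.junit.Test;

public class IllustrativeOOMSuite {

  // Stand-in for an operation whose allocatePage call is forced to fail.
  private void failingAllocation() {
    throw new SparkOutOfMemoryError("Unable to acquire memory");
  }

  @Test
  public void expectsSparkOOMNotPlainOOM() {
    try {
      failingAllocation();
      fail("expected SparkOutOfMemoryError");
    } catch (SparkOutOfMemoryError e) {
      // A test written against the broader OutOfMemoryError would still pass,
      // since SparkOutOfMemoryError is a subclass, but asserting the narrower
      // type matches what the production code now throws and catches.
    }
  }
}
```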

@SparkQA

SparkQA commented Nov 19, 2018

Test build #99006 has finished for PR 23084 at commit 7345942.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 19, 2018

Test build #99002 has finished for PR 23084 at commit 7345942.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao force-pushed the SparkOutOfMemoryError branch from 7345942 to 400bcd5 on November 20, 2018
@heary-cao
Contributor Author

@cloud-fan, I think it does not kill the executor. When the memory allocation fails, the caller is notified and can perform other operations; for example, HashAggregateExec will spill the hash map and fall back to sort-based aggregation (sketched below).
Please correct me if I have misunderstood something. Thanks.
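
Roughly, the fallback can be pictured like this (a plain-Java toy, not HashAggregateExec's actual generated code; the memory budget is faked with a map-size limit):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model: aggregate into a hash map while it can "grow"; once an insert
// is refused, spill the partial results into a sorted structure and continue
// sort-based, instead of letting the failure kill anything.
class HashThenSortAggregation {
  private static final int FAKE_MEMORY_BUDGET = 1000; // stand-in for allocation limits

  static Map<String, Long> countByKey(Iterable<String> keys) {
    Map<String, Long> hashMap = new HashMap<>();
    TreeMap<String, Long> sorted = null; // non-null once we have fallen back
    for (String key : keys) {
      if (sorted == null) {
        if (hashMap.size() < FAKE_MEMORY_BUDGET || hashMap.containsKey(key)) {
          hashMap.merge(key, 1L, Long::sum);
          continue;
        }
        // Insert refused ("allocation failed"): move partial results to the
        // sort-based structure and keep processing the remaining input.
        sorted = new TreeMap<>(hashMap);
        hashMap.clear();
      }
      sorted.merge(key, 1L, Long::sum);
    }
    return sorted != null ? sorted : hashMap;
  }
}
```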

@SparkQA

SparkQA commented Nov 20, 2018

Test build #99045 has finished for PR 23084 at commit 400bcd5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@heary-cao
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 20, 2018

Test build #99046 has finished for PR 23084 at commit 400bcd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

for example, HashAggregateExec will spill the hash map and fall back to sort-based aggregation

Do you mean this patch changes nothing for this case?

@heary-cao
Contributor Author

I think growAndRehash always throws SparkOutOfMemoryError. Catching either SparkOutOfMemoryError or OutOfMemoryError will set canGrowArray = false, so the behavior is the same. Please correct me if I have misunderstood something. Thanks.
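
For reference, a toy model of that flag behavior (illustrative only; the real logic lives in BytesToBytesMap, and the String constructor used for SparkOutOfMemoryError here is assumed):

```java
import org.apache.spark.memory.SparkOutOfMemoryError;

// Toy model of the discussed behavior: a failed growAndRehash is caught,
// canGrowArray is set to false, and later inserts are refused so the caller
// can fall back (e.g. to sort-based aggregation).
class GrowableMapSketch {
  private boolean canGrowArray = true;
  private int capacity = 64;
  private int size = 0;

  boolean tryAppend() {
    if (size >= capacity) {
      if (!canGrowArray) {
        return false; // stay in the failed state; signal the caller
      }
      try {
        growAndRehash(); // may throw the Spark-level OOM
      } catch (SparkOutOfMemoryError e) {
        canGrowArray = false; // remember the failure; do not retry
        return false;
      }
    }
    size++;
    return true;
  }

  private void growAndRehash() {
    if (capacity >= 256) { // pretend the memory manager refuses beyond this
      throw new SparkOutOfMemoryError("No memory left to grow the map");
    }
    capacity *= 2; // real code would also reallocate and rehash entries
  }
}
```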

@heary-cao
Contributor Author

So this PR is just code cleanup, not a real bug fix. Thanks.

@cloud-fan
Contributor

I think it's safer to only catch the Spark-thrown OOM, not the system OOM, so LGTM.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit closed this in 466d011 on Nov 23, 2018
@heary-cao
Contributor Author

@cloud-fan, thanks

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…moryError when catch exception

## What changes were proposed in this pull request?

PR apache#20014 introduced `SparkOutOfMemoryError` to avoid killing the entire executor when an `OutOfMemoryError` is thrown.
This PR updates code that acquires memory through `MemoryConsumer.allocatePage` so that, when catching the resulting exception, it catches `SparkOutOfMemoryError` instead of `OutOfMemoryError`.

## How was this patch tested?
N/A

Closes apache#23084 from heary-cao/SparkOutOfMemoryError.

Authored-by: caoxuewen <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>