
Conversation

@liutang123 (Contributor) commented Jan 8, 2018

What changes were proposed in this pull request?

UnsafeExternalSorter.ChainedIterator maintains a Queue of UnsafeSorterIterator. When the getIterator function of UnsafeExternalSorter is called, UnsafeExternalSorter passes an ArrayList of UnsafeSorterSpillReader to the constructor of ChainedIterator. But each UnsafeSorterSpillReader maintains a byte array as a buffer whose capacity is more than 1 MB. When spilling happens frequently, this can cause OOM.

In this PR, I try to make the buffer allocation in UnsafeSorterSpillReader lazy.
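
For illustration only, here is a minimal sketch of the lazy-allocation pattern described above. It is not the actual Spark code; LazySpillReader, getIn(), getArr(), and readChunk() are simplified, hypothetical stand-ins.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Simplified, hypothetical reader: the input stream and the ~1 MB read buffer
// are only created on first use, so an idle reader waiting in a queue costs
// almost no memory.
final class LazySpillReader {
  private static final int BUFFER_SIZE = 1024 * 1024;

  private final String file;
  private InputStream in;  // allocated lazily
  private byte[] arr;      // allocated lazily

  LazySpillReader(String file) {
    this.file = file;      // cheap: no I/O and no large allocation here
  }

  private InputStream getIn() throws IOException {
    if (in == null) {
      in = new BufferedInputStream(new FileInputStream(file), BUFFER_SIZE);
    }
    return in;
  }

  private byte[] getArr() {
    if (arr == null) {
      arr = new byte[BUFFER_SIZE];
    }
    return arr;
  }

  // Reads up to BUFFER_SIZE bytes into the lazily created buffer.
  int readChunk() throws IOException {
    return getIn().read(getArr(), 0, BUFFER_SIZE);
  }
}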

How was this patch tested?

Existing tests.

@jerryshao (Contributor)

@liutang123, can you please tell us how to reproduce your issue easily?

@liutang123 (Contributor, Author)

Hi @jerryshao, we can reproduce this issue as follows:

$ bin/spark-shell --master local --conf spark.sql.windowExec.buffer.spill.threshold=1 --driver-memory 1G 
scala>sc.range(1, 2000).toDF.registerTempTable("test_table")
scala>spark.sql("select row_number() over (partition by 1)  from test_table").collect

This will cause OOM.
The above is an extreme case.
Normally, spark.sql.windowExec.buffer.spill.threshold is 4096 by default and the buffer used in UnsafeSorterSpillReader is more than 1 MB. When a window contains more than 4,096,000 rows, UnsafeExternalSorter.ChainedIterator will have a queue containing 1,000 UnsafeSorterSpillReader(s). So the queue costs a lot of memory and is liable to cause OOM.
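
A rough back-of-envelope estimate based on the numbers above (assuming roughly 1 MB of buffer per UnsafeSorterSpillReader):

4,096,000 rows / 4,096 rows per spill = 1,000 spill readers
1,000 readers x ~1 MB buffer each ≈ 1 GB held just for read buffers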

@jerryshao (Contributor)

Thanks, let me try to reproduce it locally.

@jerryshao (Contributor)

The code here should be fine for the normal case. The problem is that there are so many spill files, which requires maintaining a lot of per-reader buffers. A lazy buffer allocation could solve this problem, IIUC. It is not related to the queue or anything else.

@liutang123 (Contributor, Author)

I think a lazy buffer allocation cannot thoroughly solve this problem, because UnsafeSorterSpillReader has a BufferedFileInputStream which will allocate off-heap memory.

@jerryshao (Contributor)

> I think a lazy buffer allocation cannot thoroughly solve this problem, because UnsafeSorterSpillReader has a BufferedFileInputStream which will allocate off-heap memory.

Can you please explain more? From my understanding, the off-heap memory in BufferedFileInputStream is the key issue for your scenario here. I don't think the logic you changed in ChainedIterator matters a lot. So a lazy allocation of the off-heap memory should be enough, IIUC.

@liutang123 (Contributor, Author)

Hi @jerryshao, I now lazily allocate both the InputStream and the byte array in UnsafeSorterSpillReader.
Would you please take a look at this when you have time?

@gatorsmile (Member)

cc @jiangxb1987

A Member left a review comment on this diff:

    baseObject = arr;
  }
- ByteStreams.readFully(in, arr, 0, recordLength);
+ ByteStreams.readFully(getIn(), arr, 0, recordLength);

Is it fine if recordLength is greater than 1024 * 1024?

@JoshRosen (Contributor) commented Jul 14, 2019

I stumbled across this PR while looking through the open Spark core PRs.

It sounds like the problem here is that we don't need to allocate the input stream and read buffer until it's actually time to read the spill, but we're currently doing that too early:

  • In getSortedIterator(), we have to construct all readers before we can return the first record because we must find the first record according to the sorted ordering and that requires looking at all spill files.
  • However, we do not have this constraint in getIterator(), which returns an unsorted iterator and is used in ExternalAppendOnlyUnsafeRowArray (which uses the sorter only for its spilling capabilities, not for sorting). In this case, we can initialize the readers one at a time, only once we actually need to read each spill.

Given this context, lazy initialization makes sense to me. However, this PR is a bit outdated and has some merge conflicts. I would be supportive of this change if the conflicts are resolved and the PR description is updated.
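
As a rough illustration of the one-at-a-time idea for the unsorted getIterator() path, here is a simplified, hypothetical sketch (not Spark's actual ChainedIterator; the Supplier-based factories are an assumption for illustration): each underlying reader, and therefore its large buffer, is only constructed once the previous spill has been fully consumed.

import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.function.Supplier;

// Hypothetical lazily chained iterator: the queue holds cheap factories, and a
// reader (with its large buffer) is built only when its spill is actually read.
final class LazyChainedIterator<T> implements Iterator<T> {
  private final Queue<Supplier<Iterator<T>>> pending;
  private Iterator<T> current;

  LazyChainedIterator(Queue<Supplier<Iterator<T>>> pending) {
    this.pending = new ArrayDeque<>(pending);
  }

  @Override
  public boolean hasNext() {
    while ((current == null || !current.hasNext()) && !pending.isEmpty()) {
      current = pending.poll().get();  // the expensive reader is created only here
    }
    return current != null && current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}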

@AmplabJenkins

Can one of the admins verify this patch?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

