
Conversation

@sitalkedia

What changes were proposed in this pull request?

Fix a memory leak in the sorter. When the UnsafeExternalSorter spills its data to disk, it does not free up the underlying pointer array. As a result, we see a lot of executor OOMs and also memory underutilization.
This is a regression partially introduced in PR #9241.
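
As a rough sketch of the idea (not the exact diff; the real change touches ShuffleInMemorySorter and UnsafeInMemorySorter, and the field names here are approximate), the in-memory sorter's reset() hands the grown pointer array back to the memory manager and re-allocates one of the initial capacity whenever the external sorter spills:

// Sketch only: on reset(), drop the bloated pointer array and start over at the
// initial capacity, so the freed memory becomes available to other tasks.
public void reset() {
  if (array != null) {
    consumer.freeArray(array);                  // return the grown array to the task memory manager
  }
  pos = 0;
  array = consumer.allocateArray(initialSize);  // re-allocate at the original, small size
}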

How was this patch tested?

Tested by running a job and observing around a 30% speedup after this change.

@sitalkedia force-pushed the executor_oom branch 2 times, most recently from 4be2021 to 707c9bc on April 10, 2016 at 08:34
@sitalkedia
Author

cc @davies

@andrewor14
Contributor

ok to test

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55544 has finished for PR 12285 at commit 707c9bc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Author

@andrewor14 - Thanks for taking a look. Handled the test case failures.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55561 has finished for PR 12285 at commit c318a35.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Author

@andrewor14 - Seems like some transient Jenkins failure. Can we rerun the test?

@davies
Contributor

davies commented Apr 12, 2016

@sitalkedia I think this is not a memory leak, it just does not release the memory as soon as possible. What does your plan look like?

@sitalkedia
Author

@davies - Thanks for looking into it. I agree with you that it's not a memory leak, because that memory may be used later. However, not reducing the pointer array back to its initial size on spill causes heavy memory underutilization: the tasks are not able to get sufficient memory for storing the records, and this situation often leads to executor OOM. Also, I don't see any reason why we would want to keep the bloated pointer array if we are spilling all the data to disk and have nothing left to store in it. This change restores the behavior of the sorter before PR #9241; see https://github.com/apache/spark/pull/9241/files#diff-3eedc75de4787b842477138d8cc7f150L321.

The physical plan looks something like this -

== Physical Plan ==
SortBasedAggregate(key=[shard_id#7L,id#11L,target#9,target_id#12L], functions=[(hiveudaffunction(HiveFunctionWrapper(UDAFCollectMap@270df931),feature_id#10,feature_value#13,false,0,0),mode=Complete,isDistinct=false)], output=[shard_id#7L,id#11L,target#9,target_id#12L,feature_map#14])
+- ConvertToSafe
   +- Sort [shard_id#7L ASC,id#11L ASC,target#9 ASC,target_id#12L ASC], false, 0
      +- TungstenExchange hashpartitioning(shard_id#7L,321), None
         +- Project [(id#11L % 321) AS shard_id#7L,id#11L,target#9,target_id#12L,feature_id#10,feature_value#13]
            +- Filter ((((id#11L > 0) && (target_id#12L > 0)) && NOT (id#11L = target_id#12L)) && (cast(feature_value#13 as double) > 0.001))
               +- HiveTableScan [id#11L,feature_value#13,feature_id#10,target#9,target_id#12L], MetastoreRelation x, y, None, [(ds#8 = 2016-03-20),target#9 IN (1,2,3),feature_id#10 INSET (1, 2)]

@davies
Contributor

davies commented Apr 12, 2016

In your case, inside the sort, the key has 4 columns and the row has 6 columns, so each pair needs about 90 bytes, while the array used by the sort needs 16 bytes per record, so the memory used by the array should be about 15% of all the memory used by execution. In the worst case, freeing the array only at the end could waste about 15% of the memory, so how can it make that big a difference?
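
For concreteness, reading off the numbers above (about 90 bytes per record pair plus 16 bytes of pointer-array entry per record), the array's share of execution memory is roughly

$$\frac{16}{90 + 16} \approx 15\%.$$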

If your data set is huge and requires spilling multiple times, the size of the spilled data should be similar each time, so the required array size should be similar. If we free it only at the end, we don't need to grow the array in the middle of two spills (growing requires 50% more memory for the array); that's the reason I changed it to free the array only at the end.

The reason your job OOMs is that the memory used by the Hive UDAF UDAFCollectMap is not managed by Spark; a better solution could be to reduce the memory fraction for Spark to leave more memory for UDAFCollectMap. After this patch, you may still see OOMs if UDAFCollectMap uses even more memory.

I agree that the current patch is good (it tries to free as much memory as it can). I'm just trying to understand more; please correct me if something is wrong.

@sitalkedia
Author

@davies Thanks for the explanation, your calculation makes sense. You are right that freeing the array can only make a difference of about 15% in the ideal case. But what we are experiencing is something different.

Consider the following scenario: we have 10G of total shuffle memory available for 5 tasks, so in the ideal situation each task should be assigned 2G of shuffle memory. Out of those 2G, 300MB should be allocated to the pointer array and the rest to storing the records. Now let's say 3 of the tasks finish at the same time and, before the driver can run additional tasks on the executor, the remaining 2 running tasks aggressively expand their memory and take up to 5G of shuffle memory on the executor, resulting in a pointer array of around 750MB. When the driver then runs 3 additional tasks on the executor, the previous 2 tasks are forced to spill, but the 750MB pointer array is never freed. This results in heavy memory underutilization for the tasks, and in cases where the pointer array actually grew beyond the task's fair share of memory, it results in executor OOM, killing all other tasks on the executor.
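
Spelling out the numbers in this scenario: a task holding 5G of execution memory carries a pointer array of about

$$5\,\text{GB} \times 15\% \approx 750\,\text{MB},$$

and once the fair share drops back to 10G / 5 = 2G per task, that stale array alone pins roughly

$$\frac{750\,\text{MB}}{2\,\text{GB}} \approx 37\%$$

of the task's budget before a single record is stored.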

The job we are running processes a huge data set of more than 50TB, and we were seeing more than 5% of tasks fail due to OOM. After this fix, the failure rate has come down to less than 0.01% and we gained a massive 30% CPU speedup.

@davies
Contributor

davies commented Apr 12, 2016

That makes sense, thanks for the explanation.

Contributor

I think we can still call it reset, right?

Contributor

Or call it shrinkMemory() and return the size of freed memory?
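
For illustration only, a hypothetical shrinkMemory() along the lines of this suggestion (not what was ultimately merged; the final diff keeps calling inMemSorter.reset(), and the names below are approximate) might look like:

// Hypothetical variant: shrink the pointer array and report how many bytes were freed.
public long shrinkMemory() {
  long bytesBefore = array.size() * 8L;         // bytes held by the current pointer array
  consumer.freeArray(array);                    // give the grown array back
  array = consumer.allocateArray(initialSize);  // re-allocate at the initial capacity
  pos = 0;
  return bytesBefore - array.size() * 8L;       // amount of memory released
}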

Author

good idea, will do.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #2777 has finished for PR 12285 at commit c318a35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia force-pushed the executor_oom branch 2 times, most recently from 75a44f9 to b102c25 on April 12, 2016 at 17:47
@sitalkedia
Author

@davies - Thanks for the review. I have addressed all the comments, please let me know how it looks.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55626 has finished for PR 12285 at commit 75a44f9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55627 has finished for PR 12285 at commit b102c25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

I'm sorry that I misread this last night. This is spillSize (the number of bytes written to disk), not the amount of freed memory, so we don't need to add the amount from inMemSorter.

Sorry again.

@davies
Contributor

davies commented Apr 12, 2016

@sitalkedia Sorry for the trouble.

@sitalkedia
Author

@davies - no issues, I will change it back.


writeSortedFile(false);
final long spillSize = freeMemory();
inMemSorter.reset();
Contributor

Do we really need to move this call?

Author

Yes, we need to reset the pointer array only after freeing up the memory pages holding the records. Otherwise the task might not get memory for the pointer array if it is already holding a lot of memory.
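
To spell out the ordering (a sketch assembled from the snippet above, not the verbatim patch):

// Spill path, sketched: free the record pages first, then shrink the pointer
// array, so the small re-allocation inside reset() cannot fail while the task
// is still holding all of its data pages.
writeSortedFile(false);               // flush the in-memory records to disk
final long spillSize = freeMemory();  // 1) free the memory pages holding the records
inMemSorter.reset();                  // 2) only now shrink/re-allocate the pointer array
return spillSize;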

writeSortedFile(false);
final long spillSize = freeMemory();
inMemSorter.reset();
// Reset the in-memory sorter's pointer array only after freeing up the memory pages holding the records.
Contributor

We can move this comment into reset()

Author

IMO, keeping the comment in ShuffleExternalSorter makes it easier to get the context and understand. Also, in the future, if someone tries to move this call, they will not do so after seeing the comment. If the comment is in the reset() function, someone might inadvertently move this call without ever seeing it. However, if you have a strong opinion about it, I will gladly move the comment into reset(). Let me know what you think.

Contributor

OK, I don't have a strong opinion on it.

@davies
Contributor

davies commented Apr 12, 2016

LGTM, will merge once the tests pass. Thanks for working on it.

@sitalkedia
Author

Thanks a lot for your quick review and response :).

@SparkQA

SparkQA commented Apr 12, 2016

Test build #55646 has finished for PR 12285 at commit d89adf8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit closed this in d187e7d on Apr 12, 2016
asfgit pushed a commit that referenced this pull request Apr 12, 2016
Fix memory leak in the Sorter. When the UnsafeExternalSorter spills the data to disk, it does not free up the underlying pointer array. As a result, we see a lot of executor OOM and also memory under utilization.
This is a regression partially introduced in PR #9241

Tested by running a job and observed around 30% speedup after this change.

Author: Sital Kedia <[email protected]>

Closes #12285 from sitalkedia/executor_oom.

(cherry picked from commit d187e7d)
Signed-off-by: Davies Liu <[email protected]>

Conflicts:
	core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
	core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java
@davies
Contributor

davies commented Apr 12, 2016

Merged into master and 1.6 branch (fixed the conflicts)

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 13, 2016
liyezhang556520 pushed a commit to liyezhang556520/spark that referenced this pull request Apr 15, 2016
Parth-Brahmbhatt pushed a commit to Parth-Brahmbhatt/spark that referenced this pull request Jul 25, 2016
@notflorian

notflorian commented Aug 11, 2016

Sorry to bump this old issue.

I have a similar issue: with Spark 1.6.1 in Scala, I get a lot of executor OOMs when I try to write the contents of an RDD into multiple gzipped files in Hadoop:

import org.apache.hadoop.io.compress.GzipCodec
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

rdd.saveAsHadoopFile(path,
    classOf[String],
    classOf[String],
    classOf[RDDMultipleTextOutputFormat],
    classOf[GzipCodec])

class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[String, String] {
  // Write each record to a file whose name is derived from its key.
  override def generateFileNameForKeyValue(key: String, value: String, name: String) = key + name

  // Drop the key from the output; only the value is written.
  override def generateActualKey(key: String, value: String) = null
}

It worked fine when I tried rdd.saveAsTextFile, or rdd.saveAsHadoopFile without the GzipCodec.

Do you think the root cause of my issue could also be this memory leak in the Sorter?

Thanks a lot for your help.
