
Conversation

@squito (Contributor) commented Apr 13, 2016

What changes were proposed in this pull request?

This adds a new configuration, spark.shuffle.spillAfterRead, which can be used to force Spillables to spill their contents after all records have been inserted. The default is false, to keep previous behavior and avoid a performance penalty when spilling is unnecessary. However, this is needed in some cases to prevent an OOM when one Spillable acquires all of the execution memory available for a task, leaving no memory available for any other operations in the same task.
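A rough sketch of how the flag would be enabled -- the config key is the one proposed in this description, while the surrounding setup (app name, etc.) is purely illustrative:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("spill-after-read-example")
  // Opt in to the behavior described above: Spillables spill and release their
  // execution memory once all records have been inserted, leaving headroom for
  // other memory consumers in the same task. The default remains false.
  .set("spark.shuffle.spillAfterRead", "true")
```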

This also required some small refactoring of Spillable to support a forced spill from an external request (as opposed to not having enough memory as records are added).
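A minimal sketch of the shape of that refactoring, using hypothetical names rather than the actual Spillable internals:

```scala
// Hypothetical sketch only -- these names do not match the real Spillable class.
trait ForciblySpillable {
  /** Spill triggered from inside an insert, when no more memory can be acquired. */
  protected def maybeSpill(currentMemory: Long): Boolean

  /** Write the in-memory contents to disk and return the number of bytes released. */
  protected def spillContents(): Long

  /**
   * Spill triggered by an external request (e.g. after all records have been
   * inserted), independent of whether an insert is currently short on memory.
   */
  def forceSpill(): Long = spillContents()
}
```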

I was initially hoping to limit the places where we needed to spill -- I thought it would only be in a ShuffleMapTask which also does a shuffle-read. In that case there is clearly a Spillable on both the shuffle-read and shuffle-write side. However, I realized this wasn't sufficient -- there are other cases where you can have multiple Spillables, e.g. if you use a common partitioner across several aggregations, which will get pipelined into one stage.
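For example, a job along these lines (paths and partition counts are placeholders) ends up with more than one Spillable per task, because the second aggregation reuses the first aggregation's partitioner and is pipelined into the same stage:

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("pipelined-aggregations"))
val partitioner = new HashPartitioner(100)

val pairs = sc.textFile("hdfs:///some/input").map(line => (line, 1))

// First aggregation: shuffle write on the map side; on the reduce side each task
// builds a Spillable (an ExternalAppendOnlyMap) while reading the shuffle.
val counts = pairs.reduceByKey(partitioner, _ + _)

// Second aggregation with the same partitioner: no new shuffle is needed, so it is
// pipelined into the same stage, and the same task builds a second Spillable.
val totals = counts.mapValues(_.toLong).reduceByKey(partitioner, _ + _)

totals.count()
```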

This also makes Spillables register themselves as MemoryConsumers with the TaskMemoryManager. Note that this does not lead to cooperative memory management for Spillables -- the only reason for this is to improve the logging around memory usage. Before this change, there would be messages like:

INFO memory.TaskMemoryManager: 217903352 bytes of memory were used by task 2109 but are not associated with specific consumers

Instead, with this change, the logs report the memory as being associated with the corresponding Spillable.
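A rough illustration of why that helps -- the interface below is a simplified stand-in for the internal TaskMemoryManager/MemoryConsumer API, not its real signatures. Once the Spillable acquires memory through an object the manager knows about, the bytes can be attributed to that consumer by name instead of falling into the "not associated with specific consumers" bucket:

```scala
// Simplified stand-in for the internal TaskMemoryManager API -- names are assumptions.
trait TaskMemoryManagerLike {
  def acquireExecutionMemory(numBytes: Long, consumer: AnyRef): Long
  def releaseExecutionMemory(numBytes: Long, consumer: AnyRef): Unit
}

// Sketch of a Spillable that registers itself as the consumer of its own memory,
// so per-consumer logging can name it rather than reporting anonymous usage.
abstract class TrackedSpillable(memoryManager: TaskMemoryManagerLike) {
  private var acquired = 0L

  protected def acquireMemory(numBytes: Long): Long = {
    // Passing `this` as the consumer is what makes the bytes attributable in logs.
    val granted = memoryManager.acquireExecutionMemory(numBytes, this)
    acquired += granted
    granted
  }

  protected def releaseAllMemory(): Unit = {
    memoryManager.releaseExecutionMemory(acquired, this)
    acquired = 0L
  }
}
```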

How was this patch tested?

  • Unit tests were added to reproduce the original problem
  • Jenkins unit tests
  • Also independently ran some large workloads that were consistently hitting an OOM before this change and now pass

@SparkQA commented Apr 13, 2016

Test build #55742 has finished for PR 12369 at commit b74f215.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@SparkQA commented Apr 13, 2016

Test build #55743 has finished for PR 12369 at commit 241f712.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito (Contributor, Author) commented Apr 14, 2016

cc @JoshRosen @davies, I think you have looked at related parts of memory management recently.

@davies (Contributor) commented Apr 15, 2016

LGTM overall. @JoshRosen, do you want to take another look?

@lianhuiwang (Contributor)

I think you can take a look at #10024. It can force ExternalAppendOnlyMap or ExternalSorter to spill when there is not enough memory. Thanks.

@squito (Contributor, Author) commented Apr 18, 2016

Thanks for the review @davies. I actually just found out about another workload failure even with this patch -- not sure whether it's related, but I'll look into it today and update, so let's hold off on merging for the moment in any case.

@lianhuiwang sorry, I wasn't aware of your PR earlier. I totally agree that it makes more sense to have Spillables actually spill when requested by a MemoryConsumer. However, as that change is more invasive and risky, I still think it makes sense to put this in as a safe, easy fix for now. If that later patch completely removes the need for this, that would be fantastic. I'll take a look at your patch as well.

@SparkQA commented Apr 18, 2016

Test build #56107 has finished for PR 12369 at commit a3c1c12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies (Contributor) commented Apr 18, 2016

@squito I'd prefer to wait for #10024.

@squito closed this Apr 22, 2016