
Conversation

@JoshRosen
Contributor

This patch adds support for caching blocks in the executor processes using direct / off-heap memory.

User-facing changes

Updated semantics of OFF_HEAP storage level: In Spark 1.x, the OFF_HEAP storage level indicated that an RDD should be cached in Tachyon. Spark 2.x removed the external block store API that Tachyon caching was based on (see #10752 / SPARK-12667), so OFF_HEAP became an alias for MEMORY_ONLY_SER. As of this patch, OFF_HEAP means "serialized and cached in off-heap memory or on disk". Via the StorageLevel constructor, useOffHeap can be set (only when serialized == true) to construct custom storage levels which support replication; see the sketch below.
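
For illustration, a minimal sketch of the new semantics. It assumes an existing RDD named `rdd`; the named arguments follow the public `StorageLevel` factory, but double-check the signature before relying on it:

```scala
import org.apache.spark.storage.StorageLevel

// The built-in level: serialized, cached in off-heap memory or on disk.
rdd.persist(StorageLevel.OFF_HEAP)

// A custom replicated off-heap level; useOffHeap is only valid for
// serialized data, so deserialized must be false.
val offHeapReplicated = StorageLevel(
  useDisk = true,
  useMemory = true,
  useOffHeap = true,
  deserialized = false,
  replication = 2)
```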

Storage UI reporting: the storage UI will now report whether in-memory blocks are stored on- or off-heap.

Only supported by UnifiedMemoryManager: for simplicity, this feature is only supported when the default UnifiedMemoryManager is used. Applications that use the legacy memory manager (spark.memory.useLegacyMode=true) cannot currently allocate off-heap storage memory, so off-heap caching fails with an error when legacy memory management is enabled. Given that we plan to eventually remove the legacy memory manager, this is not a significant restriction.

Memory management policies: the policies for dividing available memory between execution and storage are the same for both on- and off-heap memory. For off-heap memory, the total amount of memory available for use by Spark is controlled by spark.memory.offHeap.size, which is an absolute size. Off-heap storage memory obeys spark.memory.storageFraction in order to control the amount of unevictable storage memory. For example, if spark.memory.offHeap.size is 1 gigabyte and Spark uses the default storageFraction of 0.5, then up to 512 megabytes of off-heap cached blocks will be protected from eviction due to execution memory pressure (see the configuration sketch below). If necessary, we can split spark.memory.storageFraction into separate on- and off-heap configurations, but this doesn't seem necessary now and can be done later without any breaking changes.
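
As a hedged illustration of that arithmetic (the configuration keys are the ones named above; the literal values are assumptions):

```scala
import org.apache.spark.SparkConf

// 1 GiB of off-heap memory available to Spark in total.
val conf = new SparkConf()
  .set("spark.memory.offHeap.size", (1024L * 1024 * 1024).toString)
  .set("spark.memory.storageFraction", "0.5")

// Unevictable off-heap storage = offHeap.size * storageFraction
//                              = 1024 MiB * 0.5 = 512 MiB.
```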

Use of off-heap memory does not imply use of off-heap execution (or vice-versa): for now, the settings controlling the use of off-heap execution memory (spark.memory.offHeap.enabled) and off-heap caching are completely independent, so Spark SQL can be configured to use off-heap memory for execution while continuing to cache blocks on-heap. If desired, we can change this in a followup patch so that spark.memory.offHeap.enabled affects the default storage level for cached SQL tables.

Internal changes

  • Rename ByteArrayChunkOutputStream to ChunkedByteBufferOutputStream
    • It now returns a ChunkedByteBuffer instead of an array of byte arrays.
    • Its constructor now accepts an allocator function which is called to allocate ByteBuffers. This allows us to control whether it allocates regular ByteBuffers or off-heap DirectByteBuffers (see the allocator sketch after this list).
    • Because block serialization is now performed during the unroll process, a ChunkedByteBufferOutputStream which is configured with a DirectByteBuffer allocator will use off-heap memory for both unroll and storage memory.
  • The MemoryStore's MemoryEntry now tracks whether a block is stored on- or off-heap.
    • evictBlocksToFreeSpace() now accepts a MemoryMode parameter so that we don't try to evict off-heap blocks in response to on-heap memory pressure (or vice-versa).
  • Make sure that off-heap buffers are properly de-allocated during MemoryStore eviction.
  • The JVM limits the total size of allocated direct byte buffers using the -XX:MaxDirectMemorySize flag and the default tends to be fairly low (< 512 megabytes in some JVMs). To work around this limitation, this patch adds a custom DirectByteBuffer allocator which ignores this memory limit (sketched below).
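
A rough sketch of the allocator idea from the bullets above. None of this is the patch's actual code: the allocator-function shape and the reflection trick are assumptions based on this description, targeting JDK-8-era APIs:

```scala
import java.nio.ByteBuffer

// On-heap allocator: ordinary byte-array-backed buffers.
val heapAllocator: Int => ByteBuffer = size => ByteBuffer.allocate(size)

// Off-heap allocator: allocate raw memory with sun.misc.Unsafe and wrap it
// in a DirectByteBuffer via its package-private (long, int) constructor.
// Unlike ByteBuffer.allocateDirect, this path is not counted against
// -XX:MaxDirectMemorySize. A real allocator must also arrange to free the
// memory, e.g. by attaching a sun.misc.Cleaner that calls freeMemory.
val directAllocator: Int => ByteBuffer = { size =>
  val unsafeField = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
  unsafeField.setAccessible(true)
  val unsafe = unsafeField.get(null).asInstanceOf[sun.misc.Unsafe]
  val address = unsafe.allocateMemory(size)
  val ctor = Class.forName("java.nio.DirectByteBuffer")
    .getDeclaredConstructor(java.lang.Long.TYPE, java.lang.Integer.TYPE)
  ctor.setAccessible(true)
  ctor.newInstance(Long.box(address), Int.box(size)).asInstanceOf[ByteBuffer]
}
```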

@SparkQA

SparkQA commented Mar 18, 2016

Test build #53483 has finished for PR 11805 at commit feffc20.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 22, 2016

Test build #53819 has finished for PR 11805 at commit 222d80b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 22, 2016

Test build #53825 has finished for PR 11805 at commit bf62983.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54096 has finished for PR 11805 at commit 5a15de2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 25, 2016

Test build #54125 has finished for PR 11805 at commit 3c122af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 25, 2016

Test build #54215 has finished for PR 11805 at commit 334b727.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

/cc @andrewor14, @nongli, @rxin, can you glance over the PR description to check whether the user-facing semantics are okay for now? We might want to change some bits later (maybe to add separate nice shorthands for off-heap-memory-only vs off-heap-memory-and-disk), but I'd like to understand whether any semantics need to change here.

@rxin
Contributor

rxin commented Mar 26, 2016

Looks pretty good actually. There are some details about naming and on/off defaults we need to figure out, but that can go in later.

@SparkQA

SparkQA commented Mar 26, 2016

Test build #54239 has finished for PR 11805 at commit 4d9489a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Mar 26, 2016
…ryManager

This patch extends Spark's `UnifiedMemoryManager` to add bookkeeping support for off-heap storage memory, a requirement for enabling off-heap caching (which will be done by #11805). The `MemoryManager`'s `storageMemoryPool` has been split into separate on- and off-heap pools, and the storage and unroll memory allocation methods have been updated to accept a `memoryMode` parameter specifying whether allocations should be performed on- or off-heap (see the sketch after this commit message).

In order to reduce the testing surface, the `StaticMemoryManager` does not support off-heap caching (we plan to eventually remove the `StaticMemoryManager`, so this isn't a significant limitation).

Author: Josh Rosen <[email protected]>

Closes #11942 from JoshRosen/off-heap-storage-memory-bookkeeping.
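
A toy sketch of the bookkeeping split described above; all names are illustrative stand-ins rather than the actual `UnifiedMemoryManager` internals:

```scala
sealed trait MemoryMode
object MemoryMode {
  case object ON_HEAP extends MemoryMode
  case object OFF_HEAP extends MemoryMode
}

// One storage pool per memory mode, so on- and off-heap usage are
// tracked independently.
class StoragePool(val poolSize: Long) {
  private var used = 0L
  def acquire(numBytes: Long): Boolean =
    if (used + numBytes <= poolSize) { used += numBytes; true } else false
}

class MemoryManagerSketch(onHeap: StoragePool, offHeap: StoragePool) {
  // The memoryMode parameter routes the request to the matching pool.
  def acquireStorageMemory(numBytes: Long, memoryMode: MemoryMode): Boolean =
    memoryMode match {
      case MemoryMode.ON_HEAP  => onHeap.acquire(numBytes)
      case MemoryMode.OFF_HEAP => offHeap.acquire(numBytes)
    }
}
```
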
@SparkQA

SparkQA commented Mar 27, 2016

Test build #54272 has finished for PR 11805 at commit df8be62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Discovered a potentially major problem: if we bail out of a getOrElseUpdate call, don't end up caching a block, and then don't fully consume or dispose of the resulting valuesIterator, we might leak off-heap memory. This problem didn't exist before because we never used off-heap memory for unroll space. I may have to add some task completion callbacks to dispose of PartiallySerializedBlocks (see the sketch below).
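
Something along these lines might work; `TaskContext.addTaskCompletionListener` is an existing Spark API, but `block` and its `discard()` helper are hypothetical names for "the partially serialized block" and "free its off-heap buffers":

```scala
import org.apache.spark.TaskContext

// Assumed context: `block` holds off-heap unroll buffers that would
// otherwise leak if its values iterator is abandoned mid-stream.
Option(TaskContext.get()).foreach { ctx =>
  ctx.addTaskCompletionListener { _ =>
    block.discard() // hypothetical: dispose the block's off-heap buffers
  }
}
```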

@JoshRosen
Contributor Author

Alright, this should now be ready for an initial review pass. I just pushed a somewhat hacky patch to work around the direct buffer allocation limit (42d0356) and added a few simple tests. I also addressed the potential leak of direct memory in cases where partially serialized blocks' iterators aren't fully consumed.

This probably needs a bit more work; I'll do another self-review pass tomorrow once I've had time to set it aside for a bit.

@JoshRosen
Contributor Author

/cc @andrewor14 @davies for review

package org.apache.spark.streaming.rdd

import java.io.File
import java.nio.ByteBuffer
Contributor

not used

@andrewor14
Contributor

@JoshRosen Overall looks good. I have not verified comprehensively whether all the things are disposed and released correctly, but it looks OK from what I can tell. If there are actually leaks I'm sure we'll find them quickly.

  val transfer = transferService
    .getOrElse(new NettyBlockTransferService(conf, securityMgr, numCores = 1))
- val memManager = new StaticMemoryManager(conf, Long.MaxValue, maxMem, numCores = 1)
+ val memManager = UnifiedMemoryManager(conf, numCores = 1)
Contributor

great!

@andrewor14
Contributor

Forgot to mention, you'll need to update documentation for spark.memory.storageFraction. Right now its description is pretty specific to the on-heap stuff.

 * and then serializing the values from the original input iterator.
 */
def finishWritingToStream(os: OutputStream): Unit = {
  ByteStreams.copy(unrolled.toInputStream(), os)
Contributor Author

There's actually a bug here: we aren't guaranteed to free the underlying off-heap memory. I think that in my benchmarks it just so happened to be cleaned by the sun.misc.Cleaner that I attached to the DirectByteBuffer in my custom allocator. I'm going to take a closer look at this block to fix the leak issues (see the sketch below).
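
For reference, a hedged sketch of freeing a direct buffer deterministically instead of waiting for the GC to run its cleaner; this reflection pattern targets JDK 8 and is not necessarily what the final fix will look like:

```scala
import java.nio.ByteBuffer

// Invoke the buffer's cleaner directly so the off-heap memory is released
// immediately, rather than whenever the GC collects the buffer object.
def dispose(buffer: ByteBuffer): Unit = {
  if (buffer != null && buffer.isDirect) {
    val cleanerMethod = buffer.getClass.getMethod("cleaner")
    cleanerMethod.setAccessible(true)
    val cleaner = cleanerMethod.invoke(buffer)
    if (cleaner != null) {
      cleaner.getClass.getMethod("clean").invoke(cleaner)
    }
  }
}
```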

@SparkQA

SparkQA commented Apr 1, 2016

Test build #54678 has finished for PR 11805 at commit 61920a9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 1, 2016

Test build #54681 has finished for PR 11805 at commit fde020f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 1, 2016

Test build #54682 has finished for PR 11805 at commit a604078.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

deep-review this please

@DeepSparkBot

@andrewor14 Review complete. In the future we should add tests to ensure buffers are released. @JoshRosen please file a JIRA for this issue.

LGTM.

@JoshRosen
Contributor Author

Alright, I'm going to merge this into master and will file followup JIRAs to make sure that we update the user-facing documentation and add more tests before 2.0. There are a number of questions to answer about how best to expose the relevant off-heap configurations to users, but I'd rather address them separately.

@asfgit asfgit closed this in e41acb7 Apr 1, 2016
@JoshRosen JoshRosen deleted the off-heap-caching branch April 1, 2016 21:37
@JoshRosen
Contributor Author

Filed https://issues.apache.org/jira/browse/SPARK-14336 as followup
