[SPARK-12817] Add BlockManager.getOrElseUpdate and remove CacheManager #11436

JoshRosen · 2016-02-29T22:42:40Z

CacheManager directly calls MemoryStore.unrollSafely() and has its own logic for handling graceful fallback to disk when cached data does not fit in memory. However, this logic also exists inside of the MemoryStore itself, so this appears to be unnecessary duplication.

Thanks to the addition of block-level read/write locks in #10705, we can refactor the code to remove the CacheManager and replace it with an atomic BlockManager.getOrElseUpdate() method.

This pull request replaces / subsumes #10748.

/cc @andrewor14 and @nongli for review. Note that this changes the locking semantics of a couple of internal BlockManager methods (doPut() and lockNewBlockForWriting), so please pay attention to the Scaladoc changes and new test cases for those methods.

Previously these methods would downgrade the exclusive write lock to a shared read lock, but this behavior is only needed in one place (CacheManager) and I'm planning to replace that with a BlockManager getOrElseUpdate method, so it makes sense to make lock downgrading the exception rather than the common case.

…used).

JoshRosen · 2016-02-29T22:43:41Z

core/src/test/scala/org/apache/spark/CacheManagerSuite.scala

-    }.cache()
-  }
-
-  test("get uncached rdd") {


Open question: which test cases do we need to preserve in the new code? Should I test this more end-to-end to make sure that caching still works (i.e. doesn't recompute + stores blocks as side effect), or is that handled at a higher level of the stack in existing tests?

nongli · 2016-03-01T00:36:13Z

much simpler!

SparkQA · 2016-03-01T00:46:16Z

Test build #52200 has finished for PR 11436 at commit 8f6cc09.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2016-03-01T02:32:04Z

There's one corner-case which this patch doesn't handle correctly: if we are trying to cache a block, start unrolling the iterator, and then conclude that the iterator won't fit in memory and we can't drop it down to disk, then we need to have a way to keep the values that we've already computed. TL;DR: the memoryStore itself now needs a maybePutAndReturnIterator method in order to preserve the old behavior.

JoshRosen · 2016-03-01T03:32:08Z

Alright, took another shot at addressing the correctness issue. Will address other comments after tests pass.

JoshRosen · 2016-03-01T03:34:40Z

core/src/main/scala/org/apache/spark/storage/BlockManager.scala

-          if (bytesAfterPut == null) {
-            if (valuesAfterPut == null) {
-              throw new SparkException(
-                "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")


/cc @tdas, the goal of this change is to avoid attempting to replicate deserialized, memory-only blocks if their initial cache / persist fails due to a lack of memory.

For the purposes of this patch, we need to do this to prevent the iterator from being consumed so that it can be passed back to the caller. More generally, though, I think that we should have this change to avoid OOMs by trying to serialize an entire partition which was too large to be stored.

SparkQA · 2016-03-01T05:30:58Z

Test build #52212 has finished for PR 11436 at commit 50f66d1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-01T09:34:23Z

Test build #52226 has finished for PR 11436 at commit f8cccea.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2016-03-01T18:41:10Z

core/src/main/scala/org/apache/spark/rdd/RDD.scala

    }
  }

+  /**


Can you add that this is only called on the executors? (otherwise people might wonder why we don't just use sc.env instead of SparkEnv.get)

andrewor14 · 2016-03-01T22:32:26Z

@JoshRosen This looks good, but still somewhat complex. In particular, there are a few unintuitive signatures in BlockManager and related classes that I think can be improved. I made a suggestion for doPut's return type, but I'm still thinking about how to avoid the Either in getOrElseUpdate's return type. I think this task is inherently complex so eventually we'll run out of things to simplify without a significant refactor.

andrewor14 · 2016-03-01T22:37:35Z

core/src/main/scala/org/apache/spark/rdd/RDD.scala

+          existingMetrics.incBytesReadInternal(blockResult.bytes)
+          new InterruptibleIterator[T](context, blockResult.data.asInstanceOf[Iterator[T]]) {
+            override def next(): T = {
+              existingMetrics.incRecordsReadInternal(1)


crazy idea: what if we push all of this metrics code into getOrElseUpdate? Then we don't need the complicated left or right signature anymore and we can contain all the complexity within BlockManager itself. We might need to pass in TaskContext but I think it's pretty doable to make getOrElseUpdate return an Iterator[T] instead. Or just do TaskContext.get there; doPut already does that anyway.

I considered this but had some reservations due to the fact that we might not want to consider every read of a BlockManager iterator to be a read of an input record (since we also read from iterators in the implementation of getSingle() and in a couple of other places, and those methods are used for things like TorrentBroadcast blocks which I don't think we want to treat as input records in for the purposes of these metrics).

On the other hand, this getOrElseUpdate method really only makes sense in the context of caching code anyways, so maybe it's fine to couple its semantics a little more tightly to its only current use.

…iting().

andrewor14 · 2016-03-02T02:25:50Z

LGTM. There are still a few clunky interfaces but there's not much we can do about them without doing a bigger refactor, which is outside the scope. I think the main confusing bit is now resolved so this is just waiting for tests.

SparkQA · 2016-03-02T03:01:21Z

Test build #52269 has finished for PR 11436 at commit 5a1fb12.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-02T03:24:37Z

Test build #52271 has finished for PR 11436 at commit 51e1b2e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2016-03-02T04:58:39Z

Yeah, I wish there was a straightforward way to simplify this code, but I think that a lot of the remaining complexity is pretty inherent in the current requirements / behaviors. I'm going to merge this in a little bit.

SparkQA · 2016-03-02T07:21:04Z

Test build #52294 has finished for PR 11436 at commit 0aafba2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2016-03-02T18:26:03Z

Merged into master.

## Motivation As a pre-requisite to off-heap caching of blocks, we need a mechanism to prevent pages / blocks from being evicted while they are being read. With on-heap objects, evicting a block while it is being read merely leads to memory-accounting problems (because we assume that an evicted block is a candidate for garbage-collection, which will not be true during a read), but with off-heap memory this will lead to either data corruption or segmentation faults. ## Changes ### BlockInfoManager and reader/writer locks This patch adds block-level read/write locks to the BlockManager. It introduces a new `BlockInfoManager` component, which is contained within the `BlockManager`, holds the `BlockInfo` objects that the `BlockManager` uses for tracking block metadata, and exposes APIs for locking blocks in either shared read or exclusive write modes. `BlockManager`'s `get*()` and `put*()` methods now implicitly acquire the necessary locks. After a `get()` call successfully retrieves a block, that block is locked in a shared read mode. A `put()` call will block until it acquires an exclusive write lock. If the write succeeds, the write lock will be downgraded to a shared read lock before returning to the caller. This `put()` locking behavior allows us store a block and then immediately turn around and read it without having to worry about it having been evicted between the write and the read, which will allow us to significantly simplify `CacheManager` in the future (see #10748). See `BlockInfoManagerSuite`'s test cases for a more detailed specification of the locking semantics. ### Auto-release of locks at the end of tasks Our locking APIs support explicit release of locks (by calling `unlock()`), but it's not always possible to guarantee that locks will be released prior to the end of the task. One reason for this is our iterator interface: since our iterators don't support an explicit `close()` operator to signal that no more records will be consumed, operations like `take()` or `limit()` don't have a good means to release locks on their input iterators' blocks. Another example is broadcast variables, whose block locks can only be released at the end of the task. To address this, `BlockInfoManager` uses a pair of maps to track the set of locks acquired by each task. Lock acquisitions automatically record the current task attempt id by obtaining it from `TaskContext`. When a task finishes, code in `Executor` calls `BlockInfoManager.unlockAllLocksForTask(taskAttemptId)` to free locks. ### Locking and the MemoryStore In order to prevent in-memory blocks from being evicted while they are being read, the `MemoryStore`'s `evictBlocksToFreeSpace()` method acquires write locks on blocks which it is considering as candidates for eviction. These lock acquisitions are non-blocking, so a block which is being read will not be evicted. By holding write locks until the eviction is performed or skipped (in case evicting the blocks would not free enough memory), we avoid a race where a new reader starts to read a block after the block has been marked as an eviction candidate but before it has been removed. ### Locking and remote block transfer This patch makes small changes to to block transfer and network layer code so that locks acquired by the BlockTransferService are released as soon as block transfer messages are consumed and released by Netty. This builds on top of #11193, a bug fix related to freeing of network layer ManagedBuffers. ## FAQ - **Why not use Java's built-in [`ReadWriteLock`](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html)?** Our locks operate on a per-task rather than per-thread level. Under certain circumstances a task may consist of multiple threads, so using `ReadWriteLock` would mean that we might call `unlock()` from a thread which didn't hold the lock in question, an operation which has undefined semantics. If we could rely on Java 8 classes, we might be able to use [`StampedLock`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/StampedLock.html) to work around this issue. - **Why not detect "leaked" locks in tests?**: See above notes about `take()` and `limit`. Author: Josh Rosen <[email protected]> Closes #10705 from JoshRosen/pin-pages.

CacheManager directly calls MemoryStore.unrollSafely() and has its own logic for handling graceful fallback to disk when cached data does not fit in memory. However, this logic also exists inside of the MemoryStore itself, so this appears to be unnecessary duplication. Thanks to the addition of block-level read/write locks in apache#10705, we can refactor the code to remove the CacheManager and replace it with an atomic `BlockManager.getOrElseUpdate()` method. This pull request replaces / subsumes apache#10748. /cc andrewor14 and nongli for review. Note that this changes the locking semantics of a couple of internal BlockManager methods (`doPut()` and `lockNewBlockForWriting`), so please pay attention to the Scaladoc changes and new test cases for those methods. Author: Josh Rosen <[email protected]> Closes apache#11436 from JoshRosen/remove-cachemanager.

JoshRosen added 6 commits February 29, 2016 11:17

Add getOrCompute() to replace CacheManager usage in RDD.

d6ce63d

Remove CacheManager

e5f505e

Remove BlockStore / BlockManager putArray() method (since it's now un…

2613038

…used).

Inline MemoryStore.putArray() at its only callsite.

0c48c63

Trim some excess whitespace.

8f6cc09

JoshRosen reviewed Feb 29, 2016
View reviewed changes

JoshRosen added 2 commits February 29, 2016 18:10

Fix incorrect release of non-held read lock in TorrentBroadcast.

e75e09b

Invert incorrect readCachedBlock condition.

2262dc3

JoshRosen added 2 commits February 29, 2016 19:17

Fix caching of MEMORY_ONLY blocks which don’t fit.

d0f8685

Only replicate deserialized block if store succeeds.

50f66d1

JoshRosen reviewed Mar 1, 2016
View reviewed changes

Fix bad release in TaskResultGetter.

f8cccea

andrewor14 reviewed Mar 1, 2016
View reviewed changes

Add comment explaining SparkEnv.get call

dcd5f74

JoshRosen added 2 commits March 1, 2016 16:24

Remove BlockInfo.isRemoved and use this to simplify lockNewBlockForWr…

5a1fb12

…iting().

Rename downgradeToReadLock -> keepReadLock

51e1b2e

Add comment to explain that doPut() does not overwrite existing block.

0aafba2

asfgit closed this in d6969ff Mar 2, 2016

JoshRosen deleted the remove-cachemanager branch March 8, 2016 06:18

                   }
                 }
+                /**

[SPARK-12817] Add BlockManager.getOrElseUpdate and remove CacheManager #11436

[SPARK-12817] Add BlockManager.getOrElseUpdate and remove CacheManager #11436

Uh oh!

Conversation

JoshRosen commented Feb 29, 2016

Uh oh!

JoshRosen Feb 29, 2016

Choose a reason for hiding this comment

Uh oh!

nongli commented Mar 1, 2016

Uh oh!

SparkQA commented Mar 1, 2016

Uh oh!

JoshRosen commented Mar 1, 2016

Uh oh!

JoshRosen commented Mar 1, 2016

Uh oh!

JoshRosen Mar 1, 2016

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Mar 1, 2016

Uh oh!

SparkQA commented Mar 1, 2016

Uh oh!

andrewor14 Mar 1, 2016

Choose a reason for hiding this comment

Uh oh!

JoshRosen Mar 2, 2016

Choose a reason for hiding this comment

Uh oh!

andrewor14 commented Mar 1, 2016

Uh oh!

andrewor14 Mar 1, 2016

Choose a reason for hiding this comment

Uh oh!

JoshRosen Mar 2, 2016

Choose a reason for hiding this comment

Uh oh!

andrewor14 commented Mar 2, 2016

Uh oh!

SparkQA commented Mar 2, 2016

Uh oh!

SparkQA commented Mar 2, 2016

Uh oh!

JoshRosen commented Mar 2, 2016

Uh oh!

SparkQA commented Mar 2, 2016

Uh oh!

andrewor14 commented Mar 2, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants