Skip to content

Commit 93c14c6

Browse files
Venkata krishnan Sowrirajancloud-fan
authored andcommitted
[SPARK-27338][CORE] Fix deadlock in UnsafeExternalSorter.SpillableIterator when locking both UnsafeExternalSorter.SpillableIterator and TaskMemoryManager
## What changes were proposed in this pull request? In `UnsafeExternalSorter.SpillableIterator#loadNext()` takes lock on the `UnsafeExternalSorter` and calls `freePage` once the `lastPage` is consumed which needs to take a lock on `TaskMemoryManager`. At the same time, there can be another MemoryConsumer using `UnsafeExternalSorter` as part of sorting can try to `allocatePage` needs to get lock on `TaskMemoryManager` which can cause spill to happen which requires lock on `UnsafeExternalSorter` again causing deadlock. This is a classic deadlock situation happening similar to the SPARK-26265. To fix this, we can move the `freePage` call in `loadNext` outside of `Synchronized` block similar to the fix in SPARK-26265 ## How was this patch tested? Manual tests were being done and will also try to add a test. Closes #24265 from venkata91/deadlock-sorter. Authored-by: Venkata krishnan Sowrirajan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 6c4552c) Signed-off-by: Wenchen Fan <[email protected]>
1 parent ed3ffda commit 93c14c6

File tree

1 file changed

+23
-11
lines changed

1 file changed

+23
-11
lines changed

core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java

Lines changed: 23 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -575,19 +575,31 @@ public boolean hasNext() {
575575

576576
@Override
577577
public void loadNext() throws IOException {
578-
synchronized (this) {
579-
loaded = true;
580-
if (nextUpstream != null) {
581-
// Just consumed the last record from in memory iterator
582-
if (lastPage != null) {
583-
freePage(lastPage);
584-
lastPage = null;
578+
MemoryBlock pageToFree = null;
579+
try {
580+
synchronized (this) {
581+
loaded = true;
582+
if (nextUpstream != null) {
583+
// Just consumed the last record from in memory iterator
584+
if(lastPage != null) {
585+
// Do not free the page here, while we are locking `SpillableIterator`. The `freePage`
586+
// method locks the `TaskMemoryManager`, and it's a bad idea to lock 2 objects in
587+
// sequence. We may hit dead lock if another thread locks `TaskMemoryManager` and
588+
// `SpillableIterator` in sequence, which may happen in
589+
// `TaskMemoryManager.acquireExecutionMemory`.
590+
pageToFree = lastPage;
591+
lastPage = null;
592+
}
593+
upstream = nextUpstream;
594+
nextUpstream = null;
585595
}
586-
upstream = nextUpstream;
587-
nextUpstream = null;
596+
numRecords--;
597+
upstream.loadNext();
598+
}
599+
} finally {
600+
if (pageToFree != null) {
601+
freePage(pageToFree);
588602
}
589-
numRecords--;
590-
upstream.loadNext();
591603
}
592604
}
593605

0 commit comments

Comments
 (0)