[SPARK-27338][Core] Fix deadlock in UnsafeExternalSorter.SpillableIterator when locking both UnsafeExternalSorter.SpillableIterator and TaskMemoryManager #24265
I checked the code and found
if (lastPage != null) {
freePage(lastPage);
lastPage = null;
MemoryBlock pageToFree = null;
I think your PR seems good!
if (nextUpstream != null) {
// Just consumed the last record from in memory iterator
if(lastPage != null) {
pageToFree = lastPage;
can we add some comment to explain it? You can just copy it from https://github.com/apache/spark/pull/24269/files#diff-027299fb14327ddcaba457f81ecff32cR583
Sure @cloud-fan, will add and update the PR
Done @cloud-fan . I didn't know that you also filed a PR simultaneously for the same issue. :) Thanks for graciously closing your PR and accepting this PR.
attilapiros
left a comment
+1 for adding some explanation (the comment mentioned by @cloud-fan) otherwise LGTM
ok to test
jiangxb1987
left a comment
LGTM, please update the comment.
Test build #104215 has finished for PR 24265 at commit
Test build #104255 has finished for PR 24265 at commit
…rator when locking both UnsafeExternalSorter.SpillableIterator and TaskMemoryManager

## What changes were proposed in this pull request?

In `UnsafeExternalSorter.SpillableIterator#loadNext()` takes lock on the `UnsafeExternalSorter` and calls `freePage` once the `lastPage` is consumed which needs to take a lock on `TaskMemoryManager`. At the same time, there can be another MemoryConsumer using `UnsafeExternalSorter` as part of sorting can try to `allocatePage` needs to get lock on `TaskMemoryManager` which can cause spill to happen which requires lock on `UnsafeExternalSorter` again causing deadlock. This is a classic deadlock situation happening similar to the SPARK-26265.

To fix this, we can move the `freePage` call in `loadNext` outside of `Synchronized` block similar to the fix in SPARK-26265

## How was this patch tested?

Manual tests were being done and will also try to add a test.

Closes #24265 from venkata91/deadlock-sorter.

Authored-by: Venkata krishnan Sowrirajan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 6c4552c)
Signed-off-by: Wenchen Fan <[email protected]>
thanks, merging to master/2.4/2.3!
## What changes were proposed in this pull request?

#24265 breaks the lint check, because it has trailing space. (not sure why it passed jenkins). This PR fixes it.

## How was this patch tested?

N/A

Closes #24289 from cloud-fan/fix.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
## What changes were proposed in this pull request? apache/spark#24265 breaks the lint check, because it has trailing space. (not sure why it passed jenkins). This PR fixes it. ## How was this patch tested? N/A Closes #24289 from cloud-fan/fix. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit af0a4bb)
## What changes were proposed in this pull request? apache/spark#24265 breaks the lint check, because it has trailing space. (not sure why it passed jenkins). This PR fixes it. ## How was this patch tested? N/A Closes #24289 from cloud-fan/fix. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…rator when locking both UnsafeExternalSorter.SpillableIterator and TaskMemoryManager In `UnsafeExternalSorter.SpillableIterator#loadNext()` takes lock on the `UnsafeExternalSorter` and calls `freePage` once the `lastPage` is consumed which needs to take a lock on `TaskMemoryManager`. At the same time, there can be another MemoryConsumer using `UnsafeExternalSorter` as part of sorting can try to `allocatePage` needs to get lock on `TaskMemoryManager` which can cause spill to happen which requires lock on `UnsafeExternalSorter` again causing deadlock. This is a classic deadlock situation happening similar to the SPARK-26265. To fix this, we can move the `freePage` call in `loadNext` outside of `Synchronized` block similar to the fix in SPARK-26265 Manual tests were being done and will also try to add a test. Closes apache#24265 from venkata91/deadlock-sorter. Authored-by: Venkata krishnan Sowrirajan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 6c4552c) Signed-off-by: Wenchen Fan <[email protected]> Ref: LIHADOOP-58062 RB=2553586 BUG=LIHADOOP-58062 G=spark-reviewers R=mmuralid A=mmuralid
What changes were proposed in this pull request?
`UnsafeExternalSorter.SpillableIterator#loadNext()` takes a lock on the `UnsafeExternalSorter` and, once the `lastPage` is consumed, calls `freePage`, which needs to take a lock on `TaskMemoryManager`. At the same time, another MemoryConsumer that uses `UnsafeExternalSorter` as part of sorting can call `allocatePage`, which takes the lock on `TaskMemoryManager` and can trigger a spill, which in turn requires the lock on `UnsafeExternalSorter`. The two threads then wait on each other's lock, causing a deadlock. This is a classic deadlock situation, similar to SPARK-26265.
To fix this, we can move the `freePage` call in `loadNext` outside of the `synchronized` block, similar to the fix in SPARK-26265.
How was this patch tested?
Manual tests were done; I will also try to add a test.
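For readers who want to see the shape of the fix in isolation, here is a minimal sketch of the pattern described above. It is not the Spark code: `ToyMemoryManager` and `ToyIterator` are hypothetical stand-ins for `TaskMemoryManager` and `SpillableIterator`, and pages are plain `Object`s. The only point it illustrates is that the page is recorded while the iterator lock is held and freed after that lock is released, so the iterator lock and the manager lock are never held together by this code path.

```java
/** Small driver, assuming the hypothetical toy classes defined below. */
class DeferredFreeDemo {
    public static void main(String[] args) {
        ToyMemoryManager manager = new ToyMemoryManager();
        ToyIterator iterator = new ToyIterator(manager, new Object());
        iterator.loadNext();
        System.out.println("freed pages: " + manager.freedPages());
    }
}

/** Hypothetical stand-in for TaskMemoryManager: every operation locks the manager. */
final class ToyMemoryManager {
    private int freedPages = 0;

    synchronized void freePage(Object page) {
        freedPages++;
    }

    synchronized int freedPages() {
        return freedPages;
    }
}

/** Hypothetical stand-in for SpillableIterator. */
final class ToyIterator {
    private final ToyMemoryManager manager;
    private Object lastPage;

    ToyIterator(ToyMemoryManager manager, Object firstPage) {
        this.manager = manager;
        this.lastPage = firstPage;
    }

    void loadNext() {
        Object pageToFree = null;
        synchronized (this) {
            if (lastPage != null) {
                // Only remember the page while holding the iterator lock.
                pageToFree = lastPage;
                lastPage = null;
            }
        }
        // The iterator lock has been released, so taking the manager lock
        // here cannot complete a lock cycle with a thread that holds the
        // manager lock and is waiting for the iterator lock.
        if (pageToFree != null) {
            manager.freePage(pageToFree);
        }
    }

    synchronized boolean hasPage() {
        return lastPage != null;
    }
}
```

The design choice is simply to keep the critical section limited to state that the iterator owns (`lastPage`) and to defer any call that takes a second lock until after the first one is released.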