[SPARK-18280][Core] Fix potential deadlock in StandaloneSchedulerBackend.dead
#15775
Conversation
How can we prevent issues like this from happening again in the future? Is there a better exception we could throw, or a linter check we could add?

How about throwing an exception if the current thread is an RPC thread?

Test build #68163 has finished for PR 15775 at commit

Yeah, I think that's a good idea ... at least we will know.
} finally {
  // Ensure the application terminates, as we can no longer run jobs.
  sc.stop()
  new Thread("stop-spark-context") {

also document why we need this.
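For illustration only, here is a hedged sketch of the kind of explanatory comment the reviewer is asking for. `ShutdownHelper` and `stopContextAsync` are made-up names for this sketch, not the code that was actually merged; only the thread name "stop-spark-context" comes from the diff above.

```scala
import org.apache.spark.SparkContext

// Hedged sketch only: `ShutdownHelper` is an illustrative name, not Spark code.
// The comment inside records why the stop must happen on a separate thread.
object ShutdownHelper {
  def stopContextAsync(sc: SparkContext): Unit = {
    // dead() is invoked on an RPC thread, and SparkContext.stop() waits for all
    // RPC threads to exit; calling stop() directly on this thread would deadlock.
    new Thread("stop-spark-context") {
      setDaemon(true)
      override def run(): Unit = {
        sc.stop()
      }
    }.start()
  }
}
```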
I added a thread-local flag inside RPC threads. Instead of throwing an exception, I think this is better, since we always launch a new thread to fix such issues.
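A minimal, self-contained sketch of that idea follows. All names here (`RpcThreadFlag`, `SketchContext`, `doStop`) are illustrative, not Spark's real classes or API; only the "stop-spark-context" thread name is taken from the diff in this PR.

```scala
object RpcThreadFlag {
  // Thread-local marker; an RPC thread factory would set it when an RPC thread starts.
  private val isRpcThread = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  def markCurrentThreadAsRpcThread(): Unit = isRpcThread.set(true)

  def currentThreadIsRpcThread: Boolean = isRpcThread.get()
}

class SketchContext {
  def stop(): Unit = {
    if (RpcThreadFlag.currentThreadIsRpcThread) {
      // We are on an RPC thread: doStop() waits for RPC threads to exit, so running
      // it here would wait on this very thread. Hand the shutdown off instead.
      new Thread("stop-spark-context") {
        setDaemon(true)
        override def run(): Unit = doStop()
      }.start()
    } else {
      doStop()
    }
  }

  private def doStop(): Unit = {
    // Tear down services and wait for RPC threads to exit (elided in this sketch).
  }
}
```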
Test build #68287 has finished for PR 15775 at commit

retest this please

Test build #68297 has finished for PR 15775 at commit

Test build #3418 has finished for PR 15775 at commit

retest this please

Test build #68320 has finished for PR 15775 at commit

Thanks! Merging to master, 2.1 and 2.0.
Both this PR (#15775, from zsxwing/SPARK-18280) and the follow-up #16178 (from zsxwing/fix-stop-deadlock) were authored by Shixiong Zhu. The follow-up later superseded this change; from its commit message: when `SparkContext.stop` is called in `Utils.tryOrStopSparkContext` (the following three places), it causes a deadlock because the `stop` method needs to wait for the thread running `stop` to exit:

- ContextCleaner.keepCleaning
- LiveListenerBus.listenerThread.run
- TaskSchedulerImpl.start

#16178 adds `SparkContext.stopInNewThread` and uses it to eliminate the potential deadlock, and it removed the changes from #15775 since they were no longer necessary.
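As a hedged sketch of that follow-up approach (reusing the illustrative `SketchContext` name from the earlier sketch; the bodies are placeholders, not Spark's actual implementation): callers that run on internal threads invoke a method that always performs the stop on a fresh thread.

```scala
// Illustrative sketch only; `SketchContext` is not Spark's real SparkContext.
class SketchContext {
  // Used by callers that run on internal threads (listener bus, context cleaner,
  // scheduler startup) so that stopping never waits on the calling thread itself.
  def stopInNewThread(): Unit = {
    new Thread("stop-spark-context") {
      setDaemon(true)
      override def run(): Unit = stop()
    }.start()
  }

  def stop(): Unit = {
    // Tear down services and wait for internal threads to exit (elided in this sketch).
  }
}
```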
What changes were proposed in this pull request?
"StandaloneSchedulerBackend.dead" is called in a RPC thread, so it should not call "SparkContext.stop" in the same thread. "SparkContext.stop" will block until all RPC threads exit, if it's called inside a RPC thread, it will be dead-lock.
This PR adds a thread-local flag inside RPC threads. `SparkContext.stop` uses it to decide whether to launch a new thread to stop the SparkContext.

How was this patch tested?
Jenkins