[SPARK-15288] [Mesos] Mesos dispatcher should handle gracefully when any thread gets UncaughtException #13072
|
ok to test |
|
Test build #60999 has finished for PR 13072 at commit
|
|
Test build #70646 has finished for PR 13072 at commit
|
|
LGTM, @srowen can you please take a look? |
|
We only otherwise do this in |
|
Like Executor, MesosClusterDispatcher runs multiple threads. When any one of them terminates because of an error or exception, the process keeps running, but the work that thread was responsible for is no longer performed. I think we should handle uncaught exceptions from the MesosClusterDispatcher threads with the UncaughtExceptionHandler and take action, rather than leaving the dispatcher running with missing functionality and without notifying the user. |
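To illustrate the point above, here is a minimal plain-JVM sketch of a process-wide default uncaught exception handler. It is not Spark's handler: the class name `UncaughtHandlerSketch`, the thread name, and the recording fields are hypothetical, and a real daemon would more likely log and exit with a non-zero code instead of only recording the error.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class UncaughtHandlerSketch {
    // Hypothetical bookkeeping so the handler's effect can be observed.
    static final AtomicReference<Throwable> lastUncaught = new AtomicReference<>();
    static final CountDownLatch handled = new CountDownLatch(1);

    // Installs a default handler for every thread that has no handler of its own.
    static void installHandler() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            lastUncaught.set(error);
            System.err.println("Uncaught exception in " + thread.getName() + ": " + error);
            handled.countDown();
            // Assumption: a long-running daemon would probably terminate here,
            // e.g. with System.exit(<non-zero code>), so the failure is visible.
        });
    }

    // Starts a worker that dies immediately; returns true if the handler saw it.
    static boolean runFailingWorker() {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure");
        }, "hypothetical-worker");
        worker.start();
        try {
            worker.join();
            return handled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        installHandler();
        System.out.println("handler invoked: " + runFailingWorker());
    }
}
```

Without the handler, the worker's death would only produce a stack trace on stderr while the rest of the process carried on, which is the situation described for the dispatcher.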
|
@srowen Can you check this? |
|
I can't really evaluate the change. Is anyone else familiar with Mesos around to comment? That would be best. |
|
LGTM. @srowen Ideally every process we have should handle thread termination correctly, and the same applies to MesosClusterDispatcher. By the way, I think the call in Executor.scala that sets the handler should run in static code, as early as possible; for MesosExecutorBackend, for example, it should happen in its main method. From what I see, the call currently runs when a new Executor is instantiated, which might be a bit late: the createExecutorEnv call, which always comes first, creates its own threads (e.g. MapOutputTracker threads). |
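The ordering concern can be sketched with plain JVM threads. This is an illustration under assumptions, not Spark code: `EarlyInstallSketch`, its counter, and the thread names are hypothetical. The JVM looks up the default handler at the moment a thread dies, so a failure that happens before installation is never seen by the handler.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class EarlyInstallSketch {
    // Hypothetical counter of failures that reached the default handler.
    static final AtomicInteger handledCount = new AtomicInteger();

    static void installHandler() {
        Thread.setDefaultUncaughtExceptionHandler(
            (thread, error) -> handledCount.incrementAndGet());
    }

    // Runs a thread that throws immediately and waits for it to finish.
    static void failInThread(String name) {
        Thread t = new Thread(() -> {
            throw new IllegalStateException("simulated failure in " + name);
        }, name);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Fails before any handler exists: the JVM only prints a stack trace.
        failInThread("hypothetical-early-thread");
        int before = handledCount.get();

        installHandler();

        // Fails after installation: the default handler is invoked.
        failInThread("hypothetical-late-thread");
        int after = handledCount.get();

        System.out.println("handled before install: " + before
            + ", after install: " + after);
    }
}
```

Installing the handler as the first statement of `main` closes that window, which is the reasoning behind doing it before any environment setup that spawns threads.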
|
LGTM. @srowen please merge. Out of curiosity, @devaraj-kavali what exception were you seeing? |
|
Thanks @mgummelt for the confirmation. It throws a SparkException with the bug SPARK-15359/#13143. |
|
Test build #3585 has finished for PR 13072 at commit
|
|
Merged to master |
…ny thread gets UncaughtException
## What changes were proposed in this pull request?
Adding the default UncaughtExceptionHandler to the MesosClusterDispatcher.
## How was this patch tested?
I verified it manually: when any dispatcher thread hits an uncaught exception, the default UncaughtExceptionHandler handles it.
Author: Devaraj K <[email protected]>
Closes apache#13072 from devaraj-kavali/SPARK-15288.
What changes were proposed in this pull request?
Adding the default UncaughtExceptionHandler to the MesosClusterDispatcher.
How was this patch tested?
I verified it manually: when any dispatcher thread hits an uncaught exception, the default UncaughtExceptionHandler handles it.