[SPARK-7736] [core] [yarn] Make pyspark fail YARN app on failure. #7751
The YARN backend doesn't like when user code calls `System.exit`, since it cannot know the exit status and thus cannot set an appropriate final status for the application. So, for pyspark, avoid that call and instead throw an exception with the exit code. SparkSubmit handles that exception and exits with the given exit code, while YARN uses the exit code as the failure code for the Spark app.
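To illustrate the pattern described above, here is a minimal Python sketch, not the actual Spark change (which lives in Scala code such as SparkSubmit.scala). The names `UserAppError`, `run_user_app`, and `submit` are hypothetical. The runner reports a non-zero exit status by raising an exception that carries the code, and only the top-level entry point turns that exception into a process exit status.

```python
import subprocess
import sys


class UserAppError(Exception):
    """Hypothetical exception that carries the user program's exit code."""

    def __init__(self, exit_code):
        super().__init__(f"User application exited with {exit_code}")
        self.exit_code = exit_code


def run_user_app(argv):
    """Run the user program as a child process; raise instead of exiting."""
    exit_code = subprocess.call(argv)
    if exit_code != 0:
        # Do not terminate the process here: the caller decides how the
        # failure is reported (exit status, application final status, ...).
        raise UserAppError(exit_code)


def submit(argv):
    """Top-level entry point: translate the exception into an exit code."""
    try:
        run_user_app(argv)
    except UserAppError as e:
        return e.exit_code
    return 0


# A child process that exits with status 3 surfaces as exit code 3.
code = submit([sys.executable, "-c", "import sys; sys.exit(3)"])
print(code)  # 3
```

Because the exit code travels in the exception, any caller that wraps the runner (in the PR's case, the YARN backend) can observe the failure and record it, which is not possible if the runner terminates the process itself.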
Tested on real cluster. Not sure if it's worth it to add another test in YarnClusterSuite, since those are generally slow, but it would be easy to do so.
Test build #38854 timed out for PR 7751 at commit
Jenkins retest this please.
can you make this
Just ran into an issue where py4j threads are not daemon threads, so the YARN app is not exiting. Taking a look...
py4j uses non-daemon threads internally, so if it's not explicitly stopped, it will prevent the process from exiting now that System.exit() is not being used.
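The non-daemon thread problem can be sketched with a Python analog. This is not py4j's API: `CallbackServer` is a hypothetical stand-in for any component that owns a non-daemon worker thread. As on the JVM, a Python process does not exit while a non-daemon thread is still running, so once nothing forces an exit the component has to be stopped explicitly.

```python
import threading


class CallbackServer:
    """Hypothetical stand-in for a gateway that owns a non-daemon thread."""

    def __init__(self):
        self._stop = threading.Event()
        # daemon=False: the process stays alive until this thread finishes.
        self._worker = threading.Thread(target=self._serve, daemon=False)

    def _serve(self):
        # Block until shutdown is requested, like a server accept loop.
        self._stop.wait()

    def start(self):
        self._worker.start()

    def shutdown(self):
        # Explicit stop: signal the worker and wait for it to finish.
        self._stop.set()
        self._worker.join()

    def is_running(self):
        return self._worker.is_alive()


server = CallbackServer()
server.start()
try:
    pass  # the user application would run here
finally:
    # Without this call the worker thread would keep the process alive.
    server.shutdown()
```

Putting the shutdown in a `finally` block means the worker is stopped whether the application succeeds or fails, so the process can end on its own without a forced exit.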
Test build #146 timed out for PR 7751 at commit
Test build #38885 timed out for PR 7751 at commit
Test build #38911 has finished for PR 7751 at commit
Seems pretty reasonable to me.
Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
Test build #39774 has finished for PR 7751 at commit
retest this please
Test build #39886 has finished for PR 7751 at commit
Test build #235 has finished for PR 7751 at commit
retest this please
I'm gonna merge this later today if there are no more comments.
retest this please
Test build #40540 timed out for PR 7751 at commit
retest this please
Test build #40576 has finished for PR 7751 at commit
retest this please
Test build #1494 has finished for PR 7751 at commit
This has passed tests already, all latest failures are due to flakiness... retest this please
retest this please
jenkins, retest this please
Test build #40689 has finished for PR 7751 at commit
retest this please
Test build #40714 timed out for PR 7751 at commit
retest this please
Test build #40797 timed out for PR 7751 at commit
retest this please
Test build #40836 timed out for PR 7751 at commit
retest this please
wtf
retest this please
Test build #1605 has finished for PR 7751 at commit
Streaming tests are really flaky... anyway, all interesting tests passed, so unless I hear otherwise, I'll merge this Monday morning.
The YARN backend doesn't like when user code calls `System.exit`, since it cannot know the exit status and thus cannot set an appropriate final status for the application. So, for pyspark, avoid that call and instead throw an exception with the exit code. SparkSubmit handles that exception and exits with the given exit code, while YARN uses the exit code as the failure code for the Spark app. Author: Marcelo Vanzin <[email protected]> Closes #7751 from vanzin/SPARK-9416. (cherry picked from commit f68d024)