[SPARK-1466] Raise exception if pyspark Gateway process doesn't start. #383
Conversation
This should be backported to 0.9 and 1.0.
BTW this is a much bigger issue with IPython notebook -- if you're running in the console, you get the wrong error (from parsing the int) but also the correct error. If you're running in IPython notebook, you only get the wrong error, making this very annoying to debug.
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14017/
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14069/
python/pyspark/java_gateway.py
Maybe this should say "Launching GatewayServer failed"? That would be more informative; otherwise, people will think something is wrong with SparkContext itself.
@kayousterhout not sure if you saw my comment; this looks good, but the exception message is somewhat confusing. It would be good to update that.
Jenkins, retest this please.
Merged build triggered.
Merged build started.
I did, but I haven't had time to figure out why the tests are failing (the tests don't run properly on my laptop). Hoping this was a Jenkins issue and that the re-launched tests pass.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14230/
Also include stderr output to help user debug startup issue.
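The commit above points at the key enabler: the launcher has to hold on to the gateway's stderr so it can be surfaced later. As a hedged sketch (the command and class name are approximations for Spark of that era, not the exact patch), the launch might look like:

```python
import subprocess

# Approximate launcher invocation; treat the command contents as placeholders.
command = ["./bin/spark-class", "py4j.GatewayServer", "--die-on-broken-pipe", "0"]
proc = subprocess.Popen(
    command,
    stdout=subprocess.PIPE,   # the gateway announces its port on stdout
    stderr=subprocess.PIPE,   # captured so a startup failure can be reported verbatim
)
```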
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
It looks like the tests magically passed now! Is this good to go?
Yup! I just rebased so it should merge cleanly on master.
If the gateway process fails to start correctly (e.g., because JAVA_HOME isn't set correctly, there's no Spark jar, etc.), right now pyspark fails because of a very difficult-to-understand error, where we try to parse stdout to get the port where Spark started and there's nothing there. This commit properly catches the error and throws an exception that includes the stderr output for much easier debugging. Thanks to @shivaram and @stogers for helping to fix this issue!

Author: Kay Ousterhout <[email protected]>

Closes apache#383 from kayousterhout/pyspark and squashes the following commits:

36dd54b [Kay Ousterhout] [SPARK-1466] Raise exception if Gateway process doesn't start.
…he#383) This makes executors consistent with the driver. Note that SPARK_EXTRA_CLASSPATH isn't set anywhere by Spark itself, but it's primarily meant to be set by images that inherit from the base driver/executor images.
Apply patches for SPARK-24531 to fix tests
Disable S3 test cases in fusioncloud job
…leanup (apache#383) ([Original PR](apache#46027))

### What changes were proposed in this pull request?
Expired sessions are regularly checked and cleaned up by a maintenance thread. However, currently, this process is synchronous. Therefore, in rare cases, interrupting the execution thread of a query in a session can take hours, causing the entire maintenance process to stall, resulting in a large amount of memory not being cleared.

We address this by introducing asynchronous callbacks for execution cleanup, avoiding synchronous joins of execution threads, and preventing the maintenance thread from stalling in the above scenarios. To be more specific, instead of calling `runner.join()` in `ExecutorHolder.close()`, we set a post-cleanup function as the callback through `runner.processOnCompletion`, which will be called asynchronously once the execution runner is completed or interrupted. In this way, the maintenance thread won't get blocked on joining an execution thread.

### Why are the changes needed?
In the rare cases mentioned above, performance can be severely affected.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests and a new test `Async cleanup callback gets called after the execution is closed` in `ReattachableExecuteSuite.scala`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46064 from xi-db/SPARK-47819-async-cleanup-3.5.

Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
Co-authored-by: Xi Lyu <[email protected]>
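Scala details aside, the join-versus-callback idea is easy to see in miniature. Below is an illustrative Python sketch (all names invented; the real change touches `ExecutorHolder.close()` and `runner.processOnCompletion` in Scala) of cleanup registered as a completion callback instead of a blocking join:

```python
import threading

class ExecutionRunner:
    """Toy stand-in for a query execution thread."""

    def __init__(self, work):
        self._callbacks = []
        self._lock = threading.Lock()
        self._done = False
        self._thread = threading.Thread(target=self._run, args=(work,))

    def start(self):
        self._thread.start()

    def process_on_completion(self, callback):
        # Run `callback` once the work finishes; if it has already finished,
        # run it immediately so late registrations aren't lost.
        with self._lock:
            if not self._done:
                self._callbacks.append(callback)
                return
        callback()

    def _run(self, work):
        try:
            work()
        finally:
            with self._lock:
                self._done = True
                callbacks, self._callbacks = self._callbacks, []
            for cb in callbacks:
                cb()  # cleanup runs on the execution thread, not the caller's

def close(runner, release_resources):
    # Unlike runner.join(), this returns immediately, so a stalled or
    # interrupted execution can no longer block the maintenance thread.
    runner.process_on_completion(release_resources)
```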
If the gateway process fails to start correctly (e.g., because JAVA_HOME isn't set correctly, there's no Spark jar, etc.), right now pyspark fails because of a very difficult-to-understand error, where we try to parse stdout to get the port where Spark started and there's nothing there. This commit properly catches the error and throws an exception that includes the stderr output for much easier debugging.
Thanks to @shivaram and @stogers for helping to fix this issue!
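To make the described behavior concrete, here is a minimal, self-contained sketch of the pattern, assuming an approximate launcher command and an illustrative error message (the exact change lives in python/pyspark/java_gateway.py): read the port from the gateway's first stdout line, and if that fails, raise an exception carrying the captured stderr.

```python
import subprocess

def launch_gateway():
    # Approximate launcher command for Spark of that era; a placeholder, not the patch.
    command = ["./bin/spark-class", "py4j.GatewayServer",
               "--die-on-broken-pipe", "0"]
    proc = subprocess.Popen(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)

    # On success, the JVM prints the port it bound to as its first stdout line.
    # On startup failure, stdout is empty and int() raises ValueError.
    first_line = proc.stdout.readline().decode().strip()
    try:
        port = int(first_line)
    except ValueError:
        # Raise an explicit error (message wording per the review suggestion)
        # instead of surfacing a bare int-parsing failure, and attach whatever
        # the JVM wrote to stderr. In the failure case the process has normally
        # exited, so read() returns at EOF.
        stderr_output = proc.stderr.read().decode(errors="replace")
        raise RuntimeError("Launching GatewayServer failed! Gateway stderr:\n"
                           + stderr_output)
    return port
```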