Conversation

@kayousterhout
Contributor

If the gateway process fails to start correctly (e.g., because JAVA_HOME isn't set correctly, there's no Spark jar, etc.), right now pyspark fails because of a very difficult-to-understand error, where we try to parse stdout to get the port where Spark started and there's nothing there. This commit properly catches the error and throws an exception that includes the stderr output for much easier debugging.

Thanks to @shivaram and @stogers for helping to fix this issue!
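The failure mode described above can be sketched as follows. This is a minimal illustration of the idea, not the actual pyspark `java_gateway` code; the function name and messages are hypothetical:

```python
import subprocess

def launch_gateway(command):
    # Minimal sketch of the fix: launch the gateway process and read the
    # port it prints on stdout. If the process dies before printing a
    # port, raise an exception carrying its stderr output instead of
    # failing with a confusing int-parsing error on empty stdout.
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    line = proc.stdout.readline()
    if not line.strip():
        proc.wait()
        raise RuntimeError("Gateway process exited before sending its "
                           "port number:\n" + proc.stderr.read())
    return int(line.strip())
```

With this shape, a misconfigured `JAVA_HOME` surfaces directly in the exception message rather than as a `ValueError` from `int()`.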

@kayousterhout
Contributor Author

This should be backported to 0.9 and 1.0

@kayousterhout
Contributor Author

BTW this is a much bigger issue with IPython Notebook -- if you're running in the console, you get the wrong error (from parsing the int) but also the correct error. If you're running in IPython Notebook, you only get the wrong error, which makes this very annoying to debug.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14017/

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14069/

Contributor

Maybe this should say "Launching GatewayServer failed"? It will be more informative, otherwise people will think something is wrong with SparkContext itself.

@mateiz
Contributor

mateiz commented Apr 18, 2014

@kayousterhout not sure if you saw my comment, this looks good but the exception message is somewhat confusing. It would be good to update that.

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@kayousterhout
Contributor Author

I did but haven't had time to figure out why the tests are failing (the tests don't run properly on my laptop). Hoping this was a Jenkins issue and the re-launched tests pass.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14230/

Also include stderr output to help users debug startup issues.
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15585/

@mateiz
Contributor

mateiz commented Jun 10, 2014

It looks like the tests magically passed now! Is this good to go?

@kayousterhout
Contributor Author

Yup! I just rebased so it should merge cleanly on master.


@asfgit asfgit closed this in 3870248 Jun 18, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
If the gateway process fails to start correctly (e.g., because JAVA_HOME isn't set correctly, there's no Spark jar, etc.), right now pyspark fails because of a very difficult-to-understand error, where we try to parse stdout to get the port where Spark started and there's nothing there. This commit properly catches the error and throws an exception that includes the stderr output for much easier debugging.

Thanks to @shivaram and @stogers for helping to fix this issue!

Author: Kay Ousterhout <[email protected]>

Closes apache#383 from kayousterhout/pyspark and squashes the following commits:

36dd54b [Kay Ousterhout] [SPARK-1466] Raise exception if Gateway process doesn't start.
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
If the gateway process fails to start correctly (e.g., because JAVA_HOME isn't set correctly, there's no Spark jar, etc.), right now pyspark fails because of a very difficult-to-understand error, where we try to parse stdout to get the port where Spark started and there's nothing there. This commit properly catches the error and throws an exception that includes the stderr output for much easier debugging.

Thanks to @shivaram and @stogers for helping to fix this issue!

Author: Kay Ousterhout <[email protected]>

Closes apache#383 from kayousterhout/pyspark and squashes the following commits:

36dd54b [Kay Ousterhout] [SPARK-1466] Raise exception if Gateway process doesn't start.
tangzhankun pushed a commit to tangzhankun/spark that referenced this pull request Jul 25, 2017
…he#383)

This makes executors consistent with the driver. Note that
SPARK_EXTRA_CLASSPATH isn't set anywhere by Spark itself, but it's
primarily meant to be set by images that inherit from the base
driver/executor images.
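As a hedged illustration of what such an inheriting image might look like (the registry, image name, and jar path here are hypothetical; only `SPARK_EXTRA_CLASSPATH` is taken from the commit message):

```dockerfile
# Hypothetical derived image. Spark itself never sets
# SPARK_EXTRA_CLASSPATH; images that inherit from the base
# driver/executor images are expected to set it themselves.
FROM my-registry/spark-executor:latest
COPY extra-jars/ /opt/extra-jars/
ENV SPARK_EXTRA_CLASSPATH=/opt/extra-jars/*
```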
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
…he#383)

This makes executors consistent with the driver. Note that
SPARK_EXTRA_CLASSPATH isn't set anywhere by Spark itself, but it's
primarily meant to be set by images that inherit from the base
driver/executor images.
mccheah added a commit to mccheah/spark that referenced this pull request Nov 28, 2018
Apply patches for SPARK-24531 to fix tests
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Diable S3 test cases in fusioncloud job
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…leanup (apache#383)

([Original PR](apache#46027))

### What changes were proposed in this pull request?

Expired sessions are regularly checked and cleaned up by a maintenance thread. Currently, however, this cleanup is synchronous, so in rare cases interrupting the execution thread of a query in a session can take hours, stalling the entire maintenance process and leaving a large amount of memory uncleared.

We address this by introducing asynchronous callbacks for execution cleanup, avoiding synchronous joins of execution threads, and preventing the maintenance thread from stalling in the above scenarios. To be more specific, instead of calling `runner.join()` in `ExecutorHolder.close()`, we set a post-cleanup function as the callback through `runner.processOnCompletion`, which will be called asynchronously once the execution runner is completed or interrupted. In this way, the maintenance thread won't get blocked on joining an execution thread.
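The callback-based cleanup described above can be sketched in Python. This is an illustration of the pattern only, not the actual Scala implementation; the class and method names loosely mirror `ExecutorHolder.close()` and `runner.processOnCompletion` from the description:

```python
import threading

class ExecuteRunner:
    """Sketch: instead of the maintenance thread calling join() on the
    execution thread, it registers a completion callback that runs
    asynchronously once the runner finishes or is interrupted."""

    def __init__(self, work):
        self._work = work
        self._on_completion = None
        self._done = False
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def _run(self):
        try:
            self._work()
        finally:
            # Atomically mark completion and grab any registered callback.
            with self._lock:
                self._done = True
                callback = self._on_completion
            if callback is not None:
                callback()  # runs on the execution thread, not maintenance

    def process_on_completion(self, callback):
        """Register post-cleanup work; invoke immediately if already done.
        Never blocks, so a maintenance thread calling this cannot stall."""
        with self._lock:
            if self._done:
                run_now = True
            else:
                self._on_completion = callback
                run_now = False
        if run_now:
            callback()
```

The lock ensures the callback fires exactly once whether it is registered before or after the runner completes, which is the property that lets the maintenance thread avoid a blocking `join()`.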

### Why are the changes needed?

In the rare cases mentioned above, performance can be severely affected.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests and a new test `Async cleanup callback gets called after the execution is closed` in `ReattachableExecuteSuite.scala`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46064 from xi-db/SPARK-47819-async-cleanup-3.5.

Authored-by: Xi Lyu <[email protected]>

Signed-off-by: Herman van Hovell <[email protected]>
Co-authored-by: Xi Lyu <[email protected]>