Set spark.executor.uri from environment variable (needed by Mesos) #311
Conversation
Can one of the admins verify this patch?

Jenkins, test this please. Good catch!

Jenkins, test this please

Merged build triggered.

Merged build started.

Merged build finished.

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13785/
@ivanwick what is the symptom when this is not set correctly? If there is an exception or stack trace, it would be helpful to know what it does, so that other people who run into this problem can figure out that this is the fix for it.
This patch fixes a bug with the PySpark shell running on Mesos. Without the spark.executor.uri property, PySpark reports lost tasks because the slave looks for the spark-executor in the wrong path and can never start it. The driver logs several "Lost TID" and "Executor lost" messages while the scheduler re-queues the lost tasks; they fail again for the same reason until the job finally fails. The stderr of each slave in the Mesos framework reports that the executor path does not exist on the slave nodes (it happens to be the path where Spark is installed on the head node). When spark.executor.uri is set, as it is with the Scala repl, Mesos is able to download the Spark dist package and run it from the framework temp directory on the slave.
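The patch itself is not shown on this page, but as a rough sketch of the kind of change being described (assuming the environment variable is SPARK_EXECUTOR_URI, the name used in Spark's Mesos documentation, and that the shell's startup code is what creates the SparkContext), it could look something like this:

```python
import os

from pyspark import SparkContext

# If the launcher exported SPARK_EXECUTOR_URI, copy it into the
# spark.executor.uri system property *before* the SparkContext is created,
# so the Mesos backend knows where each slave can fetch the Spark package.
if os.environ.get("SPARK_EXECUTOR_URI"):
    SparkContext.setSystemProperty("spark.executor.uri",
                                   os.environ["SPARK_EXECUTOR_URI"])

sc = SparkContext(os.environ.get("MASTER", "local[*]"), "PySparkShell")
```

Without something along these lines, the executor path defaults to the driver's local installation path, which is exactly the missing-path failure described above.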
SPARK-991: Report information gleaned from a Python stacktrace in the UI

Scala:
- Added setCallSite/clearCallSite to SparkContext and JavaSparkContext. These functions mutate a LocalProperty called "externalCallSite."
- Add a wrapper, getCallSite, that checks for an externalCallSite and, if none is found, calls the usual Utils.formatSparkCallSite.
- Change everything that calls Utils.formatSparkCallSite to call getCallSite instead. Except getCallSite.
- Add setCallSite/clearCallSite wrappers to JavaSparkContext.

Python:
- Add a gruesome hack to rdd.py that inspects the traceback and guesses what you want to see in the UI.
- Add a RAII wrapper around said gruesome hack that calls setCallSite/clearCallSite as appropriate.
- Wire said RAII wrapper up around three calls into the Scala code.

I'm not sure that I hit all the spots with the RAII wrapper. I'm also not sure that my gruesome hack does exactly what we want. One could also approach this change by refactoring runJob/submitJob/runApproximateJob to take a call site, then threading that parameter through everything that needs to know it.

One might object to the pointless-looking wrappers in JavaSparkContext. Unfortunately, I can't directly access the SparkContext from Python (or, if I can, I don't know how), so I need to wrap everything that matters in JavaSparkContext.

Conflicts:
	core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
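The Python side of that commit is only described in outline here; the sketch below is illustrative rather than the actual rdd.py code, showing how a context-manager style ("RAII") wrapper around the JavaSparkContext setCallSite/clearCallSite wrappers mentioned above might look (the class name and format string are made up for this example):

```python
import traceback


class _CallSiteScope(object):
    """Guess a user-facing call site from the Python stack and push it into
    the JVM for the duration of a call into the Scala side."""

    def __init__(self, sc):
        # sc is a pyspark.SparkContext; _jsc is its py4j JavaSparkContext handle.
        self._jsc = sc._jsc

    def __enter__(self):
        # Keep the innermost frame that is not part of pyspark itself; that is
        # roughly "what the user wrote" and what we want to show in the UI.
        frames = [f for f in traceback.extract_stack() if "pyspark" not in f.filename]
        if frames:
            f = frames[-1]
            self._jsc.setCallSite("%s at %s:%d" % (f.name, f.filename, f.lineno))
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._jsc.clearCallSite()


# Hypothetical usage around a call into the Scala side (not actual pyspark code):
# with _CallSiteScope(some_rdd.ctx):
#     ... invoke the JVM via py4j ...
```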
Thanks Ivan, I've merged this in.
The Mesos backend uses this property when setting up a slave process. It is similarly set in the Scala repl (org.apache.spark.repl.SparkILoop), but I couldn't find anything analogous for pyspark.

Author: Ivan Wick <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <[email protected]>

Closes #311 from ivanwick/master and squashes the following commits:

da0c3e4 [Ivan Wick] Set spark.executor.uri from environment variable (needed by Mesos)

(cherry picked from commit 5cd11d5)
Signed-off-by: Matei Zaharia <[email protected]>
The Mesos backend uses this property when setting up a slave process. It is similarly set in the Scala repl (org.apache.spark.repl.SparkILoop), but I couldn't find anything analogous for pyspark.
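For anyone running a version without this fix, a hedged workaround sketch for a standalone PySpark application: setting the property explicitly on the SparkConf has the same effect as the environment variable. The master URL and tarball URI below are placeholders.

```python
from pyspark import SparkConf, SparkContext

# Placeholder values: point these at your own Mesos master and at wherever
# your Spark distribution tarball is hosted (HTTP, HDFS, S3, ...), so that
# each Mesos slave can download and unpack Spark instead of assuming the
# driver's local installation path exists.
conf = (SparkConf()
        .setMaster("mesos://mesos-master.example.com:5050")
        .setAppName("MesosExample")
        .set("spark.executor.uri", "hdfs://namenode/dist/spark-dist.tgz"))

sc = SparkContext(conf=conf)
print(sc.parallelize(range(100)).sum())
sc.stop()
```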