Conversation

@iven commented Aug 15, 2014

This is useful when the $SPARK_HOME of the submission machine is not the same as on the slave machines, e.g. when running with Mesos.

@AmplabJenkins

Can one of the admins verify this patch?

@CodingCat (Contributor)

I once submitted a similar patch, but the latest solution (merged?) is that we will not send the local SPARK_HOME to the remote end at all... @andrewor14?

@JoshRosen (Contributor)

There was a bunch of prior discussion about this in an old pull request for SPARK-1110 (I'd link to it, but it's from the now-deleted incubator-spark GitHub repo).

I think we decided that it didn't make sense for workers to inherit SPARK_HOME from the driver; there were some later patches that removed this dependency, if I recall.

@iven Was this pull request motivated by an issue that you saw when deploying Spark? Which version were you using, and on what platform?

@andrewor14 (Contributor)

There is an updated JIRA for the same issue: SPARK-2290. We established that, for standalone mode, we don't need to ship the driver's Spark home to the executors, which may not use the same Spark home; instead, we should just use the Worker's current working directory. However, I am not familiar enough with Mesos to comment on the need for shipping SPARK_HOME there.

@iven There are many other places where we export SPARK_HOME in addition to these two. From a quick grep, I found the following:

bin/pyspark:export SPARK_HOME="$FWDIR"
bin/run-example:export SPARK_HOME="$FWDIR"
bin/spark-class:export SPARK_HOME="$FWDIR"
bin/spark-submit:export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
sbin/spark-config.sh:export SPARK_HOME=${SPARK_PREFIX}

We need to do the same for all of these places in order for your intended behavior to take effect. In the longer run, however, we should just clean up our usages of SPARK_HOME, since in many places we don't actually have any need to export it (or even use the variable SPARK_HOME).

@JoshRosen (Contributor)

In PySpark, it looks like we only use SPARK_HOME on the driver, where it's used to find the path to spark-submit and to locate test support files.

@iven (Author) commented Aug 16, 2014

@JoshRosen I'm using Spark 1.0.2 with Mesos. If I don't specify SPARK_HOME on the driver, Mesos executors are LOST with this error:

sh: /root/spark_master/sbin/spark-executor: No such file or directory

Here /root/spark_master is the driver's SPARK_HOME.

I think this is caused by the createExecutorInfo method in MesosSchedulerBackend.scala: when spark.executor.uri is not specified, it uses the SPARK_HOME from the SparkContext.
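
To make the fallback concrete, here is a minimal standalone model of that decision; this is a paraphrase for illustration, not the actual Spark 1.0.2 source:

    // Minimal model of how the Mesos backend picks the executor command.
    // Not Spark's real classes; the names here are made up for illustration.
    def executorCommand(driverSparkHome: Option[String],
                        executorUri: Option[String]): String =
      executorUri match {
        // With spark.executor.uri set, Mesos fetches and unpacks Spark itself,
        // so the driver's SPARK_HOME never reaches the slave.
        case Some(uri) =>
          val basename = uri.split('/').last.split('.').head
          s"cd $basename*; ./sbin/spark-executor"
        // Without it, the backend falls back to the driver's Spark home and
        // assumes the same path exists on every slave -- the failure above.
        case None =>
          driverSparkHome
            .map(home => s"$home/sbin/spark-executor")
            .getOrElse(throw new IllegalStateException("Spark home is not set"))
      }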

@iven (Author) commented Aug 16, 2014

@andrewor14 OK. I'll update the patch once we confirm this PR is necessary.

@andrewor14 (Contributor)

@liancheng

@liancheng (Contributor)

@iven I'm a little confused here. Are you referring to a use case like this:

  1. Spark is installed in directory A on the driver node, but in directory B on all Mesos slave nodes
  2. SPARK_HOME is exported as B on the driver side
  3. spark-shell is started without specifying spark.executor.uri, expecting Mesos to find the Spark installation at B on the executor side

Is that right?

@iven (Author) commented Aug 28, 2014

@liancheng Yes, although I'm using spark-submit, not spark-shell.

@liancheng (Contributor)

Actually you can just set spark.home in spark-defaults.conf for this use case.
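
For example, on the submitting machine (the install path below is hypothetical; use wherever Spark actually lives on the slaves):

    # conf/spark-defaults.conf
    # /opt/spark is a made-up example path for the Spark installation
    # on the Mesos slaves.
    spark.home    /opt/spark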

@andrewor14 (Contributor)

Hi @iven, spark-shell actually goes through spark-submit. As @liancheng mentioned, you can set spark.home to control the executor-side Spark location. This is not super intuitive, however, and there is an open PR, #2166, that adds a more specific way to do this.
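
For a single application, the same setting can also be made programmatically on the SparkConf. A minimal sketch; the app name and path are made up:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical application skeleton showing where spark.home would go.
    object MyApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("my-app")
          // Where Spark is installed on the Mesos slaves (hypothetical path).
          .set("spark.home", "/opt/spark")
        val sc = new SparkContext(conf)
        // ... job logic ...
        sc.stop()
      }
    }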

At least with the existing code, the user should not set SPARK_HOME, because the code depends on it in many places downstream. A better solution is to set an application-specific config. Would you mind closing this PR?

@iven (Author) commented Aug 29, 2014

@liancheng @andrewor14 Thanks, it works! I'm closing this.

@iven iven closed this Aug 29, 2014
@iven iven deleted the spark-home branch August 29, 2014 05:21