Closed

29 commits
fa93415
Added yarn-deploy-mode alternative
Aug 9, 2015
437a4d4
Moved Master URLs closer above before the examples
Aug 9, 2015
05fe708
Removed the addition section
Aug 10, 2015
98624e8
Added a section for alternative submission. Distinguished from the sh…
Aug 10, 2015
b8fdd5c
Added section for preferred yarn and kept the one with deploy-mode fo…
Aug 12, 2015
8c65676
Moved the Standalone examples together
Aug 12, 2015
8a331d0
Moved Master URLs
Aug 12, 2015
0fed23b
Added deploy-mode section to YARN submission
Aug 13, 2015
670d251
Added yarn-deploy-mode alternative
Aug 9, 2015
40d3b80
Moved Master URLs closer above before the examples
Aug 9, 2015
89d15bf
Removed the addition section
Aug 10, 2015
d2c212a
Added a section for alternative submission. Distinguished from the sh…
Aug 10, 2015
3f25500
Added section for preferred yarn and kept the one with deploy-mode fo…
Aug 12, 2015
0766da6
Moved the Standalone examples together
Aug 12, 2015
46a24d5
Moved Master URLs
Aug 12, 2015
9175807
Added deploy-mode section to YARN submission
Aug 13, 2015
3052c74
Merge branch 'SPARK-9570' of https://github.com/nssalian/spark into S…
Aug 23, 2015
c91073e
Modified Running on YARN doc
Aug 23, 2015
3dc79e2
Modified submitting applications
Aug 23, 2015
67a4255
Removed extra YARN section, there is already a running without --depl…
Aug 23, 2015
a8b67ef
Added --deploy-mode flags to the yarn submission sections
Aug 24, 2015
d93d4ba
Changed R/ReadME
Sep 19, 2015
108caec
Changed parent/README
Sep 19, 2015
12ecd43
Modified Running on yarn
Sep 19, 2015
0cd5d0b
Changed submitting-applications
Sep 19, 2015
07ed32c
Changed /deploy/yarn/Client.scala
Sep 19, 2015
1b86c35
Modified SparkSubmitSuite.scala
Sep 19, 2015
9be5993
Recent Review changes
Sep 27, 2015
177146e
Review Changes --deploy-mode
Oct 1, 2015
R/README.md: 3 changes (2 additions, 1 deletion)
@@ -63,5 +63,6 @@ You can also run the unit-tests for SparkR by running (you need to install the [
The `./bin/spark-submit` and `./bin/sparkR` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
```
export YARN_CONF_DIR=/etc/hadoop/conf
-./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
+./bin/spark-submit --master yarn --deploy-mode client examples/src/main/r/dataframe.R
+
```
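The interactive `sparkR` shell accepts the same flags, since it routes through `spark-submit`. A minimal sketch, reusing the CDH conf path from the example above (shells only run in `client` mode):

```
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/sparkR --master yarn --deploy-mode client
```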
README.md: 3 changes (1 addition, 2 deletions)
@@ -58,8 +58,7 @@ To run one of them, use `./bin/run-example <class> [params]`. For example:
will run the Pi example locally.

You can set the MASTER environment variable when running examples to submit
-examples to a cluster. This can be a mesos:// or spark:// URL,
-"yarn-cluster" or "yarn-client" to run on YARN, and "local" to run
+examples to a cluster. This can be a mesos:// or spark:// URL, "yarn" to run on YARN, and "local" to run
locally with one thread, or "local[N]" to run locally with N threads. You
can also use an abbreviated class name if the class is in the `examples`
package. For instance:
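A sketch of such invocations under the new `yarn` master URL; the example class and the `local[4]` thread count are illustrative:

```
MASTER=yarn ./bin/run-example SparkPi
MASTER=local[4] ./bin/run-example SparkPi 100
```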
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: 6 changes (4 additions, 2 deletions)
@@ -412,7 +412,8 @@ class SparkSubmitSuite

// Test files and archives (Yarn)
val clArgs2 = Seq(
"--master", "yarn-client",
"--master", "yarn",
"--deploy-mode","client",
"--class", "org.SomeClass",
"--files", files,
"--archives", archives,
@@ -470,7 +471,8 @@ class SparkSubmitSuite
writer2.println("spark.yarn.dist.archives " + archives)
writer2.close()
val clArgs2 = Seq(
"--master", "yarn-client",
"--master", "yarn",
"--deploy-mode","client",
"--class", "org.SomeClass",
"--properties-file", f2.getPath,
"thejar.jar"
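Reading the argument Seqs above back as a command line may help; a sketch of the first one, where `file1.txt,file2.txt` and `archive1.zip,archive2.zip` stand in for the suite's `files` and `archives` variables (`org.SomeClass` and `thejar.jar` come from the test itself):

```
./bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.SomeClass \
  --files file1.txt,file2.txt \
  --archives archive1.zip,archive2.zip \
  thejar.jar
```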
docs/running-on-yarn.md: 39 changes (27 additions, 12 deletions)
@@ -16,37 +16,52 @@ containers used by the application use the same configuration. If the configurat
Java system properties or environment variables not managed by YARN, they should also be set in the
Spark application's configuration (driver, executors, and the AM when running in client mode).

-There are two deploy modes that can be used to launch Spark applications on YARN. In `yarn-cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `yarn-client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
+There are two deploy modes that can be used to launch Spark applications on YARN. In `cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

-Unlike in Spark standalone and Mesos mode, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn-client` or `yarn-cluster`.
-To launch a Spark application in `yarn-cluster` mode:
+Unlike in Spark standalone and Mesos mode, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn`, and `--deploy-mode` can be `client` or `cluster` to select the YARN deployment mode.
+To launch a Spark application in `cluster` mode on YARN:

-`$ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]`
+`$ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]`

For example:

    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
-        --master yarn-cluster \
+        --master yarn \
+        --deploy-mode cluster \
         --num-executors 3 \
         --driver-memory 4g \
         --executor-memory 2g \
         --executor-cores 1 \
         --queue thequeue \
         lib/spark-examples*.jar \
         10

-The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.
+The above example starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running. Refer to the "Debugging your Application" section below for how to see driver and executor logs.
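The status polling described above pairs naturally with the stock Hadoop YARN CLI; a sketch, where the application ID is illustrative and `yarn logs` assumes log aggregation is enabled:

```
# List running applications and find the one spark-submit reported
yarn application -list
# Pull the aggregated driver and executor logs after the run
yarn logs -applicationId application_1440000000000_0001
```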

-To launch a Spark application in `yarn-client` mode, do the same, but replace `yarn-cluster` with `yarn-client`. To run spark-shell:
-
-    $ ./bin/spark-shell --master yarn-client
+To launch a Spark application in `client` mode, do the same, but replace `cluster` with `client` in the `--deploy-mode` argument.
+To run spark-shell:
+
+    $ ./bin/spark-shell --master yarn --deploy-mode client
+
+For example:

+    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
+        --master yarn \
+        --deploy-mode cluster \
+        --num-executors 3 \
+        --driver-memory 4g \
+        --executor-memory 2g \
+        --executor-cores 1 \
+        --queue thequeue \
+        lib/spark-examples*.jar \
+        10

## Adding Other JARs

In `yarn-cluster` mode, the driver runs on a different machine than the client, so `SparkContext.addJar` won't work out of the box with files that are local to the client. To make files on the client available to `SparkContext.addJar`, include them with the `--jars` option in the launch command.

    $ ./bin/spark-submit --class my.main.Class \
-        --master yarn-cluster \
+        --master yarn \
+        --deploy-mode cluster \
         --jars my-other-jar.jar,my-other-other-jar.jar \
         my-main-jar.jar \
         app_arg1 app_arg2
@@ -386,6 +401,6 @@ If you need a reference to the proper location to put log files in the YARN so t
# Important notes

- Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.
-- In `yarn-cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `yarn-client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `yarn-client` mode, only the Spark executors do.
+- In `cluster` mode, the local directories used by the Spark executors and the Spark driver will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored. In `client` mode, the Spark executors will use the local directories configured for YARN while the Spark driver will use those defined in `spark.local.dir`. This is because the Spark driver does not run on the YARN cluster in `client` mode, only the Spark executors do.
- The `--files` and `--archives` options support specifying file names with the # similar to Hadoop. For example you can specify: `--files localtest.txt#appSees.txt` and this will upload the file you have locally named localtest.txt into HDFS but this will be linked to by the name `appSees.txt`, and your application should use the name as `appSees.txt` to reference it when running on YARN.
- The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `yarn-cluster` mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files.
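The `spark.local.dir` note in the list above can be made concrete with a sketch; the class name, jar, and scratch path are illustrative:

```
# In client mode the driver honors spark.local.dir while executors still use
# yarn.nodemanager.local-dirs; in cluster mode the setting is ignored entirely.
./bin/spark-submit --master yarn --deploy-mode client \
  --conf spark.local.dir=/tmp/spark-scratch \
  --class org.SomeClass app.jar
```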
docs/sql-programming-guide.md: 2 changes (1 addition, 1 deletion)
@@ -1551,7 +1551,7 @@ on all of the worker nodes, as they will need access to the Hive serialization a
(SerDes) in order to access data stored in Hive.

Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`. Please note when running
-the query on a YARN cluster (`yarn-cluster` mode), the `datanucleus` jars under the `lib_managed/jars` directory
+the query on a YARN cluster (`--master yarn --deploy-mode cluster`), the `datanucleus` jars under the `lib_managed/jars` directory
and `hive-site.xml` under `conf/` directory need to be available on the driver and all executors launched by the
YARN cluster. The convenient way to do this is adding them through the `--jars` option and `--files` option of the
`spark-submit` command.
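A sketch of that `--jars`/`--files` invocation; the `datanucleus` jar version numbers vary with the build, and the class name and application jar are illustrative:

```
./bin/spark-submit --master yarn --deploy-mode cluster \
  --jars lib_managed/jars/datanucleus-api-jdo-3.2.6.jar,lib_managed/jars/datanucleus-core-3.2.10.jar,lib_managed/jars/datanucleus-rdbms-3.2.9.jar \
  --files conf/hive-site.xml \
  --class org.SomeClass app.jar
```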
docs/submitting-applications.md: 6 changes (3 additions, 3 deletions)
@@ -103,12 +103,13 @@ run it with `--help`. Here are a few examples of common options:
    export HADOOP_CONF_DIR=XXX
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
-     --master yarn-cluster \  # can also be `yarn-client` for client mode
+     --master yarn \
+     --deploy-mode cluster \
      --executor-memory 20G \
      --num-executors 50 \
      /path/to/examples.jar \
      1000

# Run a Python application on a Spark Standalone cluster
./bin/spark-submit \
--master spark://207.184.161.138:7077 \
@@ -140,7 +141,6 @@ cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or
</td></tr>
</table>


# Loading Configuration from a File

The `spark-submit` script can load default [Spark configuration values](configuration.html) from a