
Commit 381fe32

Update docs for standalone mode
1 parent 757c184 commit 381fe32

2 files changed: +42 −20 lines


conf/spark-env.sh.template

Lines changed: 2 additions & 2 deletions
@@ -30,11 +30,11 @@
 
 # Options for the daemons used in the standalone deploy mode:
 # - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
-# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
+# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
 # - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
 # - SPARK_WORKER_CORES, to set the number of cores to use on this machine
 # - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
-# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
+# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
 # - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
 # - SPARK_WORKER_DIR, to set the working directory of worker processes
 # - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
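For illustration, the variables touched by this template might be set in `conf/spark-env.sh` roughly as follows; the host name, port numbers, and sizes are assumed example values, not defaults introduced by this commit:

    # conf/spark-env.sh -- illustrative sketch (example values only)
    SPARK_MASTER_IP=master.example.com     # bind the master to this hostname
    SPARK_MASTER_PORT=7077                 # master port
    SPARK_MASTER_WEBUI_PORT=8080           # master web UI port
    SPARK_WORKER_CORES=4                   # cores each worker offers to executors
    SPARK_WORKER_MEMORY=2g                 # total memory each worker offers to executors
    SPARK_WORKER_PORT=7078                 # worker port
    SPARK_WORKER_WEBUI_PORT=8081           # worker web UI port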

docs/spark-standalone.md

Lines changed: 40 additions & 18 deletions
@@ -70,7 +70,7 @@ Once you've set up this file, you can launch or stop your cluster with the follo
 - `sbin/start-slaves.sh` - Starts a slave instance on each machine specified in the `conf/slaves` file.
 - `sbin/start-all.sh` - Starts both a master and a number of slaves as described above.
 - `sbin/stop-master.sh` - Stops the master that was started via the `bin/start-master.sh` script.
-- `sbin/stop-slaves.sh` - Stops the slave instances that were started via `bin/start-slaves.sh`.
+- `sbin/stop-slaves.sh` - Stops all slave instances on the machines specified in the `conf/slaves` file.
 - `sbin/stop-all.sh` - Stops both the master and the slaves as described above.
 
 Note that these scripts must be executed on the machine you want to run the Spark master on, not your local machine.
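As context for the script descriptions above, a typical session on the machine hosting the Spark master might look like this sketch (paths assume you are in the Spark installation directory):

    # run these on the machine that should host the Spark master
    ./sbin/start-all.sh    # start the master and one slave instance per host in conf/slaves
    ./sbin/stop-all.sh     # later, stop the master and all slave instances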
@@ -92,12 +92,8 @@ You can optionally configure the cluster further by setting environment variable
9292
<td>Port for the master web UI (default: 8080).</td>
9393
</tr>
9494
<tr>
95-
<td><code>SPARK_WORKER_PORT</code></td>
96-
<td>Start the Spark worker on a specific port (default: random).</td>
97-
</tr>
98-
<tr>
99-
<td><code>SPARK_WORKER_DIR</code></td>
100-
<td>Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).</td>
95+
<td><code>SPARK_MASTER_OPTS</code></td>
96+
<td>Configuration properties that apply only to the master in the form "-Dx=y" (default: none).</td>
10197
</tr>
10298
<tr>
10399
<td><code>SPARK_WORKER_CORES</code></td>
@@ -107,6 +103,10 @@ You can optionally configure the cluster further by setting environment variable
     <td><code>SPARK_WORKER_MEMORY</code></td>
     <td>Total amount of memory to allow Spark applications to use on the machine, e.g. <code>1000m</code>, <code>2g</code> (default: total memory minus 1 GB); note that each application's <i>individual</i> memory is configured using its <code>spark.executor.memory</code> property.</td>
   </tr>
+  <tr>
+    <td><code>SPARK_WORKER_PORT</code></td>
+    <td>Start the Spark worker on a specific port (default: random).</td>
+  </tr>
   <tr>
     <td><code>SPARK_WORKER_WEBUI_PORT</code></td>
     <td>Port for the worker web UI (default: 8081).</td>
@@ -120,13 +120,25 @@ You can optionally configure the cluster further by setting environment variable
     or else each worker will try to use all the cores.
     </td>
   </tr>
+  <tr>
+    <td><code>SPARK_WORKER_DIR</code></td>
+    <td>Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).</td>
+  </tr>
+  <tr>
+    <td><code>SPARK_WORKER_OPTS</code></td>
+    <td>Configuration properties that apply only to the worker in the form "-Dx=y" (default: none).</td>
+  </tr>
   <tr>
     <td><code>SPARK_DAEMON_MEMORY</code></td>
     <td>Memory to allocate to the Spark master and worker daemons themselves (default: 512m).</td>
   </tr>
   <tr>
     <td><code>SPARK_DAEMON_JAVA_OPTS</code></td>
-    <td>JVM options for the Spark master and worker daemons themselves (default: none).</td>
+    <td>JVM options for the Spark master and worker daemons themselves in the form "-Dx=y" (default: none).</td>
+  </tr>
+  <tr>
+    <td><code>SPARK_PUBLIC_DNS</code></td>
+    <td>The public DNS name of the Spark master and workers (default: none).</td>
   </tr>
 </table>
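For the new `*_OPTS` and related variables documented above, a hedged sketch of how they might appear in `conf/spark-env.sh`; the specific property names used in the "-Dx=y" strings are illustrative assumptions, not values taken from this commit:

    # conf/spark-env.sh -- illustrative sketch of the "-Dx=y" style options
    SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"        # assumed example; master-only property
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"   # assumed example; worker-only property
    SPARK_DAEMON_JAVA_OPTS="-Dx=y"                            # applies to both master and worker daemons
    SPARK_PUBLIC_DNS=spark.example.com                        # public DNS name for master and workers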

@@ -150,20 +162,30 @@ You can also pass an option `--cores <numCores>` to control the number of cores
 
 Spark supports two deploy modes. Spark applications may run with the driver inside the client process or entirely inside the cluster.
 
-The spark-submit script described in the [cluster mode overview](cluster-overview.html) provides the most straightforward way to submit a compiled Spark application to the cluster in either deploy mode. For info on the lower-level invocations used to launch an app inside the cluster, read ahead.
+The spark-submit script provides the most straightforward way to submit a compiled Spark application to the cluster in either deploy mode. For more detail, see the [cluster mode overview](cluster-overview.html).
+
+    ./bin/spark-submit \
+      --class <main-class> \
+      --master <master-url> \
+      --deploy-mode <deploy-mode> \
+      ... // other options
+      <application-jar> \
+      [application-arguments]
 
-## Launching Applications Inside the Cluster
+    main-class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
+    master-url: The URL of the master node (e.g. spark://23.195.26.187:7077)
+    deploy-mode: Whether to deploy this application within the cluster or from an external client (e.g. client)
+    application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
+    application-arguments: Arguments passed to the main method of <main-class>
+
+Behind the scenes, this invokes the standalone Client to launch your application, which is also the legacy way to launch your application before Spark 1.0.
 
     ./bin/spark-class org.apache.spark.deploy.Client launch
        [client-options] \
-       <cluster-url> <application-jar-url> <main-class> \
-       [application-options]
-
-    cluster-url: The URL of the master node.
-    application-jar-url: Path to a bundled jar including your application and all dependencies. Currently, the URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
-    main-class: The entry point for your application.
+       <master-url> <application-jar> <main-class> \
+       [application-arguments]
 
-    Client Options:
+    client-options:
       --memory <count> (amount of memory, in MB, allocated for your driver program)
       --cores <count> (number of cores allocated for your driver program)
       --supervise (whether to automatically restart your driver on application or node failure)
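For concreteness, a filled-in version of the spark-submit template added above might look like the sketch below; the jar path and final argument are assumed example values, while the class name and master URL reuse the examples given in the diff:

    # submit the example class to a standalone master, running the driver inside the cluster
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master spark://23.195.26.187:7077 \
      --deploy-mode cluster \
      /path/to/examples.jar \
      100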
@@ -172,7 +194,7 @@ The spark-submit script described in the [cluster mode overview](cluster-overvie
 Keep in mind that your driver program will be executed on a remote worker machine. You can control the execution environment in the following ways:
 
 * _Environment variables_: These will be captured from the environment in which you launch the client and applied when launching the driver program.
-* _Java options_: You can add java options by setting `SPARK_JAVA_OPTS` in the environment in which you launch the submission client.
+* _Java options_: You can add java options by setting `SPARK_JAVA_OPTS` in the environment in which you launch the submission client. (Note: as of Spark 1.0, spark options should be specified through `conf/spark-defaults.conf`, which is only loaded through spark-submit.)
 * _Dependencies_: You'll still need to call `sc.addJar` inside of your program to make your bundled application jar visible on all worker nodes.
 
 Once you submit a driver program, it will appear in the cluster management UI at port 8080 and
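As an illustration of the note about `conf/spark-defaults.conf`, settings that would previously have gone into `SPARK_JAVA_OPTS` could instead be written as properties and picked up by spark-submit; the values below are assumed examples, not defaults from this commit:

    # conf/spark-defaults.conf -- illustrative sketch
    spark.master            spark://23.195.26.187:7077
    spark.executor.memory   2g
    spark.eventLog.enabled  true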
