docs/spark-standalone.md (40 additions, 18 deletions)
@@ -70,7 +70,7 @@ Once you've set up this file, you can launch or stop your cluster with the follo
 - `sbin/start-slaves.sh` - Starts a slave instance on each machine specified in the `conf/slaves` file.
 - `sbin/start-all.sh` - Starts both a master and a number of slaves as described above.
 - `sbin/stop-master.sh` - Stops the master that was started via the `bin/start-master.sh` script.
-- `sbin/stop-slaves.sh` - Stops the slave instances that were started via `bin/start-slaves.sh`.
+- `sbin/stop-slaves.sh` - Stops all slave instances on the machines specified in the `conf/slaves` file.
 - `sbin/stop-all.sh` - Stops both the master and the slaves as described above.

 Note that these scripts must be executed on the machine you want to run the Spark master on, not your local machine.
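As a quick illustration of how these scripts fit together, here is a minimal sketch that assumes two placeholder worker hosts in `conf/slaves` and drives the whole cluster from the master machine:

    # conf/slaves -- one worker host per line (example hosts, not real machines)
    #   worker1.example.com
    #   worker2.example.com

    # Run on the machine that should host the Spark master:
    ./sbin/start-all.sh   # starts the master plus one worker per entry in conf/slaves

    # Shut the whole cluster down again from the same machine:
    ./sbin/stop-all.sh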
@@ -92,12 +92,8 @@ You can optionally configure the cluster further by setting environment variable
     <td>Port for the master web UI (default: 8080).</td>
   </tr>
   <tr>
-    <td><code>SPARK_WORKER_PORT</code></td>
-    <td>Start the Spark worker on a specific port (default: random).</td>
-  </tr>
-  <tr>
-    <td><code>SPARK_WORKER_DIR</code></td>
-    <td>Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).</td>
+    <td><code>SPARK_MASTER_OPTS</code></td>
+    <td>Configuration properties that apply only to the master in the form "-Dx=y" (default: none).</td>
   </tr>
   <tr>
     <td><code>SPARK_WORKER_CORES</code></td>
@@ -107,6 +103,10 @@ You can optionally configure the cluster further by setting environment variable
     <td><code>SPARK_WORKER_MEMORY</code></td>
     <td>Total amount of memory to allow Spark applications to use on the machine, e.g. <code>1000m</code>, <code>2g</code> (default: total memory minus 1 GB); note that each application's <i>individual</i> memory is configured using its <code>spark.executor.memory</code> property.</td>
   </tr>
+  <tr>
+    <td><code>SPARK_WORKER_PORT</code></td>
+    <td>Start the Spark worker on a specific port (default: random).</td>
+  </tr>
   <tr>
     <td><code>SPARK_WORKER_WEBUI_PORT</code></td>
     <td>Port for the worker web UI (default: 8081).</td>
@@ -120,13 +120,25 @@ You can optionally configure the cluster further by setting environment variable
     or else each worker will try to use all the cores.
     </td>
   </tr>
+  <tr>
+    <td><code>SPARK_WORKER_DIR</code></td>
+    <td>Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).</td>
+  </tr>
+  <tr>
+    <td><code>SPARK_WORKER_OPTS</code></td>
+    <td>Configuration properties that apply only to the worker in the form "-Dx=y" (default: none).</td>
+  </tr>
   <tr>
     <td><code>SPARK_DAEMON_MEMORY</code></td>
     <td>Memory to allocate to the Spark master and worker daemons themselves (default: 512m).</td>
   </tr>
   <tr>
     <td><code>SPARK_DAEMON_JAVA_OPTS</code></td>
-    <td>JVM options for the Spark master and worker daemons themselves (default: none).</td>
+    <td>JVM options for the Spark master and worker daemons themselves in the form "-Dx=y" (default: none).</td>
+  </tr>
+  <tr>
+    <td><code>SPARK_PUBLIC_DNS</code></td>
+    <td>The public DNS name of the Spark master and workers (default: none).</td>
   </tr>
 </table>
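To make the table above concrete, a `conf/spark-env.sh` along the following lines would exercise the newly documented variables; the values, paths, and property names are illustrative assumptions rather than recommendations:

    # conf/spark-env.sh -- sourced by the standalone daemons at startup (example values)
    export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=4"     # "-Dx=y" properties applied to the master only
    export SPARK_WORKER_PORT=7078                                # fix the worker port instead of picking one at random
    export SPARK_WORKER_DIR=/data/spark/work                     # application logs and scratch space
    export SPARK_WORKER_OPTS="-Dspark.worker.timeout=120"        # "-Dx=y" properties applied to the workers only
    # JVM options shared by the master and worker daemons (assumed recovery settings)
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/data/spark/recovery"
    export SPARK_PUBLIC_DNS=spark.example.com                    # public DNS name advertised by the master and workers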
@@ -150,20 +162,30 @@ You can also pass an option `--cores <numCores>` to control the number of cores

 Spark supports two deploy modes. Spark applications may run with the driver inside the client process or entirely inside the cluster.

-The spark-submit script described in the [cluster mode overview](cluster-overview.html) provides the most straightforward way to submit a compiled Spark application to the cluster in either deploy mode. For info on the lower-level invocations used to launch an app inside the cluster, read ahead.
+The spark-submit script provides the most straightforward way to submit a compiled Spark application to the cluster in either deploy mode. For more detail, see the [cluster mode overview](cluster-overview.html).
+
+    ./bin/spark-submit \
+      --class <main-class> \
+      --master <master-url> \
+      --deploy-mode <deploy-mode> \
+      ... # other options
+      <application-jar> \
+      [application-arguments]

-## Launching Applications Inside the Cluster
+    main-class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
+    master-url: The URL of the master node (e.g. spark://23.195.26.187:7077)
+    deploy-mode: Whether to deploy this application within the cluster or from an external client (e.g. client)
+    application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
+    application-arguments: Arguments passed to the main method of <main-class>
+
+Behind the scenes, this invokes the standalone Client to launch your application, which is also the legacy way to launch your application before Spark 1.0.
+
     ./bin/spark-class org.apache.spark.deploy.Client launch
        [client-options] \
-    application-jar-url: Path to a bundled jar including your application and all dependencies. Currently, the URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
-    main-class: The entry point for your application.
+       <master-url> <application-jar> <main-class> \
+       [application-arguments]

-Client Options:
+    client-options:
   --memory <count> (amount of memory, in MB, allocated for your driver program)
   --cores <count> (number of cores allocated for your driver program)
   --supervise (whether to automatically restart your driver on application or node failure)
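For instance, plugging the example values that the text above already uses into the spark-submit template gives an invocation like the following; the jar path and the final argument are placeholders for your own build and application arguments:

    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master spark://23.195.26.187:7077 \
      --deploy-mode client \
      /path/to/spark-examples.jar \
      100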
@@ -172,7 +194,7 @@ The spark-submit script described in the [cluster mode overvie
 Keep in mind that your driver program will be executed on a remote worker machine. You can control the execution environment in the following ways:

 * _Environment variables_: These will be captured from the environment in which you launch the client and applied when launching the driver program.
-* _Java options_: You can add java options by setting `SPARK_JAVA_OPTS` in the environment in which you launch the submission client.
+* _Java options_: You can add java options by setting `SPARK_JAVA_OPTS` in the environment in which you launch the submission client. (Note: as of Spark 1.0, Spark options should be specified through `conf/spark-defaults.conf`, which is only loaded through spark-submit.)
 * _Dependencies_: You'll still need to call `sc.addJar` inside of your program to make your bundled application jar visible on all worker nodes.

 Once you submit a driver program, it will appear in the cluster management UI at port 8080 and
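Since the note above steers applications toward `conf/spark-defaults.conf` rather than `SPARK_JAVA_OPTS`, a minimal sketch of that file might look like this (whitespace-separated key/value pairs; the values are illustrative, and the file is only picked up by spark-submit):

    # conf/spark-defaults.conf (example values)
    spark.master            spark://23.195.26.187:7077
    spark.executor.memory   2g
    spark.eventLog.enabled  true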
0 commit comments