Support for running on YARN (Hadoop NextGen) was added to Spark in version 0.6.0, and improved in subsequent releases.

# Preparations
- Building a YARN-enabled Spark assembly (see below).
- The assembled jar can be installed into HDFS or used locally.
- Your application code must be packaged into a separate JAR file.

If you want to test out the YARN deployment mode, you can use the current Spark examples. A `spark-examples_{{site.SCALA_BINARY_VERSION}}-{{site.SPARK_VERSION}}` file can be generated by running `sbt/sbt assembly`. NOTE: since the documentation you're reading is for Spark version {{site.SPARK_VERSION}}, we are assuming here that you have downloaded Spark {{site.SPARK_VERSION}} or checked it out of source control. If you are using a different version of Spark, the version numbers in the jar generated by the sbt package command will obviously be different.

We need a consolidated Spark JAR (which bundles all the required dependencies) to run Spark jobs on a YARN cluster. The jar must be built with options that enable YARN support. To build this jar yourself, refer to the [building with maven guide](building-with-maven.html).
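
As a rough sketch (the Hadoop version below is only an example and should match your cluster; see the maven guide above for the authoritative options), the assembly can be built with either tool:

    # Build a YARN-enabled assembly with sbt (Hadoop version is an example)
    SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
    # Or build the same assembly with Maven, enabling the YARN profile
    mvn -Pyarn -Dhadoop.version=2.2.0 -DskipTests clean package
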
# Configuration
System Properties:

* `spark.yarn.max.executor.failures`, the maximum number of executor failures before failing the application. Default is the number of executors requested times 2, with a minimum of 3.
* `spark.yarn.historyServer.address`, the address of the Spark history server (e.g. host.com:18080). The address should not contain a scheme (http://). Defaults to not being set, since the history server is an optional service. This address is given to the YARN ResourceManager when the Spark application finishes, to link the application from the ResourceManager UI to the Spark history server UI.
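
As an illustrative sketch, assuming these properties are set through `conf/spark-defaults.conf`, the entries might look like this (the values and host name are placeholders, not recommendations):

    # conf/spark-defaults.conf (example values only)
    spark.yarn.max.executor.failures     6
    spark.yarn.historyServer.address     historyhost.example.com:18080
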
By default, Spark on YARN will use a Spark jar installed locally, but the Spark jar can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to a jar on HDFS, `export SPARK_JAR=hdfs:///some/path`.
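
For example, assuming a hypothetical HDFS directory of /user/spark/share/lib and using spark-assembly.jar as a stand-in for your actual assembly file name, the jar can be uploaded once and referenced from every submission:

    # Copy the assembly jar to a world-readable HDFS directory (paths are examples)
    hadoop fs -mkdir -p /user/spark/share/lib
    hadoop fs -put spark-assembly.jar /user/spark/share/lib/spark-assembly.jar
    # Point Spark on YARN at the cached jar instead of shipping a local copy
    export SPARK_JAR=hdfs:///user/spark/share/lib/spark-assembly.jar
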
# Launching Spark on YARN
Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster.
These configs are used to write to the dfs and connect to the YARN ResourceManager.
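
For example, if the client-side Hadoop configuration lives in /etc/hadoop/conf (an assumed path; use your cluster's actual location), export it before launching:

    # Point Spark at the Hadoop/YARN client configuration (path is an example)
    export HADOOP_CONF_DIR=/etc/hadoop/conf
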
There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
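
As a rough sketch (the example class, jar path, and resource sizes are placeholders), the two modes are selected through the master URL passed to spark-submit:

    # yarn-cluster mode: the driver runs inside the YARN application master
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-cluster \
        --num-executors 3 \
        --executor-memory 2g \
        --executor-cores 1 \
        lib/spark-examples*.jar 10
    # yarn-client mode: the driver runs locally in the client process
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client \
        lib/spark-examples*.jar 10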