
Commit 32f86a4

use configs for specifying environment variables on YARN
1 parent ac3440f commit 32f86a4

4 files changed: +43 -6 lines changed


docs/configuration.md

Lines changed: 8 additions & 0 deletions
@@ -206,6 +206,14 @@ Apart from these, the following properties are also available, and may be useful
       used during aggregation goes above this amount, it will spill the data into disks.
     </td>
   </tr>
+  <tr>
+    <td><code>spark.executorEnv.<Environment Variable Name></code></td>
+    <td>(none)</td>
+    <td>
+      Add the environment variable specified by <Environment Variable Name> to the Executor
+      process. The user can specify multiple of these to set multiple environment variables.
+    </td>
+  </tr>
 </table>
 
 #### Shuffle Behavior
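
As a brief illustration of the new property form (variable names and values here are examples, not taken from the commit), an application can set any number of `spark.executorEnv.*` keys on its `SparkConf`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Example values only: propagate JAVA_HOME and a custom variable to every
// executor via the spark.executorEnv.* config form documented above.
val conf = new SparkConf()
  .setAppName("executor-env-example")
  .set("spark.executorEnv.JAVA_HOME", "/jdk64") // visible as JAVA_HOME in executors
  .set("spark.executorEnv.FOO", "bar")          // visible as FOO in executors

val sc = new SparkContext(conf)
```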

docs/running-on-yarn.md

Lines changed: 17 additions & 5 deletions
@@ -17,10 +17,6 @@ To build Spark yourself, refer to the [building with Maven guide](building-with-
 
 Most of the configs are the same for Spark on YARN as for other deployment modes. See the [configuration page](configuration.html) for more information on those. These are configs that are specific to Spark on YARN.
 
-#### Environment Variables
-
-* `SPARK_YARN_USER_ENV`, to add environment variables to the Spark processes launched on YARN. This can be a comma separated list of environment variables, e.g. `SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"`.
-
 #### Spark Properties
 
 <table class="table">
@@ -110,7 +106,23 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
   <td><code>spark.yarn.access.namenodes</code></td>
   <td>(none)</td>
   <td>
-    A list of secure HDFS namenodes your Spark application is going to access. For example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`. The Spark application must have access to the namenodes listed and Kerberos must be properly configured to be able to access them (either in the same realm or in a trusted realm). Spark acquires security tokens for each of the namenodes so that the Spark application can access those remote HDFS clusters.
+    A list of secure HDFS namenodes your Spark application is going to access. For
+    example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`.
+    The Spark application must have access to the namenodes listed and Kerberos must
+    be properly configured to be able to access them (either in the same realm or in
+    a trusted realm). Spark acquires security tokens for each of the namenodes so that
+    the Spark application can access those remote HDFS clusters.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.yarn.appMasterEnv.<Environment Variable Name></code></td>
+  <td>(none)</td>
+  <td>
+    Add the environment variable specified by <Environment Variable Name> to the
+    Application Master process launched on YARN. The user can specify multiple of
+    these to set multiple environment variables. In yarn-cluster mode this controls
+    the environment of the Spark driver and in yarn-client mode it only controls
+    the environment of the executor launcher.
   </td>
 </tr>
 </table>
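
A minimal sketch of setting the new `spark.yarn.appMasterEnv.*` property from application code (the variable name and value are assumed examples):

```scala
import org.apache.spark.SparkConf

// Assumed example: add FOO=bar to the YARN Application Master's environment.
// Per the table above, in yarn-cluster mode this affects the Spark driver's
// environment; in yarn-client mode it only affects the executor launcher.
val conf = new SparkConf()
  .set("spark.yarn.appMasterEnv.FOO", "bar")
```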

yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala

Lines changed: 13 additions & 0 deletions
@@ -259,6 +259,14 @@ trait ClientBase extends Logging {
     localResources
   }
 
+  /** Get all application master environment variables set on this SparkConf */
+  def getAppMasterEnv: Seq[(String, String)] = {
+    val prefix = "spark.yarn.appMasterEnv."
+    sparkConf.getAll.filter { case (k, v) => k.startsWith(prefix) }
+      .map { case (k, v) => (k.substring(prefix.length), v) }
+  }
+
+
   def setupLaunchEnv(
       localResources: HashMap[String, LocalResource],
       stagingDir: String): HashMap[String, String] = {
@@ -276,6 +284,11 @@ trait ClientBase extends Logging {
     distCacheMgr.setDistFilesEnv(env)
     distCacheMgr.setDistArchivesEnv(env)
 
+    getAppMasterEnv.foreach { case (key, value) =>
+      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
+    }
+
+    // Keep this for backwards compatibility but users should move to the config
     sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
       // Allow users to specify some environment variables.
       YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs, File.pathSeparator)
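
For intuition, a self-contained sketch (not part of the commit) of the prefix filtering that `getAppMasterEnv` performs over a `SparkConf`:

```scala
import org.apache.spark.SparkConf

// Build a conf with two app-master env vars and one unrelated key.
val sparkConf = new SparkConf(loadDefaults = false)
  .set("spark.yarn.appMasterEnv.JAVA_HOME", "/jdk64")
  .set("spark.yarn.appMasterEnv.FOO", "bar")
  .set("spark.executor.memory", "2g") // no prefix match, so filtered out

val prefix = "spark.yarn.appMasterEnv."
val appMasterEnv = sparkConf.getAll
  .filter { case (k, _) => k.startsWith(prefix) }
  .map { case (k, v) => (k.substring(prefix.length), v) }

// appMasterEnv now holds (JAVA_HOME,/jdk64) and (FOO,bar) with the config
// prefix stripped, ready to be merged into the AM's launch environment.
```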

yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala

Lines changed: 5 additions & 1 deletion
@@ -171,7 +171,11 @@ trait ExecutorRunnableUtil extends Logging {
     val extraCp = sparkConf.getOption("spark.executor.extraClassPath")
     ClientBase.populateClasspath(null, yarnConf, sparkConf, env, extraCp)
 
-    // Allow users to specify some environment variables
+    sparkConf.getExecutorEnv.foreach { case (key, value) =>
+      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
+    }
+
+    // Keep this for backwards compatibility but users should move to the config
     YarnSparkHadoopUtil.setEnvFromInputString(env, System.getenv("SPARK_YARN_USER_ENV"),
       File.pathSeparator)
 
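
`sparkConf.getExecutorEnv` reads back the `spark.executorEnv.*` entries documented in configuration.md; as a hedged sketch (example values only), either of these forms populates it:

```scala
import org.apache.spark.SparkConf

// Example values only: both forms store entries under spark.executorEnv.*,
// which getExecutorEnv (used in the diff above) returns as (name, value) pairs.
val conf = new SparkConf(loadDefaults = false)
  .set("spark.executorEnv.FOO", "bar")    // explicit property form
  .setExecutorEnv("JAVA_HOME", "/jdk64")  // equivalent SparkConf helper

println(conf.getExecutorEnv) // e.g. List((FOO,bar), (JAVA_HOME,/jdk64))
```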
