Commit e5c682d

Author: Marcelo Vanzin
Fix cluster mode, restore SPARK_LOG4J_CONF.
Also add documentation about logging to the YARN guide. For cluster mode, the change modifies some code added in fb98488 so that client and cluster modes are treated as mostly the same. Previously, cluster mode only forwarded system properties that started with "spark", which caused it to ignore anything that SparkSubmit sets directly in the SparkConf object.
Parent: 1dfbb40
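
The core of the fix is easiest to see side by side: options that SparkSubmit sets directly on the SparkConf never become JVM system properties, so a sys.props filter misses them. A minimal, self-contained sketch of the difference (the configuration key and value are made up for illustration):

```scala
import org.apache.spark.SparkConf
import scala.collection.mutable.ListBuffer

// Hypothetical illustration: a value set only on the SparkConf (as SparkSubmit
// does) is invisible to sys.props, so the old cluster-mode code dropped it.
val sparkConf = new SparkConf(loadDefaults = false)
  .set("spark.executor.memory", "2g") // never registered as a system property

val javaOpts = ListBuffer[String]()

// Old cluster-mode behavior: forward only "spark"-prefixed system properties.
for ((k, v) <- sys.props.filterKeys(_.startsWith("spark"))) {
  javaOpts += s"-D$k=$v" // spark.executor.memory is not in sys.props, so it is lost
}

// Behavior after this commit (both modes): forward the SparkConf contents.
for ((k, v) <- sparkConf.getAll) {
  javaOpts += s"-D$k=$v" // spark.executor.memory is forwarded
}
```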

File tree: 2 files changed (+32, −18)

- docs/running-on-yarn.md
- yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala


docs/running-on-yarn.md (14 additions, 1 deletion)

```diff
@@ -165,7 +165,20 @@ all environment variables used for launching each container. This process is use
 classpath problems in particular. (Note that enabling this requires admin privileges on cluster
 settings and a restart of all node managers. Thus, this is not applicable to hosted clusters).
 
-# Important Notes
+To use a custom log4j configuration for the application master or executors, there are two options:
+
+- upload a custom log4j.properties using spark-submit, by adding it to the "--files" list of files
+  to be uploaded with the application.
+- add "-Dlog4j.configuration=<location of configuration file>" to "spark.driver.extraJavaOptions"
+  (for the driver) or "spark.executor.extraJavaOptions" (for executors). Note that if using a file,
+  the "file:" protocol should be explicitly provided, and the file needs to exist locally on all
+  the nodes.
+
+Note that for the first option, both executors and the application master will share the same
+log4j configuration, which may cause issues when they run on the same node (e.g. trying to write
+to the same log file).
+
+# Important notes
 
 - Before Hadoop 2.2, YARN does not support cores in container resource requests. Thus, when running against an earlier version, the numbers of cores given via command line arguments cannot be passed to YARN. Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.
 - The local directories used by Spark executors will be the local directories configured for YARN (Hadoop YARN config `yarn.nodemanager.local-dirs`). If the user specifies `spark.local.dir`, it will be ignored.
```
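
As a concrete illustration of the second option described in the new documentation, here is a minimal Scala sketch that sets the relevant options through a SparkConf (the file path and application name are hypothetical; the same keys can also be passed with --conf on the spark-submit command line):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: point the driver and the executors at a custom log4j
// file. The "file:" scheme is required, and the file must already exist at
// this path on every node.
val conf = new SparkConf()
  .setAppName("CustomLoggingExample")
  .set("spark.driver.extraJavaOptions",
    "-Dlog4j.configuration=file:/etc/spark/log4j.properties")
  .set("spark.executor.extraJavaOptions",
    "-Dlog4j.configuration=file:/etc/spark/log4j.properties")

val sc = new SparkContext(conf)
```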

yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala (18 additions, 17 deletions)
```diff
@@ -213,9 +213,18 @@ trait ClientBase extends Logging {
 
     val statCache: Map[URI, FileStatus] = HashMap[URI, FileStatus]()
 
+    val oldLog4jConf = Option(System.getenv("SPARK_LOG4J_CONF"))
+    if (oldLog4jConf.isDefined) {
+      logWarning(
+        "SPARK_LOG4J_CONF detected in the system environment. This variable has been " +
+        "deprecated. Please refer to the \"Launching Spark on YARN\" documentation " +
+        "for alternatives.")
+    }
+
     List(
       (ClientBase.SPARK_JAR, ClientBase.sparkJar(sparkConf), ClientBase.CONF_SPARK_JAR),
-      (ClientBase.APP_JAR, args.userJar, ClientBase.CONF_SPARK_USER_JAR)
+      (ClientBase.APP_JAR, args.userJar, ClientBase.CONF_SPARK_USER_JAR),
+      ("log4j.properties", oldLog4jConf.getOrElse(null), null)
     ).foreach { case(destName, _localPath, confKey) =>
       val localPath: String = if (_localPath != null) _localPath.trim() else ""
       if (! localPath.isEmpty()) {
```
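
The two nulls in the new tuple are deliberate: a null local path means the optional log4j.properties was simply not configured, and a null confKey (together with the else-if guard added in the next hunk) means there is no SparkConf entry to write back for it. A self-contained sketch of this pattern, with made-up resource names, paths, and configuration keys:

```scala
// Hypothetical sketch of the (destName, localPath, confKey) pattern above.
// A null path means "nothing to upload"; a null confKey means "no SparkConf
// entry to record", so each side needs its own guard.
val resources = List(
  ("spark.jar", "/opt/spark/lib/spark-assembly.jar", "spark.yarn.jar"),
  ("app.jar", null, "spark.yarn.user.jar"),
  ("log4j.properties", sys.env.get("SPARK_LOG4J_CONF").orNull, null)
)

resources.foreach { case (destName, path, confKey) =>
  val localPath = if (path != null) path.trim else ""
  if (localPath.nonEmpty) {
    println(s"upload $localPath to the distributed cache as $destName")
  } else if (confKey != null) {
    println(s"record $confKey = '$localPath' in the SparkConf")
  }
}
```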
```diff
@@ -225,7 +234,7 @@
         val destPath = copyRemoteFile(dst, qualifyForLocal(localURI), replication, setPermissions)
         distCacheMgr.addResource(fs, conf, destPath, localResources, LocalResourceType.FILE,
           destName, statCache)
-      } else {
+      } else if (confKey != null) {
         sparkConf.set(confKey, localPath)
       }
     }
```
```diff
@@ -348,20 +357,13 @@
       sparkConf.set("spark.driver.extraJavaOptions", opts)
     }
 
+    // Forward the Spark configuration to the application master / executors.
     // TODO: it might be nicer to pass these as an internal environment variable rather than
     // as Java options, due to complications with string parsing of nested quotes.
-    if (args.amClass == classOf[ExecutorLauncher].getName) {
-      // If we are being launched in client mode, forward the spark-conf options
-      // onto the executor launcher
-      for ((k, v) <- sparkConf.getAll) {
-        javaOpts += "-D" + k + "=" + "\\\"" + v + "\\\""
-      }
-    } else {
-      // If we are being launched in standalone mode, capture and forward any spark
-      // system properties (e.g. set by spark-class).
-      for ((k, v) <- sys.props.filterKeys(_.startsWith("spark"))) {
-        javaOpts += "-D" + k + "=" + "\\\"" + v + "\\\""
-      }
+    for ((k, v) <- sparkConf.getAll) {
+      javaOpts += "-D" + k + "=" + "\\\"" + v + "\\\""
+    }
+
+    if (args.amClass == classOf[ApplicationMaster].getName) {
       sys.props.get("spark.driver.extraJavaOptions").foreach(opts => javaOpts += opts)
       sys.props.get("spark.driver.libraryPath").foreach(p => javaOpts += s"-Djava.library.path=$p")
    }
```
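
The triple-escaped quotes in the loop above are what the TODO comment alludes to: each -D option is embedded in a launch command that gets re-parsed before it reaches the JVM, so the quotes around the value are escaped to survive one extra round of parsing. A small sketch of what a forwarded value looks like at each stage (the key and value are made up):

```scala
// Hypothetical illustration of the quoting used when forwarding options.
// "\\\"" is the two characters \" so values containing spaces stay a single
// -D argument after the intermediate command line is parsed.
val k = "spark.app.name"
val v = "my app"

val opt = "-D" + k + "=" + "\\\"" + v + "\\\""
println(opt) // prints: -Dspark.app.name=\"my app\"

// After the container launch command strips one level of escaping,
// the JVM receives: -Dspark.app.name="my app"
```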
```diff
@@ -522,11 +524,10 @@ object ClientBase extends Logging {
         }
       }
     } else {
-      val userJar = conf.getOption(CONF_SPARK_USER_JAR).getOrElse(null)
+      val userJar = conf.get(CONF_SPARK_USER_JAR, null)
       addFileToClasspath(userJar, APP_JAR, env)
 
-      val cachedSecondaryJarLinks =
-        conf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
+      val cachedSecondaryJarLinks = conf.get(CONF_SPARK_YARN_SECONDARY_JARS, "").split(",")
       cachedSecondaryJarLinks.foreach(jar => addFileToClasspath(jar, null, env))
     }
   }
```
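
The final hunk is a pure readability cleanup: SparkConf.get(key, defaultValue) is equivalent to getOption(key).getOrElse(defaultValue). A short sketch of the equivalence (the key name here is arbitrary):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf(loadDefaults = false)

// Both expressions yield null when the key is unset; the patch picks the shorter form.
val viaOption: String = conf.getOption("spark.yarn.user.jar").getOrElse(null)
val viaDefault: String = conf.get("spark.yarn.user.jar", null)
```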
