Commit 91575ca
[SPARK-16540][YARN][CORE] Avoid adding jars twice for Spark running on yarn
## What changes were proposed in this pull request?
Currently when running spark on yarn, jars specified with --jars, --packages will be added twice, one is Spark's own file server, another is yarn's distributed cache, this can be seen from log:
for example:
```
./bin/spark-shell --master yarn-client --jars examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
```
If specified the jar to be added is scopt jar, it will added twice:
```
...
16/07/14 15:06:48 INFO Server: Started 5603ms
16/07/14 15:06:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/14 15:06:48 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.102:4040
16/07/14 15:06:48 INFO SparkContext: Added JAR file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar at spark://192.168.0.102:63996/jars/scopt_2.11-3.3.0.jar with timestamp 1468480008637
16/07/14 15:06:49 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/07/14 15:06:49 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/07/14 15:06:49 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/07/14 15:06:49 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/07/14 15:06:49 INFO Client: Setting up container launch context for our AM
16/07/14 15:06:49 INFO Client: Setting up the launch environment for our AM container
16/07/14 15:06:49 INFO Client: Preparing resources for our AM container
16/07/14 15:06:49 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/07/14 15:06:50 INFO Client: Uploading resource file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_libs__6486179704064718817.zip -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_libs__6486179704064718817.zip
16/07/14 15:06:51 INFO Client: Uploading resource file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/scopt_2.11-3.3.0.jar
16/07/14 15:06:51 INFO Client: Uploading resource file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_conf__326416236462420861.zip -> hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_conf__.zip
...
```
So here try to avoid adding jars to Spark's fileserver unnecessarily.
## How was this patch tested?
Manually verified both in yarn client and cluster mode, also in standalone mode.
Author: jerryshao <[email protected]>
Closes #14196 from jerryshao/SPARK-16540.1 parent 31ca741 commit 91575ca
File tree
3 files changed
+4
-4
lines changed- core/src/main/scala/org/apache/spark/util
- repl
- scala-2.10/src/main/scala/org/apache/spark/repl
- scala-2.11/src/main/scala/org/apache/spark/repl
3 files changed
+4
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2409 | 2409 | | |
2410 | 2410 | | |
2411 | 2411 | | |
2412 | | - | |
| 2412 | + | |
2413 | 2413 | | |
2414 | | - | |
| 2414 | + | |
2415 | 2415 | | |
2416 | 2416 | | |
2417 | 2417 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1066 | 1066 | | |
1067 | 1067 | | |
1068 | 1068 | | |
1069 | | - | |
| 1069 | + | |
1070 | 1070 | | |
1071 | 1071 | | |
1072 | 1072 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
54 | 54 | | |
55 | 55 | | |
56 | 56 | | |
57 | | - | |
| 57 | + | |
58 | 58 | | |
59 | 59 | | |
60 | 60 | | |
| |||
0 commit comments