[SPARK-23635][YARN] AM env variable should not overwrite same name env variable set through spark.executorEnv. #20799
Conversation
Test build #88168 has finished for PR 20799 at commit
@mridulm, would you please take a look? Thanks!

Thanks for fixing this @jerryshao!
```
      .foreach { case (k, v) => env(k) = v }

    sparkConf.getExecutorEnv.foreach { case (key, value) =>
      if (key == Environment.CLASSPATH.name()) {
```
There are other variables that require appending, like LD_LIBRARY_PATH. Specifying the same variable twice can obviously do bad things, but generally you shouldn't be doing that.
Maybe I missed it in the description, but exactly how is the AM env taking precedence? Can you please describe the flow as to why moving this or changing it fixes the bug (how is the executor getting the AM env now)?
@tgravescs In the existing code, in prepareEnvironment, "env" is populated only with Environment.CLASSPATH at this point, so LD_LIBRARY_PATH does not apply to this specific change.
Ah, I see, sorry I missed that. So I guess here we are now just stomping on whatever is in the system env, whereas before we were stomping on the executorEnv-specified values with the system env.
```
        // For other env variables, simply overwrite the value.
        env(key) = value
      }
    }
```
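For context, here is a minimal sketch of the merge order this thread is discussing. It is a simplification, not Spark's actual `prepareEnvironment`; the `prepareEnvSketch` name and the inlined path-append logic (standing in for `YarnSparkHadoopUtil.addPathToEnvironment`) are illustrative assumptions:

```
import scala.collection.mutable

// Simplified sketch of the env merge order under discussion.
def prepareEnvSketch(
    sysEnv: Map[String, String],       // the AM's own environment
    executorEnv: Map[String, String],  // spark.executorEnv.* settings
    env: mutable.Map[String, String]): Unit = {
  // 1. Copy SPARK_* variables from the AM's environment first.
  sysEnv.filter { case (k, _) => k.startsWith("SPARK") }
    .foreach { case (k, v) => env(k) = v }

  // 2. Then apply spark.executorEnv.*, so user-specified values win.
  executorEnv.foreach { case (key, value) =>
    if (key == "CLASSPATH") {
      // Path-like variable: append with the path separator instead of overwriting.
      val sep = java.io.File.pathSeparator
      env(key) = env.get(key).filter(_.nonEmpty).map(_ + sep + value).getOrElse(value)
    } else {
      // Any other variable: overwrite, so executorEnv beats the AM's copy.
      env(key) = value
    }
  }
}
```

With this ordering, a value set via `spark.yarn.appMasterEnv.*` (and hence present in the AM's environment) can no longer clobber the same variable set via `spark.executorEnv.*`.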
@jerryshao I think there is a potential issue with this change: it allows users to (incorrectly) specify SPARK_LOG_URL_STDERR and SPARK_LOG_URL_STDOUT, which should be generated by the driver. See the "// Add log urls" section above this code snippet.
Note that this is an existing bug in the code regarding the same issue: if the same variables had been present in the driver env, they would have overridden the generated values.
It would be good to fix this issue as well as part of this change. The solution would be to move the "// Add log urls" block below this current block, as sketched below.
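A rough sketch of that reordering (hypothetical helper and parameter names, not the PR's actual code):

```
import scala.collection.mutable

// Hypothetical sketch: apply user env first, generated log URLs last.
def applyEnvThenLogUrls(
    env: mutable.Map[String, String],
    executorEnv: Map[String, String],
    logUrls: Map[String, String]): Unit = {  // e.g. "SPARK_LOG_URL_STDOUT" -> url
  // 1. User-specified spark.executorEnv.* values.
  executorEnv.foreach { case (k, v) => env(k) = v }

  // 2. Driver-generated log URLs go last, so user-supplied
  //    SPARK_LOG_URL_STDOUT / SPARK_LOG_URL_STDERR cannot override them.
  logUrls.foreach { case (k, v) => env(k) = v }
}
```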
Sure, I will fix it, thanks for the reminder.
Test build #88219 has finished for PR 20799 at commit

Thanks for the changes @jerryshao, they look good to me.

looks good to me

Thanks for the review, let me merge to master.
[SPARK-23635][YARN] AM env variable should not overwrite same name env variable set through spark.executorEnv.
## What changes were proposed in this pull request?
In the current Spark on YARN code, the AM always copies and overwrites its env variables onto the executors, so we cannot set different values for executors.
To reproduce the issue, a user can start spark-shell like:
```
./bin/spark-shell --master yarn-client --conf spark.executorEnv.SPARK_ABC=executor_val --conf spark.yarn.appMasterEnv.SPARK_ABC=am_val
```
Then check executor env variables by
```
sc.parallelize(1 to 1).flatMap { i => sys.env.toSeq }.collect.foreach(println)
```
We will always get `am_val` instead of `executor_val`, so the AM should not overwrite explicitly set executor env variables.
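As a sanity check after the fix, something like the following (a hypothetical snippet mirroring the repro above, not a test from this PR) should observe the executor-side value:

```
// Expected behavior after the fix: the executor sees the spark.executorEnv value.
val observed = sc.parallelize(1 to 1).map(_ => sys.env.get("SPARK_ABC")).collect().head
assert(observed == Some("executor_val"))  // before the fix this was Some("am_val")
```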
## How was this patch tested?
Added UT and tested in local cluster.
Author: jerryshao <[email protected]>
Closes apache#20799 from jerryshao/SPARK-23635.