[SPARK-23635][YARN] AM env variable should not overwrite same name env variable set through spark.executorEnv. #20799
Conversation
Test build #88168 has finished for PR 20799 at commit
@mridulm, would you please take a look? Thanks!

Thanks for fixing this @jerryshao!
```
      .foreach { case (k, v) => env(k) = v }

    sparkConf.getExecutorEnv.foreach { case (key, value) =>
      if (key == Environment.CLASSPATH.name()) {
```
There are other variables that require appending, like LD_LIBRARY_PATH. Specifying the same variable twice can obviously do bad things, but generally you shouldn't be doing that.
Maybe I missed it in the description, but exactly how is the AM env taking precedence? Can you please describe the flow as to why moving this or changing it fixes the bug (how is the executor getting the AM env now)?
@tgravescs In the existing code, in prepareEnvironment, "env" is populated only with Environment.CLASSPATH at this point, so LD_LIBRARY_PATH does not apply to this specific change.
Ah, I see, sorry I missed that. So I guess here we are now just stomping on whatever is in the system env, whereas before we were stomping on the executorEnv-specified values with the system env.
```
        // For other env variables, simply overwrite the value.
        env(key) = value
      }
    }
```
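For context, here is a minimal sketch of the merge order this thread is discussing. It is a simplification, not Spark's actual `prepareEnvironment`; the `prepareEnvSketch` name and the inlined path-append logic (standing in for `YarnSparkHadoopUtil.addPathToEnvironment`) are illustrative assumptions:

```
import scala.collection.mutable

// Simplified sketch of the env merge order under discussion.
def prepareEnvSketch(
    sysEnv: Map[String, String],       // the AM's own environment
    executorEnv: Map[String, String],  // spark.executorEnv.* settings
    env: mutable.Map[String, String]): Unit = {
  // 1. Copy SPARK_* variables from the AM's environment first.
  sysEnv.filter { case (k, _) => k.startsWith("SPARK") }
    .foreach { case (k, v) => env(k) = v }

  // 2. Then apply spark.executorEnv.*, so user-specified values win.
  executorEnv.foreach { case (key, value) =>
    if (key == "CLASSPATH") {
      // Path-like variable: append with the path separator instead of overwriting.
      val sep = java.io.File.pathSeparator
      env(key) = env.get(key).filter(_.nonEmpty).map(_ + sep + value).getOrElse(value)
    } else {
      // Any other variable: overwrite, so executorEnv beats the AM's copy.
      env(key) = value
    }
  }
}
```

With this ordering, a value set via `spark.yarn.appMasterEnv.*` (and hence present in the AM's environment) can no longer clobber the same variable set via `spark.executorEnv.*`.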
@jerryshao I think there is a potential issue with this change: it allows users to (incorrectly) specify SPARK_LOG_URL_STDERR and SPARK_LOG_URL_STDOUT, which should be generated by the driver. See the "// Add log urls" section above this code snippet.
Note that this is an existing bug in the code regarding the same issue: if the same variables had been present in the driver env, they would have overridden the generated values.
It would be good to fix this issue as well as part of this change. The solution would be to move the "// Add log urls" block below this current block, as sketched below.
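A rough sketch of that reordering (hypothetical helper and parameter names, not the PR's actual code):

```
import scala.collection.mutable

// Hypothetical sketch: apply user env first, generated log URLs last.
def applyEnvThenLogUrls(
    env: mutable.Map[String, String],
    executorEnv: Map[String, String],
    logUrls: Map[String, String]): Unit = {  // e.g. "SPARK_LOG_URL_STDOUT" -> url
  // 1. User-specified spark.executorEnv.* values.
  executorEnv.foreach { case (k, v) => env(k) = v }

  // 2. Driver-generated log URLs go last, so user-supplied
  //    SPARK_LOG_URL_STDOUT / SPARK_LOG_URL_STDERR cannot override them.
  logUrls.foreach { case (k, v) => env(k) = v }
}
```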
Sure, I will fix it, thanks for the reminder.
Test build #88219 has finished for PR 20799 at commit

Thanks for the changes @jerryshao, they look good to me.

looks good to me

Thanks for the review, let me merge to master.
[SPARK-23635][YARN] AM env variable should not overwrite same name env variable set through spark.executorEnv.
## What changes were proposed in this pull request?
In the current Spark on YARN code, the AM always copies and overwrites its env variables onto the executors, so we cannot set different values for executors.
To reproduce the issue, a user can start spark-shell like:
```
./bin/spark-shell --master yarn-client --conf spark.executorEnv.SPARK_ABC=executor_val --conf spark.yarn.appMasterEnv.SPARK_ABC=am_val
```
Then check executor env variables by
```
sc.parallelize(1 to 1).flatMap { i => sys.env.toSeq }.collect.foreach(println)
```
We will always get `am_val` instead of `executor_val`, so the AM should not overwrite explicitly set executor env variables.
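As a sanity check after the fix, something like the following (a hypothetical snippet mirroring the repro above, not a test from this PR) should observe the executor-side value:

```
// Expected behavior after the fix: the executor sees the spark.executorEnv value.
val observed = sc.parallelize(1 to 1).map(_ => sys.env.get("SPARK_ABC")).collect().head
assert(observed == Some("executor_val"))  // before the fix this was Some("am_val")
```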
## How was this patch tested?
Added UT and tested in local cluster.
Author: jerryshao <[email protected]>
Closes apache#20799 from jerryshao/SPARK-23635.