
Conversation

@jerryshao
Contributor

What changes were proposed in this pull request?

In the current Spark on YARN code, the AM always copies its env variables to the executors, overwriting any values already set there, so we cannot set different values for executors.

To reproduce the issue, start spark-shell like:

./bin/spark-shell --master yarn-client --conf spark.executorEnv.SPARK_ABC=executor_val --conf spark.yarn.appMasterEnv.SPARK_ABC=am_val

Then check the executor env variables with:

sc.parallelize(1 to 1).flatMap { i => sys.env.toSeq }.collect.foreach(println)

We always get am_val instead of executor_val, so the AM should not overwrite explicitly set executor env variables.
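The expected precedence can be sketched as a plain map merge (a standalone illustration, not Spark's actual code; all names below are made up): the map applied last wins, so applying the spark.executorEnv values after the inherited AM env keeps the executor-specific setting.

```scala
// Standalone sketch, not Spark code: merge order decides precedence.
object EnvPrecedenceSketch {
  // Apply the AM-inherited env first, then the explicit executor env,
  // so keys set via spark.executorEnv win over the AM's copies.
  def buildExecutorEnv(
      amEnv: Map[String, String],
      executorEnv: Map[String, String]): Map[String, String] =
    amEnv ++ executorEnv

  def main(args: Array[String]): Unit = {
    val amEnv = Map("SPARK_ABC" -> "am_val", "JAVA_HOME" -> "/usr/lib/jvm")
    val executorEnv = Map("SPARK_ABC" -> "executor_val")
    println(buildExecutorEnv(amEnv, executorEnv)("SPARK_ABC")) // executor_val
  }
}
```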

How was this patch tested?

Added UT and tested in local cluster.

@SparkQA

SparkQA commented Mar 12, 2018

Test build #88168 has finished for PR 20799 at commit afd88e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

@mridulm, would you please take a look? Thanks!

@mridulm
Contributor

mridulm commented Mar 13, 2018

Thanks for fixing this @jerryshao !
LGTM
Since there is a behavior change, +CC @tgravescs and @vanzin

.foreach { case (k, v) => env(k) = v }

sparkConf.getExecutorEnv.foreach { case (key, value) =>
  if (key == Environment.CLASSPATH.name()) {
Contributor

There are other variables that require appending, like LD_LIBRARY_PATH. It can obviously do bad things if it's specified twice, but generally you shouldn't be doing that.

Maybe I missed it in the description: exactly how is the AM env taking precedence? Can you please describe the flow, i.e. why moving or changing this fixes the bug (how is the executor getting the AM env now)?

Contributor

@tgravescs In the existing code, in prepareEnvironment, "env" is populated only with Environment.CLASSPATH, hence LD_LIBRARY_PATH does not apply to this specific change.

Contributor

Ah, I see, sorry, I missed that. So I guess here we are just stomping on whatever is in the system env now, vs. before, where we were stomping on the executorEnv specified along with the system env.

    // For other env variables, simply overwrite the value.
    env(key) = value
  }
}
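Taken together, the hunks above encode one merge rule: CLASSPATH is appended so entries from both sides survive, while every other key is simply overwritten. A self-contained sketch of that rule, using a plain string key and ':' separator instead of YARN's Environment constants (assumed names, not Spark's actual code):

```scala
import scala.collection.mutable

// Sketch only (assumed names): CLASSPATH appends, everything else overwrites.
object ClasspathMergeSketch {
  val ClasspathKey = "CLASSPATH"

  def mergeEnvEntry(env: mutable.Map[String, String], key: String, value: String): Unit = {
    if (key == ClasspathKey && env.contains(key)) {
      // Append so entries already in the env (e.g. from the AM) are preserved.
      env(key) = env(key) + ":" + value
    } else {
      // For any other variable, the last writer simply wins.
      env(key) = value
    }
  }
}
```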
Contributor

@jerryshao I think there is a potential issue with this change: it allows users to (incorrectly) specify SPARK_LOG_URL_STDERR and SPARK_LOG_URL_STDOUT, which should be generated by the driver (see the "// Add log urls" section above this code snippet).

Note, this is an existing bug in the code: if the same variables had been present in the driver env, they would have overridden the generated values.
It would be good to fix this issue as well as part of this change.

The solution would be to move the '// Add log urls' block below this current block.
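The suggested reordering boils down to: populate the user-supplied env first and add the driver-generated log URLs last, so the generated values always win. A minimal sketch under that assumption (hypothetical names, not Spark's actual code):

```scala
// Sketch of the suggested ordering (hypothetical names, not Spark's code).
object LogUrlOrderingSketch {
  // Generated log URLs are applied last, so a user-set
  // SPARK_LOG_URL_STDOUT/STDERR can no longer shadow them.
  def prepareEnv(
      userEnv: Map[String, String],
      generatedLogUrls: Map[String, String]): Map[String, String] =
    userEnv ++ generatedLogUrls
}
```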

Contributor Author

Sure, I will fix it, thanks for the reminder.

@SparkQA

SparkQA commented Mar 14, 2018

Test build #88219 has finished for PR 20799 at commit 2afc91e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mridulm
Contributor

mridulm commented Mar 15, 2018

Thanks for the changes @jerryshao, they look good to me.
Do you have any other thoughts/comments @tgravescs? Since this is a behavior change, I want to make sure we are not missing anything!

@tgravescs
Contributor

looks good to me

@jerryshao
Contributor Author

Thanks for the review, let me merge to master.

@asfgit asfgit closed this in c952000 Mar 16, 2018
mstewart141 pushed a commit to mstewart141/spark that referenced this pull request Mar 24, 2018
…v variable set through spark.executorEnv.

Author: jerryshao <[email protected]>

Closes apache#20799 from jerryshao/SPARK-23635.
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
…v variable set through spark.executorEnv.


Author: jerryshao <[email protected]>

Closes apache#20799 from jerryshao/SPARK-23635.

Change-Id: Iffe379783426886ea8b529ba6fd5a97c54c6b258