Allow configuration to set environment variables on driver and executor #395
Conversation
…or as below: --conf spark.executorEnv.[EnvironmentVariableName] --conf spark.driverEnv.[EnvironmentVariableName]
Please review. We could check the custom environment variables for conflicts with Spark-internal environment variable constants, but I think that may not be necessary.
    val driverCustomEnvs = submissionSparkConf.getAllWithPrefix(KUBERNETES_DRIVER_ENV_KEY).toSeq
      .map(env => new EnvVarBuilder()
      .withName(env._1)
The indentation makes this look like a call chained after map.
sorry, will change that.
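For reference, a minimal sketch of how the mapping could be laid out so the builder calls clearly belong to the lambda rather than reading as a chain after map. The EnvVarBuilder is the fabric8 Kubernetes client builder; the surrounding names are taken from the diff above, and the formatting here is only illustrative:

```scala
import io.fabric8.kubernetes.api.model.EnvVarBuilder

// Map each (name, value) pair declared with the driver-env prefix to a
// Kubernetes EnvVar; indenting the builder calls inside the lambda makes the
// chain unambiguous.
val driverCustomEnvs = submissionSparkConf
  .getAllWithPrefix(KUBERNETES_DRIVER_ENV_KEY)
  .toSeq
  .map { case (name, value) =>
    new EnvVarBuilder()
      .withName(name)
      .withValue(value)
      .build()
  }
```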
docs/running-on-kubernetes.md
Outdated
      </td>
    </tr>
    <tr>
      <td><code>spark.driverEnv.[EnvironmentVariableName]</code></td>
In YARN cluster mode, there's a similar property named spark.yarn.appMasterEnv.[EnvironmentVariableName]. Since this one is specific to Kubernetes, should we rename it to spark.kubernetes.driverEnv.[EnvironmentVariableName]?
yeah. sure.
      .stringConf
      .createOptional

  private[spark] val KUBERNETES_DRIVER_ENV_KEY = "spark.driverEnv."
I don't see a corresponding KUBERNETES_EXECUTOR_ENV_KEY
Properties with prefix spark.executorEnv. are handled natively by Spark. http://spark.apache.org/docs/latest/configuration.html.
I don't see how this can be handled in a generic way across all cluster managers. Where is the code that handles this in spark-core? Otherwise we should set the environment variables specifically on the executor container in KubernetesClusterSchedulerBackend.
Ah never mind - this is confusing because the driver env requires a new Kubernetes-specific configuration but the executors will use an existing configuration from spark-core. But even though the configuration key is defined for spark-core and is consistent across all cluster managers, the implementation is different for each cluster manager - and we do handle this in KubernetesClusterSchedulerBackend.
Yeah. The driver configuration key is Kubernetes-specific, but the executor key is the same across cluster managers.
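To make the split concrete, a rough sketch of the reading side, assuming the standard spark-core behavior where SparkConf.getExecutorEnv collects the spark.executorEnv.* entries; collectEnvs and its return shape are illustrative, not code from this PR:

```scala
import org.apache.spark.SparkConf

// Illustrative helper: gather driver and executor environment variables from a SparkConf.
def collectEnvs(conf: SparkConf): (Seq[(String, String)], Seq[(String, String)]) = {
  // spark-core collects spark.executorEnv.* for every cluster manager;
  // KubernetesClusterSchedulerBackend then applies them to the executor containers.
  val executorEnvs = conf.getExecutorEnv
  // The driver pod's variables are read with the Kubernetes-specific prefix added by this PR.
  val driverEnvs = conf.getAllWithPrefix("spark.kubernetes.driverEnv.").toSeq
  (driverEnvs, executorEnvs)
}
```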
docs/running-on-kubernetes.md
Outdated
      </td>
    </tr>
    <tr>
      <td><code>spark.driverEnv.[EnvironmentVariableName]</code></td>
Prefix the configuration with spark.kubernetes instead of just spark.
agree
LGTM.
LGTM - should the …
Yeah, indeed. The parsing of configuration values into the SparkContext is handled by Spark core.
@liyinan926 @erikerlandson @mccheah Any other concerns about merging this basic feature?
None for me. I'd be OK to merge it for the upcoming release unless others have concerns.
No concerns from me.
@tangzhankun can we rebase this to branch-2.2-kubernetes?
@erikerlandson Pulled a new request based on branch-2.2-kubernetes here: #424 |
Oh just saw that this moved to #424 and merged. I'll file the below comments as an issue so we don't forget.
Thanks for the contribution, @tangzhankun!
      </td>
    </tr>
    <tr>
      <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
this is the equivalent of YARN's spark.yarn.appMasterEnv.[EnvironmentVariableName]
      </td>
    </tr>
    <tr>
      <td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
this is already documented in docs/configuration.md -- please remove this duplicate config from docs/running-on-kubernetes.md and reference it in the spark.kubernetes.driverEnv.[EnvironmentVariableName] section below with language something like: "For executors, use the generic spark.executorEnv.[EnvironmentVariableName], which works across cluster managers."
      (ENV_APPLICATION_ID, applicationId()),
      (ENV_EXECUTOR_ID, executorId),
-     (ENV_MOUNTED_CLASSPATH, s"$executorJarsDownloadDir/*"))
+     (ENV_MOUNTED_CLASSPATH, s"$executorJarsDownloadDir/*")) ++ sc.executorEnvs.toSeq)
what if a user specifies an executorEnv that conflicts with one of the above? I think we want to prohibit users from overriding the Spark-internal settings and throw an exception if they attempt to do that
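A minimal sketch of what such a guard could look like, using the reserved names from the snippet above and the sc (SparkContext) referenced by the surrounding code; this illustrates the suggestion and is not the merged implementation:

```scala
// Spark-internal executor environment variables that users should not be able
// to override via spark.executorEnv.* (names taken from the snippet above).
val reservedEnvNames = Set(ENV_APPLICATION_ID, ENV_EXECUTOR_ID, ENV_MOUNTED_CLASSPATH)

// Fail fast if any user-supplied executor env var collides with a Spark-internal one.
val conflicting = sc.executorEnvs.keys.filter(reservedEnvNames.contains).toSeq
require(conflicting.isEmpty,
  s"spark.executorEnv.* must not override Spark-internal variables: ${conflicting.mkString(", ")}")
```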
What changes were proposed in this pull request?
Two configurations were added to support setting environment variables on the driver and executors:
--conf spark.executorEnv.[EnvironmentVariableName]
--conf spark.kubernetes.driverEnv.[EnvironmentVariableName]
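For illustration, the same two settings expressed through SparkConf; MY_ENV and the values are hypothetical placeholders:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.kubernetes.driverEnv.MY_ENV", "value-for-driver")   // driver pod environment
  .set("spark.executorEnv.MY_ENV", "value-for-executors")         // executor pod environment
```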
How was this patch tested?
For the driver, a unit test case was added. For both the driver and executors, manual integration testing against a k8s cluster was performed.