This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@tangzhankun commented Jul 27, 2017

What changes were proposed in this pull request?

Two configurations are added, as shown below, to support setting environment variables for the driver and executors:

--conf spark.executorEnv.[EnvironmentVariableName]
--conf spark.kubernetes.driverEnv.[EnvironmentVariableName]
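For illustration, a hypothetical spark-submit invocation using both settings (the variable name and value here are made up):

```
spark-submit \
  --conf spark.executorEnv.LD_LIBRARY_PATH=/opt/native/lib \
  --conf spark.kubernetes.driverEnv.LD_LIBRARY_PATH=/opt/native/lib \
  ...
```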

How was this patch tested?

For the driver, a unit test case is added.
For both the driver and executors, a manual integration test against the k8s cluster was performed.

…or as below:

--conf spark.executorEnv.[EnvironmentVariableName]
--conf spark.driverEnv.[EnvironmentVariableName]
@tangzhankun (Author)

Please review. We could check the custom environment variables for conflicts with Spark-internal environment variables, but I think that is probably not necessary.


val driverCustomEnvs = submissionSparkConf.getAllWithPrefix(KUBERNETES_DRIVER_ENV_KEY).toSeq
  .map(env => new EnvVarBuilder()
  .withName(env._1)
Member

The indentation makes .withName look like a chained call following map.
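For reference, a sketch of what the corrected chaining could look like; the .withValue and .build() calls are assumed from the fabric8 EnvVarBuilder API and are not quoted from the PR:

```scala
import io.fabric8.kubernetes.api.model.EnvVarBuilder

val driverCustomEnvs = submissionSparkConf.getAllWithPrefix(KUBERNETES_DRIVER_ENV_KEY).toSeq
  .map(env => new EnvVarBuilder()
    .withName(env._1)   // deeper indentation shows the calls chain on EnvVarBuilder,
    .withValue(env._2)  // not on the result of map
    .build())
```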

Author

Sorry, will change that.

</td>
</tr>
<tr>
<td><code>spark.driverEnv.[EnvironmentVariableName]</code></td>
Member

In YARN cluster mode, there's a similar property named spark.yarn.appMasterEnv.[EnvironmentVariableName]. Since this one is specific to Kubernetes, should we rename it to spark.kubernetes.driverEnv.[EnvironmentVariableName]?
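Side by side, the two prefixes would look like this (FOO=bar is a made-up example):

```
--conf spark.yarn.appMasterEnv.FOO=bar        # YARN cluster mode
--conf spark.kubernetes.driverEnv.FOO=bar     # proposed Kubernetes equivalent
```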

Author

Yeah, sure.

.stringConf
.createOptional

private[spark] val KUBERNETES_DRIVER_ENV_KEY = "spark.driverEnv."
Member

I don't see a corresponding KUBERNETES_EXECUTOR_ENV_KEY

Member

Properties with the prefix spark.executorEnv. are handled natively by Spark; see http://spark.apache.org/docs/latest/configuration.html.
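A minimal sketch of that native handling, assuming only the public SparkConf API: Spark strips the spark.executorEnv. prefix and exposes the pairs via getExecutorEnv, which SparkContext folds into sc.executorEnvs for the cluster manager backend to apply.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executorEnv.LD_LIBRARY_PATH", "/opt/native/lib")

// getExecutorEnv returns the (name, value) pairs with the prefix stripped
conf.getExecutorEnv.foreach { case (name, value) =>
  println(s"$name=$value") // LD_LIBRARY_PATH=/opt/native/lib
}
```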

@mccheah

I don't see how this can be handled in a generic way across all cluster managers. Where is the code that handles this in spark-core? Otherwise we should set the environment variables specifically on the executor container in KubernetesClusterSchedulerBackend.

@mccheah commented Jul 27, 2017

Ah never mind - this is confusing because the driver env requires a new Kubernetes-specific configuration but the executors will use an existing configuration from spark-core. But even though the configuration key is defined for spark-core and is consistent across all cluster managers, the implementation is different for each cluster manager - and we do handle this in KubernetesClusterSchedulerBackend.
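As a rough sketch (not the PR's exact code) of what handling this in KubernetesClusterSchedulerBackend amounts to: translate the sc.executorEnvs map that Spark core collects into fabric8 EnvVar objects attached to the executor container spec. The method name here is hypothetical:

```scala
import io.fabric8.kubernetes.api.model.{EnvVar, EnvVarBuilder}

// Convert the executor environment map collected by Spark core into
// Kubernetes container environment variables.
def buildExecutorEnvVars(executorEnvs: Map[String, String]): Seq[EnvVar] =
  executorEnvs.toSeq.map { case (name, value) =>
    new EnvVarBuilder().withName(name).withValue(value).build()
  }
```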

Author

Yeah. The driver configuration key is Kubernetes-specific, but the executor key is the same across cluster managers.

</td>
</tr>
<tr>
<td><code>spark.driverEnv.[EnvironmentVariableName]</code></td>

Prefix the configuration with spark.kubernetes instead of just spark.

Author

Agreed.

@liyinan926 (Member)

LGTM.

@erikerlandson (Member)

LGTM - should spark.executorEnv.* be added to testing, or is that just guaranteed to work by virtue of being already implemented in Spark?

@tangzhankun (Author)

Yeah, indeed. The parsing of the configuration values into SparkContext is guaranteed by Spark core.

@tangzhankun (Author)

@liyinan926 @erikerlandson @mccheah Any other concerns about merging this basic feature?

@erikerlandson (Member)

None for me. I'd be OK to merge it for the upcoming release unless others have concerns.

@liyinan926 (Member)

No concerns from me.

@foxish mentioned this pull request Aug 3, 2017
@erikerlandson (Member)

@tangzhankun can we rebase this to branch-2.2-kubernetes?

@tangzhankun (Author)

@erikerlandson Opened a new pull request based on branch-2.2-kubernetes here: #424

@ash211 left a comment

Oh just saw that this moved to #424 and merged. I'll file the below comments as an issue so we don't forget.

Thanks for the contribution, @tangzhankun!

</td>
</tr>
<tr>
<td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>

This is the equivalent of YARN's spark.yarn.appMasterEnv.[EnvironmentVariableName].

</td>
</tr>
<tr>
<td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>

This is already documented in docs/configuration.md -- please remove this duplicate config from docs/running-on-kubernetes.md and reference it in the spark.kubernetes.driverEnv.[EnvironmentVariableName] section below, with language something like: "For executors, use the generic spark.executorEnv.[EnvironmentVariableName], which works across cluster managers."
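A sketch of what the consolidated table row in docs/running-on-kubernetes.md might look like; the description wording beyond the quoted sentence is assumed:

```html
<tr>
  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
  <td>(none)</td>
  <td>
    Add the environment variable specified by <code>EnvironmentVariableName</code>
    to the driver process. For executors, use the generic
    <code>spark.executorEnv.[EnvironmentVariableName]</code>, which works across
    cluster managers.
  </td>
</tr>
```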

  (ENV_APPLICATION_ID, applicationId()),
  (ENV_EXECUTOR_ID, executorId),
- (ENV_MOUNTED_CLASSPATH, s"$executorJarsDownloadDir/*"))
+ (ENV_MOUNTED_CLASSPATH, s"$executorJarsDownloadDir/*")) ++ sc.executorEnvs.toSeq)

What if a user specifies an executorEnv that conflicts with one of the above? I think we want to prohibit users from overriding the Spark-internal settings and throw an exception if they attempt to do that.
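A minimal sketch of the guard being suggested, assuming the ENV_* string constants from the snippet above and the executor environment map from Spark core; the method name is hypothetical:

```scala
// Reject user-supplied spark.executorEnv.* entries that would shadow
// Spark-internal executor variables.
def validateExecutorEnvs(executorEnvs: Map[String, String]): Unit = {
  val reserved = Set(ENV_APPLICATION_ID, ENV_EXECUTOR_ID, ENV_MOUNTED_CLASSPATH)
  val conflicts = executorEnvs.keySet.intersect(reserved)
  require(conflicts.isEmpty,
    s"spark.executorEnv.* must not override Spark-internal variables: ${conflicts.mkString(", ")}")
}
```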

@ash211 closed this Aug 22, 2017
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019