
Commit ded6d27

liyinan926 authored and ueshin committed
[SPARK-22648][K8S] Add documentation covering init containers and secrets
## What changes were proposed in this pull request?

This PR updates the Kubernetes documentation corresponding to the following features/changes in apache#19954.

* Ability to use remote dependencies through the init-container.
* Ability to mount user-specified secrets into the driver and executor pods.

vanzin jiangxb1987 foxish

Author: Yinan Li <[email protected]>

Closes apache#20059 from liyinan926/doc-update.
1 parent 171f6dd commit ded6d27

2 files changed (+143, -54)

docs/running-on-kubernetes.md

Lines changed: 141 additions & 53 deletions
@@ -69,17 +69,17 @@ building using the supplied script, or manually.
 
 To launch Spark Pi in cluster mode,
 
-{% highlight bash %}
+```bash
 $ bin/spark-submit \
     --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
     --deploy-mode cluster \
     --name spark-pi \
     --class org.apache.spark.examples.SparkPi \
     --conf spark.executor.instances=5 \
-    --conf spark.kubernetes.driver.docker.image=<driver-image> \
-    --conf spark.kubernetes.executor.docker.image=<executor-image> \
+    --conf spark.kubernetes.driver.container.image=<driver-image> \
+    --conf spark.kubernetes.executor.container.image=<executor-image> \
     local:///path/to/examples.jar
-{% endhighlight %}
+```
 
 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
 `spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
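
For reference, one way to find the `<k8s-apiserver-host>:<k8s-apiserver-port>` values used in the `k8s://` master URL above is to query the cluster with `kubectl`:

```bash
# Prints cluster endpoint information; the "Kubernetes master" URL shown
# is the value to place after the k8s:// prefix in --master.
$ kubectl cluster-info
```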
@@ -120,6 +120,54 @@ by their appropriate remote URIs. Also, application dependencies can be pre-moun
 Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
 
+### Using Remote Dependencies
+When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods
+need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading
+the dependencies so that the driver and executor containers can use them locally. This requires users to specify the
+container image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. For
+example, users can add the following option to the `spark-submit` command to specify the init-container image:
+
+```
+--conf spark.kubernetes.initContainer.image=<init-container image>
+```
+
+The init-container handles remote dependencies specified in `spark.jars` (or the `--jars` option of `spark-submit`) and
+`spark.files` (or the `--files` option of `spark-submit`). It also handles remotely hosted main application resources, e.g.,
+the main application jar. The following shows an example of using remote dependencies with the `spark-submit` command:
+
+```bash
+$ bin/spark-submit \
+    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+    --deploy-mode cluster \
+    --name spark-pi \
+    --class org.apache.spark.examples.SparkPi \
+    --jars https://path/to/dependency1.jar,https://path/to/dependency2.jar \
+    --files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2 \
+    --conf spark.executor.instances=5 \
+    --conf spark.kubernetes.driver.container.image=<driver-image> \
+    --conf spark.kubernetes.executor.container.image=<executor-image> \
+    --conf spark.kubernetes.initContainer.image=<init-container image> \
+    https://path/to/examples.jar
+```
+
+## Secret Management
+Kubernetes [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) can be used to provide credentials for a
+Spark application to access secured services. To mount a user-specified secret into the driver container, users can use
+the configuration property of the form `spark.kubernetes.driver.secrets.[SecretName]=<mount path>`. Similarly, the
+configuration property of the form `spark.kubernetes.executor.secrets.[SecretName]=<mount path>` can be used to mount a
+user-specified secret into the executor containers. Note that the secret to be mounted is assumed to be in the same
+namespace as the driver and executor pods. For example, to mount a secret named `spark-secret` onto the path
+`/etc/secrets` in both the driver and executor containers, add the following options to the `spark-submit` command:
+
+```
+--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
+--conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets
+```
+
+Note that if an init-container is used, any secret mounted into the driver container will also be mounted into the
+init-container of the driver. Similarly, any secret mounted into an executor container will also be mounted into the
+init-container of the executor.
+
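
For the example above to work, a secret named `spark-secret` must already exist in the pods' namespace. A minimal sketch of creating one with `kubectl`; the key names and values here are hypothetical placeholders:

```bash
# Create a generic (Opaque) secret named spark-secret in the current
# namespace; replace the placeholder key/value pairs with real credentials.
$ kubectl create secret generic spark-secret \
    --from-literal=username=<username> \
    --from-literal=password=<password>
```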
 ## Introspection and Debugging
 
 These are the different ways in which you can investigate a running/completed Spark application, monitor progress, and
@@ -275,7 +323,7 @@ specific to Spark on Kubernetes.
   <td><code>(none)</code></td>
   <td>
     Container image to use for the driver.
-    This is usually of the form `example.com/repo/spark-driver:v1.0.0`.
+    This is usually of the form <code>example.com/repo/spark-driver:v1.0.0</code>.
     This configuration is required and must be provided by the user.
   </td>
 </tr>
@@ -284,7 +332,7 @@ specific to Spark on Kubernetes.
   <td><code>(none)</code></td>
   <td>
     Container image to use for the executors.
-    This is usually of the form `example.com/repo/spark-executor:v1.0.0`.
+    This is usually of the form <code>example.com/repo/spark-executor:v1.0.0</code>.
     This configuration is required and must be provided by the user.
   </td>
 </tr>
@@ -528,51 +576,91 @@ specific to Spark on Kubernetes.
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.driver.limit.cores</code></td>
-  <td>(none)</td>
-  <td>
-    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.executor.limit.cores</code></td>
-  <td>(none)</td>
-  <td>
-    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
-  <td>(none)</td>
-  <td>
-    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
-    configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
-    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
-    <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
-  <td>(none)</td>
-  <td>
-    Add the environment variable specified by <code>EnvironmentVariableName</code> to
-    the Driver process. The user can specify multiple of these to set multiple environment variables.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.mountDependencies.jarsDownloadDir</code></td>
-  <td><code>/var/spark-data/spark-jars</code></td>
-  <td>
-    Location to download jars to in the driver and executors.
-    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.mountDependencies.filesDownloadDir</code></td>
-  <td><code>/var/spark-data/spark-files</code></td>
-  <td>
-    Location to download jars to in the driver and executors.
-    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
-  </td>
-</tr>
+  <td><code>spark.kubernetes.driver.limit.cores</code></td>
+  <td>(none)</td>
+  <td>
+    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.limit.cores</code></td>
+  <td>(none)</td>
+  <td>
+    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
+  <td>(none)</td>
+  <td>
+    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
+    configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
+    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
+    <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the environment variable specified by <code>EnvironmentVariableName</code> to
+    the Driver process. The user can specify multiple of these to set multiple environment variables.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.jarsDownloadDir</code></td>
+  <td><code>/var/spark-data/spark-jars</code></td>
+  <td>
+    Location to download jars to in the driver and executors.
+    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.filesDownloadDir</code></td>
+  <td><code>/var/spark-data/spark-files</code></td>
+  <td>
+    Location to download files to in the driver and executors.
+    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.timeout</code></td>
+  <td>300s</td>
+  <td>
+    Timeout in seconds before aborting the attempt to download and unpack dependencies from remote locations into
+    the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.maxSimultaneousDownloads</code></td>
+  <td>5</td>
+  <td>
+    Maximum number of remote dependencies to download simultaneously in a driver or executor pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.initContainer.image</code></td>
+  <td>(none)</td>
+  <td>
+    Container image for the <a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/">init-container</a> of the driver and executors for downloading dependencies. This is usually of the form <code>example.com/repo/spark-init:v1.0.0</code>. This configuration is optional and must be provided by the user if any non-container-local dependencies are used that must be downloaded remotely.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secret</a> named <code>SecretName</code> to the driver pod on the path specified in the value. For example,
+    <code>spark.kubernetes.driver.secrets.spark-secret=/etc/secrets</code>. Note that if an init-container is used,
+    the secret will also be added to the init-container in the driver pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.secrets.[SecretName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secret</a> named <code>SecretName</code> to the executor pod on the path specified in the value. For example,
+    <code>spark.kubernetes.executor.secrets.spark-secret=/etc/secrets</code>. Note that if an init-container is used,
+    the secret will also be added to the init-container in the executor pod.
+  </td>
+</tr>
 </table>
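
To illustrate how several of the properties above combine on the command line, here is a hypothetical `spark-submit` invocation; every value below (core limits, node selector key, environment variable name and value, timeout) is a placeholder, not a recommendation:

```bash
$ bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.driver.container.image=<driver-image> \
    --conf spark.kubernetes.executor.container.image=<executor-image> \
    --conf spark.kubernetes.driver.limit.cores=1 \
    --conf spark.kubernetes.executor.limit.cores=2 \
    --conf spark.kubernetes.node.selector.identifier=myIdentifier \
    --conf spark.kubernetes.driverEnv.MY_ENV_VAR=myValue \
    --conf spark.kubernetes.mountDependencies.timeout=600s \
    local:///path/to/examples.jar
```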

sbin/build-push-docker-images.sh

Lines changed: 2 additions & 1 deletion
@@ -20,7 +20,8 @@
 # with Kubernetes support.
 
 declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
-                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile )
+                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
+                  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )
 
 function build {
   docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
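
For context, the script is invoked with a repository and tag followed by a `build` or `push` command, so with this change the new `spark-init` image is built and pushed alongside the driver and executor images. A sketch with placeholder repository and tag values, assuming the script's `-r` (repository) and `-t` (tag) options:

```bash
# Build the spark-driver, spark-executor, and (new) spark-init images,
# then push all of them to the given repository with the given tag.
$ ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build
$ ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push
```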
