@@ -69,17 +69,17 @@ building using the supplied script, or manually.
 
 To launch Spark Pi in cluster mode,
 
-{% highlight bash %}
+```bash
 $ bin/spark-submit \
   --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
   --deploy-mode cluster \
   --name spark-pi \
   --class org.apache.spark.examples.SparkPi \
   --conf spark.executor.instances=5 \
-  --conf spark.kubernetes.driver.docker.image=<driver-image> \
-  --conf spark.kubernetes.executor.docker.image=<executor-image> \
+  --conf spark.kubernetes.driver.container.image=<driver-image> \
+  --conf spark.kubernetes.executor.container.image=<executor-image> \
   local:///path/to/examples.jar
-{% endhighlight %}
+```
 
 The Spark master, specified either by passing the `--master` command line argument to `spark-submit` or by setting
 `spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
@@ -120,6 +120,54 @@ by their appropriate remote URIs. Also, application dependencies can be pre-moun
 Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
 `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
 
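+For example, a jar that was baked into the images at a hypothetical path could be referenced with a `local://` URI
+instead of being downloaded at runtime:
+
+```
+# The path below is illustrative; it must match where the jar was placed in the images.
+--jars local:///opt/spark/extra/my-dependency.jar
+```
+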
+### Using Remote Dependencies
+When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods
+need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading
+the dependencies so the driver and executor containers can use them locally. This requires users to specify the container
+image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. For example, users
+can specify the init-container image by adding the following option to the `spark-submit` command:
+
+```
+--conf spark.kubernetes.initContainer.image=<init-container image>
+```
+
+The init-container handles remote dependencies specified in `spark.jars` (or the `--jars` option of `spark-submit`) and
+`spark.files` (or the `--files` option of `spark-submit`). It also handles remotely hosted main application resources, e.g.,
+the main application jar. The following shows an example of using remote dependencies with the `spark-submit` command:
+
+```bash
+$ bin/spark-submit \
+  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+  --deploy-mode cluster \
+  --name spark-pi \
+  --class org.apache.spark.examples.SparkPi \
+  --jars https://path/to/dependency1.jar,https://path/to/dependency2.jar \
+  --files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2 \
+  --conf spark.executor.instances=5 \
+  --conf spark.kubernetes.driver.container.image=<driver-image> \
+  --conf spark.kubernetes.executor.container.image=<executor-image> \
+  --conf spark.kubernetes.initContainer.image=<init-container image> \
+  https://path/to/examples.jar
+```
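+
+If dependency downloads fail or hang, the logs of the init-container can be inspected with `kubectl` (the pod name
+below is illustrative, and the init-container is assumed to be named `spark-init`):
+
+```bash
+# Inspect the init-container that downloads the remote dependencies.
+$ kubectl logs spark-pi-driver -c spark-init
+```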
+
+## Secret Management
+Kubernetes [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) can be used to provide credentials for a
+Spark application to access secured services. To mount a user-specified secret into the driver container, users can use
+the configuration property of the form `spark.kubernetes.driver.secrets.[SecretName]=<mount path>`. Similarly, the
+configuration property of the form `spark.kubernetes.executor.secrets.[SecretName]=<mount path>` can be used to mount a
+user-specified secret into the executor containers. Note that the secret to be mounted is assumed to be in the same
+namespace as the driver and executor pods. For example, to mount a secret named `spark-secret` onto the path
+`/etc/secrets` in both the driver and executor containers, add the following options to the `spark-submit` command:
+
+```
+--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
+--conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets
+```
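+
+The secret itself can be created ahead of time with `kubectl`, for example (the key and value below are purely
+illustrative):
+
+```bash
+# Create the secret in the same namespace as the driver and executor pods.
+$ kubectl create secret generic spark-secret --from-literal=api-token=changeme
+```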
+
+Note that if an init-container is used, any secret mounted into the driver container will also be mounted into the
+init-container of the driver. Similarly, any secret mounted into an executor container will also be mounted into the
+init-container of the executor.
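+
+Whether a secret was mounted as expected can be verified, for example, by listing the mount path inside the running
+driver pod (the pod name below is illustrative):
+
+```bash
+# List the files projected from the secret into the driver container.
+$ kubectl exec spark-pi-driver -- ls /etc/secrets
+```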
+
 ## Introspection and Debugging
 
 These are the different ways in which you can investigate a running/completed Spark application, monitor progress, and
@@ -275,7 +323,7 @@ specific to Spark on Kubernetes.
   <td><code>(none)</code></td>
   <td>
     Container image to use for the driver.
-    This is usually of the form `example.com/repo/spark-driver:v1.0.0`.
+    This is usually of the form <code>example.com/repo/spark-driver:v1.0.0</code>.
     This configuration is required and must be provided by the user.
   </td>
 </tr>
@@ -284,7 +332,7 @@ specific to Spark on Kubernetes.
   <td><code>(none)</code></td>
   <td>
     Container image to use for the executors.
-    This is usually of the form `example.com/repo/spark-executor:v1.0.0`.
+    This is usually of the form <code>example.com/repo/spark-executor:v1.0.0</code>.
     This configuration is required and must be provided by the user.
   </td>
 </tr>
@@ -528,51 +576,91 @@ specific to Spark on Kubernetes.
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.driver.limit.cores</code></td>
-  <td>(none)</td>
-  <td>
-    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.executor.limit.cores</code></td>
-  <td>(none)</td>
-  <td>
-    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
-  <td>(none)</td>
-  <td>
-    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
-    configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
-    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
-    <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
-  <td>(none)</td>
-  <td>
-    Add the environment variable specified by <code>EnvironmentVariableName</code> to
-    the Driver process. The user can specify multiple of these to set multiple environment variables.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.mountDependencies.jarsDownloadDir</code></td>
-  <td><code>/var/spark-data/spark-jars</code></td>
-  <td>
-    Location to download jars to in the driver and executors.
-    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.kubernetes.mountDependencies.filesDownloadDir</code></td>
-  <td><code>/var/spark-data/spark-files</code></td>
-  <td>
-    Location to download jars to in the driver and executors.
-    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
-  </td>
-</tr>
+  <td><code>spark.kubernetes.driver.limit.cores</code></td>
+  <td>(none)</td>
+  <td>
+    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.limit.cores</code></td>
+  <td>(none)</td>
+  <td>
+    Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
+  <td>(none)</td>
+  <td>
+    Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
+    configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
+    will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
+    <code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the environment variable specified by <code>EnvironmentVariableName</code> to
+    the Driver process. The user can specify multiple of these to set multiple environment variables.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.jarsDownloadDir</code></td>
+  <td><code>/var/spark-data/spark-jars</code></td>
+  <td>
+    Location to download jars to in the driver and executors.
+    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.filesDownloadDir</code></td>
+  <td><code>/var/spark-data/spark-files</code></td>
+  <td>
+    Location to download files to in the driver and executors.
+    This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.timeout</code></td>
+  <td>300s</td>
+  <td>
+    Timeout in seconds before aborting the attempt to download and unpack dependencies from remote locations into
+    the driver and executor pods.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.mountDependencies.maxSimultaneousDownloads</code></td>
+  <td>5</td>
+  <td>
+    Maximum number of remote dependencies to download simultaneously in a driver or executor pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.initContainer.image</code></td>
+  <td>(none)</td>
+  <td>
+    Container image for the <a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/">init-container</a> of the driver and executors for downloading dependencies. This is usually of the form <code>example.com/repo/spark-init:v1.0.0</code>. This configuration is optional, but must be provided by the user if the application uses any dependency that is not local to the container images and therefore must be downloaded remotely.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secret</a> named <code>SecretName</code> to the driver pod on the path specified in the value. For example,
+    <code>spark.kubernetes.driver.secrets.spark-secret=/etc/secrets</code>. Note that if an init-container is used,
+    the secret will also be added to the init-container in the driver pod.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.secrets.[SecretName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secret</a> named <code>SecretName</code> to the executor pod on the path specified in the value. For example,
+    <code>spark.kubernetes.executor.secrets.spark-secret=/etc/secrets</code>. Note that if an init-container is used,
+    the secret will also be added to the init-container in the executor pod.
+  </td>
+</tr>
 </table>
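+
+As a consolidated sketch, several of the properties documented above could be combined in a single `spark-submit`
+invocation (the node selector key, environment variable name, and values below are purely illustrative):
+
+```
+--conf spark.kubernetes.node.selector.disktype=ssd \
+--conf spark.kubernetes.driverEnv.APP_ENV=staging \
+--conf spark.kubernetes.mountDependencies.timeout=600s \
+--conf spark.kubernetes.mountDependencies.maxSimultaneousDownloads=10
+```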