@@ -17,13 +17,16 @@ cluster, you may setup a test cluster on your local machine using
 * You must have appropriate permissions to create and list [pods](https://kubernetes.io/docs/user-guide/pods/),
 [ConfigMaps](https://kubernetes.io/docs/tasks/configure-pod-container/configmap/) and
 [secrets](https://kubernetes.io/docs/concepts/configuration/secret/) in your cluster. You can verify that
-you can list these resources by running `kubectl get pods` `kubectl get configmap`, and `kubectl get secrets` which
+you can list these resources by running `kubectl get pods`, `kubectl get configmaps`, and `kubectl get secrets`, which
 should give you a list of pods, configmaps, and secrets (if any), respectively.
-* You must have a spark distribution with Kubernetes support. This may be obtained from the
+* You must have a Spark distribution with Kubernetes support. The following documentation
+corresponds to v2.2.0-kubernetes-0.4.0.
+
+This may be obtained from the
 [release tarball](https://github.com/apache-spark-on-k8s/spark/releases) or by
 [building Spark with Kubernetes support](https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/README.md#building-spark-with-kubernetes-support).

-## Driver & Executor Images
+## Docker Images

 Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
 be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
@@ -36,45 +39,57 @@ If you wish to use pre-built docker images, you may use the images published in
 <tr><th>Component</th><th>Image</th></tr>
 <tr>
   <td>Spark Driver Image</td>
-  <td><code>kubespark/spark-driver:v2.1.0-kubernetes-0.3.1</code></td>
+  <td><code>kubespark/spark-driver:v2.2.0-kubernetes-0.4.0</code></td>
 </tr>
 <tr>
   <td>Spark Executor Image</td>
-  <td><code>kubespark/spark-executor:v2.1.0-kubernetes-0.3.1</code></td>
+  <td><code>kubespark/spark-executor:v2.2.0-kubernetes-0.4.0</code></td>
 </tr>
 <tr>
   <td>Spark Initialization Image</td>
-  <td><code>kubespark/spark-init:v2.1.0-kubernetes-0.3.1</code></td>
+  <td><code>kubespark/spark-init:v2.2.0-kubernetes-0.4.0</code></td>
 </tr>
 <tr>
   <td>Spark Staging Server Image</td>
-  <td><code>kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.3.1</code></td>
+  <td><code>kubespark/spark-resource-staging-server:v2.2.0-kubernetes-0.4.0</code></td>
 </tr>
 <tr>
   <td>PySpark Driver Image</td>
-  <td><code>kubespark/driver-py:v2.1.0-kubernetes-0.3.1</code></td>
+  <td><code>kubespark/spark-driver-py:v2.2.0-kubernetes-0.4.0</code></td>
 </tr>
 <tr>
   <td>PySpark Executor Image</td>
-  <td><code>kubespark/executor-py:v2.1.0-kubernetes-0.3.1</code></td>
+  <td><code>kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0</code></td>
 </tr>
 </table>

-You may also build these docker images from sources, or customize them as required. Spark distributions include the
-Docker files for the driver, executor, and init-container at `dockerfiles/driver/Dockerfile`,
-`dockerfiles/executor/Dockerfile`, and `dockerfiles/init-container/Dockerfile` respectively. Use these Docker files to
-build the Docker images, and then tag them with the registry that the images should be sent to. Finally, push the images
-to the registry.
+You may also build these Docker images from source, or customize them as required.

-For example, if the registry host is `registry-host` and the registry is listening on port 5000:
+In addition to the above, there are default images supplied for auxiliary components,
+such as the Resource Staging Server and the Spark External Shuffle Service.

-    cd $SPARK_HOME
-    docker build -t registry-host:5000/spark-driver:latest -f dockerfiles/driver/Dockerfile .
-    docker build -t registry-host:5000/spark-executor:latest -f dockerfiles/executor/Dockerfile .
-    docker build -t registry-host:5000/spark-init:latest -f dockerfiles/init-container/Dockerfile .
-    docker push registry-host:5000/spark-driver:latest
-    docker push registry-host:5000/spark-executor:latest
-    docker push registry-host:5000/spark-init:latest
+<table class="table">
+<tr><th>Component</th><th>Image</th></tr>
+<tr>
+  <td>Spark Resource Staging Server</td>
+  <td><code>kubespark/spark-resource-staging-server:v2.2.0-kubernetes-0.4.0</code></td>
+</tr>
+<tr>
+  <td>Spark External Shuffle Service</td>
+  <td><code>kubespark/spark-shuffle:v2.2.0-kubernetes-0.4.0</code></td>
+</tr>
+</table>
+
+There is a script, `sbin/build-push-docker-images.sh`, that you can use to build and push
+customized Spark distribution images consisting of all the above components.
+
+Example usage is:
+
+    ./sbin/build-push-docker-images.sh -r docker.io/myusername -t my-tag build
+    ./sbin/build-push-docker-images.sh -r docker.io/myusername -t my-tag push
+
+Dockerfiles are under the `dockerfiles/` directory and can be customized further before
+building with the supplied script, or manually.
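If you would rather customize and publish a single image by hand, the individual Dockerfiles can still be used directly. A minimal sketch for the driver image, run from the distribution root (the registry `registry-host:5000` and the tag `my-tag` below are placeholders):

    # build the driver image from the bundled Dockerfile and push it to a private registry
    cd $SPARK_HOME
    docker build -t registry-host:5000/spark-driver:my-tag -f dockerfiles/driver/Dockerfile .
    docker push registry-host:5000/spark-driver:my-tag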

 ## Submitting Applications to Kubernetes

@@ -88,10 +103,9 @@ are set up as described above:
       --kubernetes-namespace default \
       --conf spark.executor.instances=5 \
       --conf spark.app.name=spark-pi \
-      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
-      local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar
+      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
+      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
+      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar

 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
 `spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
@@ -128,10 +142,9 @@ Here is how you would execute a Spark-Pi example:
       --kubernetes-namespace <k8s-namespace> \
       --conf spark.executor.instances=5 \
       --conf spark.app.name=spark-pi \
-      --conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
-      --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.3.1-SNAPSHOT.jar \
+      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.4.0 \
+      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0 \
+      --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar \
       local:///opt/spark/examples/src/main/python/pi.py 10

 With Python support it is expected to distribute `.egg`, `.zip` and `.py` libraries to executors via the `--py-files` option.
@@ -143,16 +156,15 @@ We support this as well, as seen with the following example:
       --kubernetes-namespace <k8s-namespace> \
       --conf spark.executor.instances=5 \
       --conf spark.app.name=spark-pi \
-      --conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
-      --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.3.1-SNAPSHOT.jar \
+      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver-py:v2.2.0-kubernetes-0.4.0 \
+      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor-py:v2.2.0-kubernetes-0.4.0 \
+      --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar \
       --py-files local:///opt/spark/examples/src/main/python/sort.py \
       local:///opt/spark/examples/src/main/python/pi.py 10


 You may also customize your Docker images to use different `pip` packages that suit your use-case. As you can see
-with the current `driver-py` Docker image we have commented out the current pip module support that you can uncomment
+with the current `spark-driver-py` Docker image we have commented out the pip module support, which you can uncomment
 to use:

     ...
@@ -205,11 +217,11 @@ and then you can compute the value of Pi as follows:
       --kubernetes-namespace default \
       --conf spark.executor.instances=5 \
       --conf spark.app.name=spark-pi \
-      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
+      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
+      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
+      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.4.0 \
       --conf spark.kubernetes.resourceStagingServer.uri=http://<address-of-any-cluster-node>:31000 \
-      examples/jars/spark_examples_2.11-2.2.0.jar
+      ./examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar

 The Docker image for the resource staging server may also be built from source, in a similar manner to the driver
 and executor images. The Dockerfile is provided in `dockerfiles/resource-staging-server/Dockerfile`.
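As a sketch, that image could be built and pushed by hand from the distribution root in the same way as the driver image (registry and tag are placeholders again):

    docker build -t registry-host:5000/spark-resource-staging-server:my-tag -f dockerfiles/resource-staging-server/Dockerfile .
    docker push registry-host:5000/spark-resource-staging-server:my-tag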
@@ -225,7 +237,8 @@ Note that this resource staging server is only required for submitting local dep
 dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to by their appropriate
 remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies
 can be added to the classpath by referencing them with `local://` URIs and/or setting the `SPARK_EXTRA_CLASSPATH`
-environment variable in your Dockerfiles.
+environment variable in your Dockerfiles. For any remote dependencies (those not using the `local://` scheme),
+the init-container image (`spark.kubernetes.initcontainer.docker.image`) must be specified at submission time.
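A sketch of such a submission (the HDFS jar path here is hypothetical) passes the init-container image alongside the driver and executor images:

    bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
      --kubernetes-namespace default \
      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.4.0 \
      --jars hdfs://<namenode-host>:<port>/libs/extra-dependency.jar \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar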

 ## Accessing Kubernetes Clusters

@@ -246,10 +259,9 @@ If our local proxy were listening on port 8001, we would have our submission loo
       --kubernetes-namespace default \
       --conf spark.executor.instances=5 \
       --conf spark.app.name=spark-pi \
-      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
-      local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar
+      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
+      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
+      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar

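The local proxy itself can be started with the standard `kubectl proxy` command; for the port assumed in this example:

    kubectl proxy --port=8001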
 Communication between Spark and Kubernetes clusters is performed using the fabric8 kubernetes-client library.
 The above mechanism using `kubectl proxy` can be used when we have authentication providers that the fabric8
@@ -270,7 +282,7 @@ service because there may be multiple shuffle service instances running in a clu
 a way to target a particular shuffle service.

 For example, if the shuffle service we want to use is in the default namespace, and
-has pods with labels `app=spark-shuffle-service` and `spark-version=2.1.0`, we can
+has pods with labels `app=spark-shuffle-service` and `spark-version=2.2.0`, we can
 use those labels to target that particular shuffle service at job launch time. In order to run a job with dynamic allocation enabled,
 the command may then look like the following:

@@ -285,8 +297,8 @@ the command may then look like the following:
       --conf spark.dynamicAllocation.enabled=true \
       --conf spark.shuffle.service.enabled=true \
       --conf spark.kubernetes.shuffle.namespace=default \
-      --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \
-      local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar 10 400000 2
+      --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.2.0" \
+      local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar 10 400000 2

 ## Advanced

@@ -413,13 +425,13 @@ communicate with the resource staging server over TLS. The trustStore can be set
       --kubernetes-namespace default \
       --conf spark.executor.instances=5 \
       --conf spark.app.name=spark-pi \
-      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.1 \
-      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
+      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
+      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
+      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.4.0 \
       --conf spark.kubernetes.resourceStagingServer.uri=https://<address-of-any-cluster-node>:31000 \
       --conf spark.ssl.kubernetes.resourceStagingServer.enabled=true \
       --conf spark.ssl.kubernetes.resourceStagingServer.clientCertPem=/home/myuser/cert.pem \
-      examples/jars/spark_examples_2.11-2.2.0.jar
+      examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar

 ### Spark Properties

@@ -652,37 +664,39 @@ from the other deployment modes. See the [configuration page](configuration.html
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.driver.labels</code></td>
+  <td><code>spark.kubernetes.driver.label.[LabelName]</code></td>
   <td>(none)</td>
   <td>
-    Custom labels that will be added to the driver pod. This should be a comma-separated list of label key-value pairs,
-    where each label is in the format <code>key=value</code>. Note that Spark also adds its own labels to the driver pod
+    Add the label specified by <code>LabelName</code> to the driver pod.
+    For example, <code>spark.kubernetes.driver.label.something=true</code>.
+    Note that Spark also adds its own labels to the driver pod
     for bookkeeping purposes.
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.driver.annotations</code></td>
+  <td><code>spark.kubernetes.driver.annotation.[AnnotationName]</code></td>
   <td>(none)</td>
   <td>
-    Custom annotations that will be added to the driver pod. This should be a comma-separated list of label key-value
-    pairs, where each annotation is in the format <code>key=value</code>.
+    Add the annotation specified by <code>AnnotationName</code> to the driver pod.
+    For example, <code>spark.kubernetes.driver.annotation.something=true</code>.
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.executor.labels</code></td>
+  <td><code>spark.kubernetes.executor.label.[LabelName]</code></td>
   <td>(none)</td>
   <td>
-    Custom labels that will be added to the executor pods. This should be a comma-separated list of label key-value
-    pairs, where each label is in the format <code>key=value</code>. Note that Spark also adds its own labels to the
-    executor pods for bookkeeping purposes.
+    Add the label specified by <code>LabelName</code> to the executor pods.
+    For example, <code>spark.kubernetes.executor.label.something=true</code>.
+    Note that Spark also adds its own labels to the executor pods
+    for bookkeeping purposes.
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.executor.annotations</code></td>
+  <td><code>spark.kubernetes.executor.annotation.[AnnotationName]</code></td>
   <td>(none)</td>
   <td>
-    Custom annotations that will be added to the executor pods. This should be a comma-separated list of annotation
-    key-value pairs, where each annotation is in the format <code>key=value</code>.
+    Add the annotation specified by <code>AnnotationName</code> to the executor pods.
+    For example, <code>spark.kubernetes.executor.annotation.something=true</code>.
   </td>
 </tr>
 <tr>