@@ -17,13 +17,16 @@ cluster, you may setup a test cluster on your local machine using
* You must have appropriate permissions to create and list [pods](https://kubernetes.io/docs/user-guide/pods/),
[ConfigMaps](https://kubernetes.io/docs/tasks/configure-pod-container/configmap/) and
[secrets](https://kubernetes.io/docs/concepts/configuration/secret/) in your cluster. You can verify that
- you can list these resources by running `kubectl get pods` `kubectl get configmap`, and `kubectl get secrets` which
+ you can list these resources by running `kubectl get pods`, `kubectl get configmaps`, and `kubectl get secrets`, which
should give you a list of pods, configmaps, and secrets (if any), respectively (see the permission check sketched after this list).
- * You must have a spark distribution with Kubernetes support. This may be obtained from the
+ * You must have a Spark distribution with Kubernetes support. The following documentation
+ corresponds to v2.2.0-kubernetes-0.4.0.
+
+ This may be obtained from the
[release tarball](https://github.com/apache-spark-on-k8s/spark/releases) or by
[building Spark with Kubernetes support](https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/README.md#building-spark-with-kubernetes-support).

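A quick way to verify the permissions prerequisite above, assuming `kubectl` is already configured against the target cluster (RBAC setups vary per cluster, so treat this as a sketch rather than an exhaustive check):

    kubectl auth can-i create pods
    kubectl auth can-i list configmaps
    kubectl auth can-i get secrets
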
- ## Driver & Executor Images
+ ## Docker Images

Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
@@ -36,45 +39,57 @@ If you wish to use pre-built docker images, you may use the images published in
<tr><th>Component</th><th>Image</th></tr>
<tr>
  <td>Spark Driver Image</td>
-   <td><code>kubespark/spark-driver:v2.1.0-kubernetes-0.3.1</code></td>
+   <td><code>kubespark/spark-driver:v2.2.0-kubernetes-0.4.0</code></td>
</tr>
<tr>
  <td>Spark Executor Image</td>
-   <td><code>kubespark/spark-executor:v2.1.0-kubernetes-0.3.1</code></td>
+   <td><code>kubespark/spark-executor:v2.2.0-kubernetes-0.4.0</code></td>
</tr>
<tr>
  <td>Spark Initialization Image</td>
-   <td><code>kubespark/spark-init:v2.1.0-kubernetes-0.3.1</code></td>
+   <td><code>kubespark/spark-init:v2.2.0-kubernetes-0.4.0</code></td>
</tr>
<tr>
  <td>Spark Staging Server Image</td>
-   <td><code>kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.3.1</code></td>
+   <td><code>kubespark/spark-resource-staging-server:v2.2.0-kubernetes-0.4.0</code></td>
</tr>
<tr>
  <td>PySpark Driver Image</td>
-   <td><code>kubespark/driver-py:v2.1.0-kubernetes-0.3.1</code></td>
+   <td><code>kubespark/driver-py:v2.2.0-kubernetes-0.4.0</code></td>
</tr>
<tr>
  <td>PySpark Executor Image</td>
-   <td><code>kubespark/executor-py:v2.1.0-kubernetes-0.3.1</code></td>
+   <td><code>kubespark/executor-py:v2.2.0-kubernetes-0.4.0</code></td>
</tr>
</table>

- You may also build these docker images from sources, or customize them as required. Spark distributions include the
- Docker files for the driver, executor, and init-container at `dockerfiles/driver/Dockerfile`,
- `dockerfiles/executor/Dockerfile`, and `dockerfiles/init-container/Dockerfile` respectively. Use these Docker files to
- build the Docker images, and then tag them with the registry that the images should be sent to. Finally, push the images
- to the registry.
+ You may also build these docker images from sources, or customize them as required.

- For example, if the registry host is `registry-host` and the registry is listening on port 5000:
+ In addition to the above, there are default images supplied for auxiliary components,
+ like the Resource Staging Server and Spark External Shuffle Service.

-     cd $SPARK_HOME
-     docker build -t registry-host:5000/spark-driver:latest -f dockerfiles/driver/Dockerfile .
-     docker build -t registry-host:5000/spark-executor:latest -f dockerfiles/executor/Dockerfile .
-     docker build -t registry-host:5000/spark-init:latest -f dockerfiles/init-container/Dockerfile .
-     docker push registry-host:5000/spark-driver:latest
-     docker push registry-host:5000/spark-executor:latest
-     docker push registry-host:5000/spark-init:latest
+ <table class="table">
+ <tr><th>Component</th><th>Image</th></tr>
+ <tr>
+   <td>Spark Resource Staging Server</td>
+   <td><code>kubespark/spark-resource-staging-server:v2.2.0-kubernetes-0.4.0</code></td>
+ </tr>
+ <tr>
+   <td>Spark External Shuffle Service</td>
+   <td><code>kubespark/spark-shuffle:v2.2.0-kubernetes-0.4.0</code></td>
+ </tr>
+ </table>
+
+ There is a script, `sbin/build-push-docker-images.sh`, that you can use to build and push
+ customized Spark distribution images consisting of all the above components.
+
+ Example usage is:
+
+     ./sbin/build-push-docker-images.sh -r docker.io/myusername -t my-tag build
+     ./sbin/build-push-docker-images.sh -r docker.io/myusername -t my-tag push
+
+ Docker files are under the `dockerfiles/` directory and can be customized further before
+ building with the supplied script, or manually.

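If you prefer to build and push a single customized image by hand instead of using the script, the flow mirrors the per-image commands shown in the removed example above. This is only a sketch: it assumes the driver Dockerfile still lives at `dockerfiles/driver/Dockerfile`, that your registry is reachable at `registry-host:5000`, and `my-tag` is a placeholder tag:

    cd $SPARK_HOME
    docker build -t registry-host:5000/spark-driver:my-tag -f dockerfiles/driver/Dockerfile .
    docker push registry-host:5000/spark-driver:my-tag
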
## Submitting Applications to Kubernetes

@@ -88,10 +103,9 @@ are set up as described above:
      --kubernetes-namespace default \
      --conf spark.executor.instances=5 \
      --conf spark.app.name=spark-pi \
-       --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
-       local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar
+       --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
+       --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
+       local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar

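The API server URL needed for the `k8s://<api_server_url>` master format described below can be looked up with `kubectl`, assuming it is already configured for the target cluster; for example:

    kubectl cluster-info
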
The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
`spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
@@ -128,10 +142,9 @@ Here is how you would execute a Spark-Pi example:
      --kubernetes-namespace <k8s-namespace> \
      --conf spark.executor.instances=5 \
      --conf spark.app.name=spark-pi \
-       --conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
-       --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.3.1-SNAPSHOT.jar \
+       --conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.2.0-kubernetes-0.4.0 \
+       --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.2.0-kubernetes-0.4.0 \
+       --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar \
      local:///opt/spark/examples/src/main/python/pi.py 10

With Python support, `.egg`, `.zip` and `.py` libraries may be distributed to executors via the `--py-files` option.
@@ -143,10 +156,9 @@ We support this as well, as seen with the following example:
      --kubernetes-namespace <k8s-namespace> \
      --conf spark.executor.instances=5 \
      --conf spark.app.name=spark-pi \
-       --conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
-       --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.3.1-SNAPSHOT.jar \
+       --conf spark.kubernetes.driver.docker.image=kubespark/driver-py:v2.2.0-kubernetes-0.4.0 \
+       --conf spark.kubernetes.executor.docker.image=kubespark/executor-py:v2.2.0-kubernetes-0.4.0 \
+       --jars local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar \
      --py-files local:///opt/spark/examples/src/main/python/sort.py \
      local:///opt/spark/examples/src/main/python/pi.py 10

@@ -205,11 +217,11 @@ and then you can compute the value of Pi as follows:
      --kubernetes-namespace default \
      --conf spark.executor.instances=5 \
      --conf spark.app.name=spark-pi \
-       --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
+       --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
+       --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
+       --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.4.0 \
      --conf spark.kubernetes.resourceStagingServer.uri=http://<address-of-any-cluster-node>:31000 \
-       examples/jars/spark_examples_2.11-2.2.0.jar
+       ./examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar

The Docker image for the resource staging server may also be built from source, in a similar manner to the driver
and executor images. The Dockerfile is provided in `dockerfiles/resource-staging-server/Dockerfile`.
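As a rough sketch of that build-from-source path, following the same pattern as the driver image commands earlier and assuming a registry at `registry-host:5000` with a hypothetical `my-tag` tag:

    cd $SPARK_HOME
    docker build -t registry-host:5000/spark-resource-staging-server:my-tag -f dockerfiles/resource-staging-server/Dockerfile .
    docker push registry-host:5000/spark-resource-staging-server:my-tag
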
@@ -225,7 +237,9 @@ Note that this resource staging server is only required for submitting local dep
dependencies are all hosted in remote locations like HDFS or http servers, they may be referred to by their appropriate
remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies
can be added to the classpath by referencing them with `local://` URIs and/or setting the `SPARK_EXTRA_CLASSPATH`
- environment variable in your Dockerfiles.
+ environment variable in your Dockerfiles. For any remote dependencies that are not baked into the driver and executor
+ docker images, whether they are supplied via HTTP, HDFS, or the resource staging server, an
+ init-container image (`spark.kubernetes.initcontainer.docker.image`) must be specified during submission.

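For illustration, a submission whose main application jar is fetched over HTTP would add the init-container image alongside the usual driver and executor images. The jar URL below is purely hypothetical, and the remaining flags follow the earlier Spark-Pi examples (which are truncated in this diff), so treat the full command as a sketch:

    bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      --master k8s://<api_server_url> \
      --kubernetes-namespace default \
      --conf spark.app.name=spark-pi \
      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.4.0 \
      https://example.com/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar
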
### Accessing Kubernetes Clusters

@@ -246,10 +260,9 @@ If our local proxy were listening on port 8001, we would have our submission loo
      --kubernetes-namespace default \
      --conf spark.executor.instances=5 \
      --conf spark.app.name=spark-pi \
-       --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
-       local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar
+       --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
+       --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
+       local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar

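The local proxy assumed in this example can be started with `kubectl`; it listens on port 8001 by default, and the port can also be set explicitly:

    kubectl proxy --port=8001
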
Communication between Spark and Kubernetes clusters is performed using the fabric8 kubernetes-client library.
The above mechanism using `kubectl proxy` can be used when we have authentication providers that the fabric8
@@ -270,7 +283,7 @@ service because there may be multiple shuffle service instances running in a clu
a way to target a particular shuffle service.

For example, if the shuffle service we want to use is in the default namespace, and
- has pods with labels `app=spark-shuffle-service` and `spark-version=2.1.0`, we can
+ has pods with labels `app=spark-shuffle-service` and `spark-version=2.2.0`, we can
use those labels to target that particular shuffle service at job launch time. In order to run a job with dynamic allocation enabled,
the command may then look like the following:

@@ -285,8 +298,8 @@ the command may then look like the following:
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.kubernetes.shuffle.namespace=default \
-       --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \
-       local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar 10 400000 2
+       --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.2.0" \
+       local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.4.0.jar 10 400000 2

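To confirm which shuffle service pods a given label selector matches before submitting, the same namespace and labels can be queried directly (a sketch, assuming `kubectl` access to the cluster):

    kubectl get pods -n default -l app=spark-shuffle-service,spark-version=2.2.0
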
## Advanced

@@ -362,9 +375,9 @@ communicate with the resource staging server over TLS. The trustStore can be set
      --kubernetes-namespace default \
      --conf spark.executor.instances=5 \
      --conf spark.app.name=spark-pi \
-       --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.1 \
-       --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.1 \
+       --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.2.0-kubernetes-0.4.0 \
+       --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.2.0-kubernetes-0.4.0 \
+       --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.2.0-kubernetes-0.4.0 \
      --conf spark.kubernetes.resourceStagingServer.uri=https://<address-of-any-cluster-node>:31000 \
      --conf spark.ssl.kubernetes.resourceStagingServer.enabled=true \
      --conf spark.ssl.kubernetes.resourceStagingServer.clientCertPem=/home/myuser/cert.pem \