diff --git a/src/jekyll/running-on-kubernetes.md b/src/jekyll/running-on-kubernetes.md
index d71b9b2..36b4552 100644
--- a/src/jekyll/running-on-kubernetes.md
+++ b/src/jekyll/running-on-kubernetes.md
@@ -1,7 +1,6 @@
 ---
 layout: global
 title: Running Spark on Kubernetes
-toc: true
 ---
 
 Support for running on [Kubernetes](https://kubernetes.io/docs/whatisk8s/) is available in experimental status. The
@@ -24,14 +23,6 @@ should give you a list of pods and configmaps (if any) respectively.
 [release tarball](https://github.com/apache-spark-on-k8s/spark/releases) or by
 [building Spark with Kubernetes support](../resource-managers/kubernetes/README.md#building-spark-with-kubernetes-support).
 
-## Current Limitations
-
-Running Spark on Kubernetes is currently an experimental feature. Some restrictions on the current implementation that
-should be lifted in the future include:
-* Applications can only run in cluster mode.
-* Only Scala and Java applications can be run.
-
-
 ## Driver & Executor Images
 
 Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
@@ -45,15 +36,15 @@ If you wish to use pre-built docker images, you may use the images published in
 <table class="table">
   <tr><th>Component</th><th>Image</th></tr>
   <tr>
     <td>Spark Driver Image</td>
-    <td><code>kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-alpha.2</code></td>
+    <td><code>kubespark/spark-driver:v2.1.0-kubernetes-0.2.0</code></td>
   </tr>
   <tr>
     <td>Spark Executor Image</td>
-    <td><code>kubespark/spark-executor:v2.1.0-kubernetes-0.1.0-alpha.2</code></td>
+    <td><code>kubespark/spark-executor:v2.1.0-kubernetes-0.2.0</code></td>
   </tr>
   <tr>
     <td>Spark Initialization Image</td>
-    <td><code>kubespark/spark-init:v2.1.0-kubernetes-0.1.0-alpha.2</code></td>
+    <td><code>kubespark/spark-init:v2.1.0-kubernetes-0.2.0</code></td>
   </tr>
 </table>
@@ -85,9 +76,9 @@ are set up as described above:
     --kubernetes-namespace default \
     --conf spark.executor.instances=5 \
     --conf spark.app.name=spark-pi \
-    --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-alpha.2 \
-    --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.1.0-alpha.2 \
-    --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.1.0-alpha.2 \
+    --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
+    --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
+    --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
     local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar
 
 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
@@ -134,9 +125,9 @@ and then you can compute the value of Pi as follows:
     --kubernetes-namespace default \
     --conf spark.executor.instances=5 \
     --conf spark.app.name=spark-pi \
-    --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-alpha.2 \
-    --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.1.0-alpha.2 \
-    --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.1.0-alpha.2 \
+    --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
+    --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
+    --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
     --conf spark.kubernetes.resourceStagingServer.uri=http://<address-of-any-cluster-node>:31000 \
     examples/jars/spark_examples_2.11-2.2.0.jar
 
@@ -177,9 +168,9 @@ If our local proxy were listening on port 8001, we would have our submission loo
     --kubernetes-namespace default \
     --conf spark.executor.instances=5 \
     --conf spark.app.name=spark-pi \
-    --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-alpha.2 \
-    --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.1.0-alpha.2 \
-    --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.1.0-alpha.2 \
+    --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
+    --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
+    --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
     local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar
 
 Communication between Spark and Kubernetes clusters is performed using the fabric8 kubernetes-client library.
@@ -293,9 +284,9 @@ communicate with the resource staging server over TLS. The trustStore can be set
     --kubernetes-namespace default \
     --conf spark.executor.instances=5 \
     --conf spark.app.name=spark-pi \
-    --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.1.0-alpha.2 \
-    --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.1.0-alpha.2 \
-    --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.1.0-alpha.2 \
+    --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
+    --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
+    --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
     --conf spark.kubernetes.resourceStagingServer.uri=https://<address-of-any-cluster-node>:31000 \
     --conf spark.ssl.kubernetes.resourceStagingServer.enabled=true \
     --conf spark.ssl.kubernetes.resourceStagingServer.clientCertPem=/home/myuser/cert.pem \
@@ -459,6 +450,69 @@ from the other deployment modes. See the [configuration page](configuration.html
     client cert file, and/or OAuth token.
   </td>
 </tr>
+<tr>
+  <td><code>spark.kubernetes.authenticate.resourceStagingServer.caCertFile</code></td>
+  <td>(none)</td>
+  <td>
+    Path to the CA cert file for connecting to the Kubernetes API server over TLS from the resource staging server when
+    it monitors objects to determine when to clean up resource bundles.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.authenticate.resourceStagingServer.clientKeyFile</code></td>
+  <td>(none)</td>
+  <td>
+    Path to the client key file for authenticating against the Kubernetes API server from the resource staging server
+    when it monitors objects to determine when to clean up resource bundles. The resource staging server must have
+    credentials that allow it to view API objects in any namespace.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.authenticate.resourceStagingServer.clientCertFile</code></td>
+  <td>(none)</td>
+  <td>
+    Path to the client cert file for authenticating against the Kubernetes API server from the resource staging server
+    when it monitors objects to determine when to clean up resource bundles. The resource staging server must have
+    credentials that allow it to view API objects in any namespace.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.authenticate.resourceStagingServer.oauthToken</code></td>
+  <td>(none)</td>
+  <td>
+    OAuth token value for authenticating against the Kubernetes API server from the resource staging server
+    when it monitors objects to determine when to clean up resource bundles. The resource staging server must have
+    credentials that allow it to view API objects in any namespace. Note that this cannot be set at the same time as
+    <code>spark.kubernetes.authenticate.resourceStagingServer.oauthTokenFile</code>.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.authenticate.resourceStagingServer.oauthTokenFile</code></td>
+  <td>(none)</td>
+  <td>
+    File containing the OAuth token to use when authenticating against the Kubernetes API server from the
+    resource staging server, when it monitors objects to determine when to clean up resource bundles. The resource
+    staging server must have credentials that allow it to view API objects in any namespace. Note that this cannot be
+    set at the same time as <code>spark.kubernetes.authenticate.resourceStagingServer.oauthToken</code>.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.authenticate.resourceStagingServer.useServiceAccountCredentials</code></td>
+  <td>true</td>
+  <td>
+    Whether to use a service account token and a service account CA certificate when the resource staging server
+    authenticates to Kubernetes. If this is set, interactions with Kubernetes will authenticate using a token located
+    at <code>/var/run/secrets/kubernetes.io/serviceaccount/token</code> and the CA certificate located at
+    <code>/var/run/secrets/kubernetes.io/serviceaccount/ca.crt</code>. Note that if
+    <code>spark.kubernetes.authenticate.resourceStagingServer.oauthTokenFile</code> is set, it takes precedence
+    over the usage of the service account token file. Also, if
+    <code>spark.kubernetes.authenticate.resourceStagingServer.caCertFile</code> is set, it takes precedence over using
+    the service account's CA certificate file. This generally should be set to true (the default value) when the
+    resource staging server is deployed as a Kubernetes pod, but should be set to false if the resource staging server
+    is deployed by other means (i.e. when running the staging server process outside of Kubernetes). The resource
+    staging server must have credentials that allow it to view API objects in any namespace.
+  </td>
+</tr>
 <tr>
   <td><code>spark.kubernetes.executor.memoryOverhead</code></td>
   <td>executorMemory * 0.10, with minimum of 384</td>
@@ -485,6 +539,23 @@ from the other deployment modes. See the [configuration page](configuration.html
     pairs, where each annotation is in the format <code>key=value</code>.
   </td>
 </tr>
+<tr>
+  <td><code>spark.kubernetes.executor.labels</code></td>
+  <td>(none)</td>
+  <td>
+    Custom labels that will be added to the executor pods. This should be a comma-separated list of label key-value
+    pairs, where each label is in the format <code>key=value</code>. Note that Spark also adds its own labels to the
+    executor pods for bookkeeping purposes.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.executor.annotations</code></td>
+  <td>(none)</td>
+  <td>
+    Custom annotations that will be added to the executor pods. This should be a comma-separated list of annotation
+    key-value pairs, where each annotation is in the format <code>key=value</code>.
+  </td>
+</tr>
 <tr>
   <td><code>spark.kubernetes.driver.pod.name</code></td>
   <td>(none)</td>
@@ -590,4 +661,19 @@ from the other deployment modes. See the [configuration page](configuration.html
     Interval between reports of the current Spark job status in cluster mode.
   </td>
 </tr>
-</table>
\ No newline at end of file
+<tr>
+  <td><code>spark.kubernetes.docker.image.pullPolicy</code></td>
+  <td>IfNotPresent</td>
+  <td>
+    Docker image pull policy used when pulling Docker images with Kubernetes.
+  </td>
+</tr>
+</table>
+
+## Current Limitations
+
+Running Spark on Kubernetes is currently an experimental feature. Some restrictions on the current implementation that
+should be lifted in the future include:
+* Applications can only run in cluster mode.
+* Only Scala and Java applications can be run.
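
The retagged `kubespark/*:v2.1.0-kubernetes-0.2.0` images and the newly documented settings (`spark.kubernetes.executor.labels`, `spark.kubernetes.executor.annotations`, `spark.kubernetes.docker.image.pullPolicy`) can be exercised together in a single submission. The sketch below assembles such a command as a string and prints it instead of executing it, so it is safe to run without a cluster; the master address, namespace, label/annotation values, and jar path are illustrative placeholders, not values taken from this patch:

```shell
#!/bin/sh
# Sketch of a spark-submit invocation combining the 0.2.0 images with the
# newly documented executor labels/annotations and an explicit pull policy.
# Master URL, namespace, labels, annotations, and jar path are placeholders.
SUBMIT_CMD="bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://192.168.99.100:8443 \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.executor.labels=team=infra,env=staging \
  --conf spark.kubernetes.executor.annotations=owner=myuser \
  --conf spark.kubernetes.docker.image.pullPolicy=Always \
  local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar"

# Print the assembled command rather than running it.
echo "$SUBMIT_CMD"
```

If the resource staging server is in use, a `--conf spark.kubernetes.resourceStagingServer.uri=...` line would be appended exactly as in the examples earlier in the document.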