From c0a659a3117674dfbd5c078badd653886c15cc8e Mon Sep 17 00:00:00 2001 From: Yinan Li Date: Fri, 22 Dec 2017 12:33:21 -0800 Subject: [PATCH 1/7] [SPARK-22648][Kubernetes] Update documentation to cover features in #19954 --- docs/running-on-kubernetes.md | 158 +++++++++++++++++++++---------- sbin/build-push-docker-images.sh | 3 +- 2 files changed, 111 insertions(+), 50 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 0048bd90b48a..9d67f4b77901 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -120,6 +120,23 @@ by their appropriate remote URIs. Also, application dependencies can be pre-moun Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. +### Using Remote Dependencies +When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading the dependencies so the driver and executor containers can use them locally. This requires users to specify the container image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. For example, users simply add the following option to the `spark-submit` command to specify the init-container image: + +``` +--conf spark.kubernetes.initContainer.image= +``` + +## Secret Management +In some cases, a Spark application may need to use some credentials, e.g., for accessing data on a secured HDFS cluster or cloud storage that requires users to provide credentials for authentication. This can be done by mounting the credentials into the driver and executor containers using Kubernetes [secrets](https://kubernetes.io/docs/concepts/configuration/secret/). 
To mount a user-specified secret into the driver container, users can use the configuration property of the form `spark.kubernetes.driver.secrets.[SecretName]=`. Similarly, the configuration property of the form `spark.kubernetes.executor.secrets.[SecretName]=` can be used to mount a user-specified secret into the executor containers. Note that it is assumed that the secret to be mounted is in the same namespace as that of the driver and executor pods. For example, to mount a secret named `spark-secret` onto the path `/etc/secrets` in both the driver and executor containers, add the following options to the `spark-submit` command: + +``` +--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets +--conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets +``` + +Note that if an init-container is used, any secret mounted into the driver container will also be mounted into the init-container of the driver. Similarly, any secret mounted into an executor container will also be mounted into the init-container of the executor. + ## Introspection and Debugging These are the different ways in which you can investigate a running/completed Spark application, monitor progress, and @@ -275,7 +292,7 @@ specific to Spark on Kubernetes. (none) Container image to use for the driver. - This is usually of the form `example.com/repo/spark-driver:v1.0.0`. + This is usually of the form example.com/repo/spark-driver:v1.0.0. This configuration is required and must be provided by the user. @@ -284,7 +301,7 @@ specific to Spark on Kubernetes. (none) Container image to use for the executors. - This is usually of the form `example.com/repo/spark-executor:v1.0.0`. + This is usually of the form example.com/repo/spark-executor:v1.0.0. This configuration is required and must be provided by the user. @@ -528,51 +545,94 @@ specific to Spark on Kubernetes. 
- spark.kubernetes.driver.limit.cores - (none) - - Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod. - - - - spark.kubernetes.executor.limit.cores - (none) - - Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application. - - - - spark.kubernetes.node.selector.[labelKey] - (none) - - Adds to the node selector of the driver pod and executor pods, with key labelKey and the value as the - configuration's value. For example, setting spark.kubernetes.node.selector.identifier to myIdentifier - will result in the driver pod and executors having a node selector with key identifier and value - myIdentifier. Multiple node selector keys can be added by setting multiple configurations with this prefix. - - - - spark.kubernetes.driverEnv.[EnvironmentVariableName] - (none) - - Add the environment variable specified by EnvironmentVariableName to - the Driver process. The user can specify multiple of these to set multiple environment variables. - - - - spark.kubernetes.mountDependencies.jarsDownloadDir - /var/spark-data/spark-jars - - Location to download jars to in the driver and executors. - This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods. - - - - spark.kubernetes.mountDependencies.filesDownloadDir - /var/spark-data/spark-files - - Location to download jars to in the driver and executors. - This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods. 
- - + spark.kubernetes.driver.limit.cores + (none) + + Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for the driver pod. + + + + spark.kubernetes.executor.limit.cores + (none) + + Specify the hard CPU [limit](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) for each executor pod launched for the Spark Application. + + + + spark.kubernetes.node.selector.[labelKey] + (none) + + Adds to the node selector of the driver pod and executor pods, with key labelKey and the value as the + configuration's value. For example, setting spark.kubernetes.node.selector.identifier to myIdentifier + will result in the driver pod and executors having a node selector with key identifier and value + myIdentifier. Multiple node selector keys can be added by setting multiple configurations with this prefix. + + + + spark.kubernetes.driverEnv.[EnvironmentVariableName] + (none) + + Add the environment variable specified by EnvironmentVariableName to + the Driver process. The user can specify multiple of these to set multiple environment variables. + + + + spark.kubernetes.mountDependencies.jarsDownloadDir + /var/spark-data/spark-jars + + Location to download jars to in the driver and executors. + This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods. + + + + spark.kubernetes.mountDependencies.filesDownloadDir + /var/spark-data/spark-files + + Location to download files to in the driver and executors. + This directory must be empty and will be mounted as an empty directory volume on the driver and executor pods. + + + + spark.kubernetes.mountDependencies.mountTimeout + 5 minutes + + Timeout before aborting the attempt to download and unpack dependencies from remote locations when initializing + the driver and executor pods. 
+ + + + spark.kubernetes.initContainer.image + (none) + + Container image for the init-container of the driver and executors for downloading dependencies. + This is usually of the form example.com/repo/spark-init:v1.0.0. + This configuration is optional and must be provided by the user if any non-container local dependency is used and + must be downloaded remotely. + + + + spark.kubernetes.initContainer.maxThreadPoolSize + 5 + + Maximum size of the thread pool in the init-container for downloading remote dependencies. + + + + spark.kubernetes.driver.secrets.[SecretName] + (none) + + Add the secret named SecretName to the driver pod on the path specified in the value. For example, + spark.kubernetes.driver.secrets.spark-secret=/etc/secrets. Note that if an init-container is used, + the secret will also be add to the init-container in the driver pod. + + + + spark.kubernetes.executor.secrets.[SecretName] + 5 + + Add the secret named SecretName to the executor pod on the path specified in the value. For example, + spark.kubernetes.executor.secrets.spark-secret=/etc/secrets. Note that if an init-container is used, + the secret will also be add to the init-container in the executor pod. + + \ No newline at end of file diff --git a/sbin/build-push-docker-images.sh b/sbin/build-push-docker-images.sh index 4546e98dc207..b3137598692d 100755 --- a/sbin/build-push-docker-images.sh +++ b/sbin/build-push-docker-images.sh @@ -20,7 +20,8 @@ # with Kubernetes support. declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \ - [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile ) + [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \ + [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile ) function build { docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile . 
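[Editor's sketch] The patch above registers a third image, `spark-init`, in the build script's image-to-Dockerfile map. The associative array it adds to `sbin/build-push-docker-images.sh` can be exercised on its own as a quick sanity check; the printing loop here is purely illustrative (the real script passes the map to `docker build` and `docker push`), and it assumes bash 4+ for `declare -A`:

```bash
# The image -> Dockerfile map carried by build-push-docker-images.sh after this
# patch, including the new spark-init entry. Here we only print the mapping;
# the actual script iterates it to build and push each image.
declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
                  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )

for image in "${!path[@]}"; do
  echo "${image} is built from ${path[$image]}"
done
```

Run under bash, this lists all three image names alongside their Dockerfile paths (iteration order of an associative array is unspecified).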
From fbb211214499997fe131358042ce05454f88d6fb Mon Sep 17 00:00:00 2001 From: Yinan Li Date: Fri, 22 Dec 2017 14:47:46 -0800 Subject: [PATCH 2/7] Addressed comments --- docs/running-on-kubernetes.md | 68 +++++++++++++++++++++++++---------- 1 file changed, 49 insertions(+), 19 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 9d67f4b77901..7e185619c026 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -121,21 +121,55 @@ Those dependencies can be added to the classpath by referencing them with `local `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles. ### Using Remote Dependencies -When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading the dependencies so the driver and executor containers can use them locally. This requires users to specify the container image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. For example, users simply add the following option to the `spark-submit` command to specify the init-container image: +When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods +need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading +the dependencies so the driver and executor containers can use them locally. This requires users to specify the container +image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. 
For example, users +simply add the following option to the `spark-submit` command to specify the init-container image: ``` --conf spark.kubernetes.initContainer.image= ``` +The init-container handles remote dependencies specified in `spark.jars` (or the `--jars` option of `spark-submit`) and +`spark.files` (or the `--files` option of `spark-submit`). It also handles remotely hosted main application resources, e.g., +the main application jar. The following shows an example of using remote dependencies with the `spark-submit` command: + +```bash +$ bin/spark-submit \ + --master k8s://https://: \ + --deploy-mode cluster \ + --name spark-pi \ + --class org.apache.spark.examples.SparkPi \ + --jars https://path/to/dependency1.jar,https://path/to/dependency2.jar \ + --files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2 \ + --conf spark.executor.instances=5 \ + --conf spark.kubernetes.driver.docker.image= \ + --conf spark.kubernetes.executor.docker.image= \ + --conf spark.kubernetes.initContainer.image= \ + https://path/to/examples.jar +``` + ## Secret Management -In some cases, a Spark application may need to use some credentials, e.g., for accessing data on a secured HDFS cluster or cloud storage that requires users to provide credentials for authentication. This can be done by mounting the credentials into the driver and executor containers using Kubernetes [secrets](https://kubernetes.io/docs/concepts/configuration/secret/). To mount a user-specified secret into the driver container, users can use the configuration property of the form `spark.kubernetes.driver.secrets.[SecretName]=`. Similarly, the configuration property of the form `spark.kubernetes.executor.secrets.[SecretName]=` can be used to mount a user-specified secret into the executor containers. Note that it is assumed that the secret to be mounted is in the same namespace as that of the driver and executor pods. 
For example, to mount a secret named `spark-secret` onto the path `/etc/secrets` in both the driver and executor containers, add the following options to the `spark-submit` command: +In some cases, a Spark application may need to use some credentials, e.g., for accessing data on a secured HDFS cluster +or cloud storage that requires users to provide credentials for authentication. This can be done by mounting the +credentials into the driver and executor containers using +Kubernetes [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/). To mount a user-specified secret into +the driver container, users can use the configuration property of the form +`spark.kubernetes.driver.secrets.[SecretName]=`. Similarly, the configuration property of the form +`spark.kubernetes.executor.secrets.[SecretName]=` can be used to mount a user-specified secret into the +executor containers. Note that it is assumed that the secret to be mounted is in the same namespace as that of the driver +and executor pods. For example, to mount a secret named `spark-secret` onto the path `/etc/secrets` in both the driver +and executor containers, add the following options to the `spark-submit` command: ``` --conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets --conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets ``` -Note that if an init-container is used, any secret mounted into the driver container will also be mounted into the init-container of the driver. Similarly, any secret mounted into an executor container will also be mounted into the init-container of the executor. +Note that if an init-container is used, any secret mounted into the driver container will also be mounted into the +init-container of the driver. Similarly, any secret mounted into an executor container will also be mounted into the +init-container of the executor. ## Introspection and Debugging @@ -593,46 +627,42 @@ specific to Spark on Kubernetes. 
- spark.kubernetes.mountDependencies.mountTimeout + spark.kubernetes.mountDependencies.timeout 5 minutes - Timeout before aborting the attempt to download and unpack dependencies from remote locations when initializing - the driver and executor pods. + Timeout before aborting the attempt to download and unpack dependencies from remote locations into the driver and executor pods. - spark.kubernetes.initContainer.image - (none) + spark.kubernetes.mountDependencies.maxThreadPoolSize + 5 - Container image for the init-container of the driver and executors for downloading dependencies. - This is usually of the form example.com/repo/spark-init:v1.0.0. - This configuration is optional and must be provided by the user if any non-container local dependency is used and - must be downloaded remotely. + Maximum size of the thread pool for downloading remote dependencies into the driver and executor pods. - spark.kubernetes.initContainer.maxThreadPoolSize - 5 + spark.kubernetes.initContainer.image + (none) - Maximum size of the thread pool in the init-container for downloading remote dependencies. + Container image for the init-container of the driver and executors for downloading dependencies. This is usually of the form example.com/repo/spark-init:v1.0.0. This configuration is optional and must be provided by the user if any non-container local dependency is used and must be downloaded remotely. spark.kubernetes.driver.secrets.[SecretName] (none) - Add the secret named SecretName to the driver pod on the path specified in the value. For example, + Add the Kubernetes Secret named SecretName to the driver pod on the path specified in the value. For example, spark.kubernetes.driver.secrets.spark-secret=/etc/secrets. Note that if an init-container is used, - the secret will also be add to the init-container in the driver pod. + the secret will also be added to the init-container in the driver pod. 
spark.kubernetes.executor.secrets.[SecretName] 5 - Add the secret named SecretName to the executor pod on the path specified in the value. For example, + Add the Kubernetes Secret named SecretName to the executor pod on the path specified in the value. For example, spark.kubernetes.executor.secrets.spark-secret=/etc/secrets. Note that if an init-container is used, - the secret will also be add to the init-container in the executor pod. + the secret will also be added to the init-container in the executor pod. \ No newline at end of file From f23bf0fdf21f21224895a5c35e0d95956a29abf9 Mon Sep 17 00:00:00 2001 From: Yinan Li Date: Fri, 22 Dec 2017 19:07:43 -0800 Subject: [PATCH 3/7] Addressed more comments --- docs/running-on-kubernetes.md | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 7e185619c026..6adca3b9fce2 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -151,16 +151,13 @@ $ bin/spark-submit \ ``` ## Secret Management -In some cases, a Spark application may need to use some credentials, e.g., for accessing data on a secured HDFS cluster -or cloud storage that requires users to provide credentials for authentication. This can be done by mounting the -credentials into the driver and executor containers using -Kubernetes [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/). To mount a user-specified secret into -the driver container, users can use the configuration property of the form -`spark.kubernetes.driver.secrets.[SecretName]=`. Similarly, the configuration property of the form -`spark.kubernetes.executor.secrets.[SecretName]=` can be used to mount a user-specified secret into the -executor containers. Note that it is assumed that the secret to be mounted is in the same namespace as that of the driver -and executor pods. 
For example, to mount a secret named `spark-secret` onto the path `/etc/secrets` in both the driver -and executor containers, add the following options to the `spark-submit` command: +Kubernetes [Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) can be used to provide credentials for a +Spark application to access secured services. To mount a user-specified secret into the driver container, users can use +the configuration property of the form `spark.kubernetes.driver.secrets.[SecretName]=`. Similarly, the +configuration property of the form `spark.kubernetes.executor.secrets.[SecretName]=` can be used to mount a +user-specified secret into the executor containers. Note that it is assumed that the secret to be mounted is in the same +namespace as that of the driver and executor pods. For example, to mount a secret named `spark-secret` onto the path +`/etc/secrets` in both the driver and executor containers, add the following options to the `spark-submit` command: ``` --conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets @@ -634,10 +631,10 @@ specific to Spark on Kubernetes. - spark.kubernetes.mountDependencies.maxThreadPoolSize + spark.kubernetes.mountDependencies.maxSimultaneousDownloads 5 - Maximum size of the thread pool for downloading remote dependencies into the driver and executor pods. + Maximum number of remote dependencies to download simultaneously in a driver or executor pod. From 818abaf46d8cb4d92f9940e2b59ad6cf27e5da44 Mon Sep 17 00:00:00 2001 From: Yinan Li Date: Mon, 25 Dec 2017 10:03:57 -0800 Subject: [PATCH 4/7] Update the unit of one configuration property --- docs/running-on-kubernetes.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 6adca3b9fce2..083f8541e732 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -625,9 +625,10 @@ specific to Spark on Kubernetes. 
spark.kubernetes.mountDependencies.timeout - 5 minutes + 300 seconds - Timeout before aborting the attempt to download and unpack dependencies from remote locations into the driver and executor pods. + Timeout in seconds before aborting the attempt to download and unpack dependencies from remote locations into + the driver and executor pods. From 08486e83266f4f0fe2b3505cceeda9ca00964733 Mon Sep 17 00:00:00 2001 From: Yinan Li Date: Tue, 26 Dec 2017 08:59:31 -0800 Subject: [PATCH 5/7] Fixed the default value of a config property --- docs/running-on-kubernetes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 083f8541e732..8e2171da4927 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -656,7 +656,7 @@ specific to Spark on Kubernetes. spark.kubernetes.executor.secrets.[SecretName] - 5 + (none) Add the Kubernetes Secret named SecretName to the executor pod on the path specified in the value. For example, spark.kubernetes.executor.secrets.spark-secret=/etc/secrets. 
Note that if an init-container is used, From f4b5c03a7d1947cceafcf2ac16ddb0318778b387 Mon Sep 17 00:00:00 2001 From: Yinan Li Date: Tue, 26 Dec 2017 22:41:25 -0800 Subject: [PATCH 6/7] Addressed more comments --- docs/running-on-kubernetes.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 8e2171da4927..6f0d56a04ad2 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -76,8 +76,8 @@ $ bin/spark-submit \ --name spark-pi \ --class org.apache.spark.examples.SparkPi \ --conf spark.executor.instances=5 \ - --conf spark.kubernetes.driver.docker.image= \ - --conf spark.kubernetes.executor.docker.image= \ + --conf spark.kubernetes.driver.container.image= \ + --conf spark.kubernetes.executor.container.image= \ local:///path/to/examples.jar {% endhighlight %} @@ -144,8 +144,8 @@ $ bin/spark-submit \ --jars https://path/to/dependency1.jar,https://path/to/dependency2.jar \ --files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2 \ --conf spark.executor.instances=5 \ - --conf spark.kubernetes.driver.docker.image= \ - --conf spark.kubernetes.executor.docker.image= \ + --conf spark.kubernetes.driver.container.image= \ + --conf spark.kubernetes.executor.container.image= \ --conf spark.kubernetes.initContainer.image= \ https://path/to/examples.jar ``` @@ -625,7 +625,7 @@ specific to Spark on Kubernetes. spark.kubernetes.mountDependencies.timeout - 300 seconds + 300s Timeout in seconds before aborting the attempt to download and unpack dependencies from remote locations into the driver and executor pods.
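[Editor's sketch] The secret-mounting properties documented in this series always come in driver/executor pairs when the same secret must be visible on both sides. A small hypothetical shell helper (the `secret_conf` function name is ours, not part of Spark or these patches) makes that pattern explicit:

```bash
# Hypothetical helper: emit the paired --conf flags that mount one Kubernetes
# Secret into both the driver and executor containers at the same path.
secret_conf() {
  local name="$1" mount_path="$2"
  printf -- '--conf spark.kubernetes.driver.secrets.%s=%s\n' "$name" "$mount_path"
  printf -- '--conf spark.kubernetes.executor.secrets.%s=%s\n' "$name" "$mount_path"
}

# Reproduces the example from the docs above; prints:
#   --conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
#   --conf spark.kubernetes.executor.secrets.spark-secret=/etc/secrets
secret_conf spark-secret /etc/secrets
```

For mount paths without whitespace, the output can be spliced into a `spark-submit` invocation via command substitution, e.g. `$(secret_conf spark-secret /etc/secrets)`.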
From 453a3db9a610b656020a29406f12d1bf7479a7eb Mon Sep 17 00:00:00 2001 From: Yinan Li Date: Wed, 27 Dec 2017 11:16:13 -0800 Subject: [PATCH 7/7] Fixed some formatting --- docs/running-on-kubernetes.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 6f0d56a04ad2..e491329136a3 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -69,7 +69,7 @@ building using the supplied script, or manually. To launch Spark Pi in cluster mode, -{% highlight bash %} +```bash $ bin/spark-submit \ --master k8s://https://: \ --deploy-mode cluster \ @@ -79,7 +79,7 @@ $ bin/spark-submit \ --conf spark.kubernetes.driver.container.image= \ --conf spark.kubernetes.executor.container.image= \ local:///path/to/examples.jar -{% endhighlight %} +``` The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting `spark.master` in the application's configuration, must be a URL with the format `k8s://`. Prefixing the