Commit bfad831

Author: Marcelo Vanzin (committed)
[SPARK-22994][k8s] Use a single image for all Spark containers.
This change allows a user to submit a Spark application on kubernetes by providing a single image, instead of one image for each type of container. The image's entry point now takes an extra argument that identifies the process being started. The configuration still allows the user to provide different images for each container type if they so desire.

On top of that, the entry point was simplified a bit to share more code; mainly, the same env variable is now used to propagate the user-defined classpath to the different containers.

Aside from being modified to match the new behavior, the 'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh' to more closely match its purpose; the old name was a little awkward, and is now also not entirely correct, since there is a single image. It was also moved to 'bin' since it's not necessarily an admin tool.

Docs and scripts have been updated to match the new behavior.
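For illustration, a minimal sketch of how a single entry point can dispatch on its first argument. The argument names ("driver", "executor", "init") and the SPARK_CLASSPATH / SPARK_DRIVER_CLASS env variables come from this change; the launch commands and the bracketed class names are simplified placeholders, not the actual entrypoint shipped in the image:

    #!/bin/bash
    # Sketch only: dispatch on the container type passed as the first argument.
    case "$1" in
      driver)
        shift
        # SPARK_CLASSPATH now carries the user-defined classpath for every
        # container type; SPARK_DRIVER_CLASS holds the application's main class.
        exec java -cp "$SPARK_CLASSPATH" "$SPARK_DRIVER_CLASS" "$@"
        ;;
      executor)
        shift
        exec java -cp "$SPARK_CLASSPATH" <executor backend class> "$@"
        ;;
      init)
        shift
        # The init container receives its properties file path as the next argument.
        exec java -cp "$SPARK_CLASSPATH" <init container class> "$@"
        ;;
      *)
        echo "Unknown command: $1" >&2
        exit 1
        ;;
    esac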
1 parent eed82a0 commit bfad831

File tree

17 files changed (+192, -222 lines)


sbin/build-push-docker-images.sh renamed to bin/docker-image-tool.sh

Lines changed: 39 additions & 37 deletions
@@ -24,29 +24,11 @@ function error {
   exit 1
 }
 
-# Detect whether this is a git clone or a Spark distribution and adjust paths
-# accordingly.
 if [ -z "${SPARK_HOME}" ]; then
   SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
 fi
 . "${SPARK_HOME}/bin/load-spark-env.sh"
 
-if [ -f "$SPARK_HOME/RELEASE" ]; then
-  IMG_PATH="kubernetes/dockerfiles"
-  SPARK_JARS="jars"
-else
-  IMG_PATH="resource-managers/kubernetes/docker/src/main/dockerfiles"
-  SPARK_JARS="assembly/target/scala-$SPARK_SCALA_VERSION/jars"
-fi
-
-if [ ! -d "$IMG_PATH" ]; then
-  error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
-fi
-
-declare -A path=( [spark-driver]="$IMG_PATH/driver/Dockerfile" \
-                  [spark-executor]="$IMG_PATH/executor/Dockerfile" \
-                  [spark-init]="$IMG_PATH/init-container/Dockerfile" )
-
 function image_ref {
   local image="$1"
   local add_repo="${2:-1}"
@@ -60,35 +42,53 @@ function image_ref {
 }
 
 function build {
-  docker build \
-    --build-arg "spark_jars=$SPARK_JARS" \
-    --build-arg "img_path=$IMG_PATH" \
-    -t spark-base \
-    -f "$IMG_PATH/spark-base/Dockerfile" .
-  for image in "${!path[@]}"; do
-    docker build -t "$(image_ref $image)" -f ${path[$image]} .
-  done
+  # Detect whether this is a git clone or a Spark distribution and adjust paths
+  # accordingly.
+  local BUILD_ARGS
+  local IMG_PATH
+
+  if [ ! -f "$SPARK_HOME/RELEASE" ]; then
+    IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
+    BUILD_ARGS=(
+      --build-arg
+      img_path=$IMG_PATH
+      --build-arg
+      spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
+    )
+  else
+    # Not passed as an argument to docker, but used to validate the Spark directory.
+    IMG_PATH="kubernetes/dockerfiles"
+  fi
+
+  local DOCKERFILE=${DOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
+
+  if [ ! -d "$IMG_PATH" ]; then
+    error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
+  fi
+
+  docker build "${BUILD_ARGS[@]}" \
+    -t $(image_ref spark) \
+    -f "$DOCKERFILE" .
 }
 
 function push {
-  for image in "${!path[@]}"; do
-    docker push "$(image_ref $image)"
-  done
+  docker push "$(image_ref spark)"
 }
 
 function usage {
   cat <<EOF
 Usage: $0 [options] [command]
-Builds or pushes the built-in Spark Docker images.
+Builds or pushes the built-in Spark Docker image.
 
 Commands:
-  build       Build images.
-  push        Push images to a registry. Requires a repository address to be provided, both
-              when building and when pushing the images.
+  build       Build image.
+  push        Push a pre-built image to a registry. Requires a repository address to be provided,
+              both when building and when pushing the image.
 
 Options:
+  -f file     Dockerfile to build. By default builds the Dockerfile shipped with Spark.
   -r repo     Repository address.
-  -t tag      Tag to apply to built images, or to identify images to be pushed.
+  -t tag      Tag to apply to the built image, or to identify the image to be pushed.
   -m          Use minikube's Docker daemon.
@@ -100,10 +100,10 @@ Check the following documentation for more information on using the minikube Doc
   https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon
 
 Examples:
-  - Build images in minikube with tag "testing"
+  - Build image in minikube with tag "testing"
     $0 -m -t testing build
 
-  - Build and push images with tag "v2.3.0" to docker.io/myrepo
+  - Build and push image with tag "v2.3.0" to docker.io/myrepo
     $0 -r docker.io/myrepo -t v2.3.0 build
     $0 -r docker.io/myrepo -t v2.3.0 push
 EOF
@@ -116,10 +116,12 @@ fi
 
 REPO=
 TAG=
-while getopts mr:t: option
+DOCKERFILE=
+while getopts f:mr:t: option
 do
  case "${option}"
  in
+ f) DOCKERFILE=${OPTARG};;
  r) REPO=${OPTARG};;
  t) TAG=${OPTARG};;
  m)
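
For reference, typical invocations after the rename (repository and tag values are placeholders; the -f option is new in this change):

    # Build the single Spark image and push it to a registry:
    ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 build
    ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 push

    # Build from a customized Dockerfile (path is hypothetical):
    ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 -f my/Dockerfile build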

docs/running-on-kubernetes.md

Lines changed: 23 additions & 27 deletions
@@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a
 frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
 and built for your usage.
 
-You may build these docker images from sources.
-There is a script, `sbin/build-push-docker-images.sh` that you can use to build and push
-customized Spark distribution images consisting of all the above components.
+You may build these docker images from sources. Spark ships with a `bin/docker-image-tool.sh` script
+that can be used to build and publish the Spark Docker image to use with the Kubernetes backend.
 
 Example usage is:
 
-    ./sbin/build-push-docker-images.sh -r <repo> -t my-tag build
-    ./sbin/build-push-docker-images.sh -r <repo> -t my-tag push
+    ./bin/docker-image-tool.sh -r <repo> -t my-tag build
+    ./bin/docker-image-tool.sh -r <repo> -t my-tag push
 
 Docker files are under the `kubernetes/dockerfiles/` directory and can be customized further before
 building using the supplied script, or manually.
@@ -79,8 +78,7 @@ $ bin/spark-submit \
   --name spark-pi \
   --class org.apache.spark.examples.SparkPi \
   --conf spark.executor.instances=5 \
-  --conf spark.kubernetes.driver.container.image=<driver-image> \
-  --conf spark.kubernetes.executor.container.image=<executor-image> \
+  --conf spark.kubernetes.container.image=<spark-image> \
   local:///path/to/examples.jar
 ```
 
@@ -126,13 +124,7 @@ Those dependencies can be added to the classpath by referencing them with `local
 ### Using Remote Dependencies
 When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor pods
 need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) for downloading
-the dependencies so the driver and executor containers can use them locally. This requires users to specify the container
-image for the init-container using the configuration property `spark.kubernetes.initContainer.image`. For example, users
-simply add the following option to the `spark-submit` command to specify the init-container image:
-
-```
---conf spark.kubernetes.initContainer.image=<init-container image>
-```
+the dependencies so the driver and executor containers can use them locally.
 
 The init-container handles remote dependencies specified in `spark.jars` (or the `--jars` option of `spark-submit`) and
 `spark.files` (or the `--files` option of `spark-submit`). It also handles remotely hosted main application resources, e.g.,
@@ -147,9 +139,7 @@ $ bin/spark-submit \
   --jars https://path/to/dependency1.jar,https://path/to/dependency2.jar
   --files hdfs://host:port/path/to/file1,hdfs://host:port/path/to/file2
   --conf spark.executor.instances=5 \
-  --conf spark.kubernetes.driver.container.image=<driver-image> \
-  --conf spark.kubernetes.executor.container.image=<executor-image> \
-  --conf spark.kubernetes.initContainer.image=<init-container image>
+  --conf spark.kubernetes.container.image=<spark-image> \
   https://path/to/examples.jar
 ```
 
@@ -322,21 +312,27 @@ specific to Spark on Kubernetes.
   </td>
 </tr>
 <tr>
-  <td><code>spark.kubernetes.driver.container.image</code></td>
+  <td><code>spark.kubernetes.container.image</code></td>
   <td><code>(none)</code></td>
   <td>
-    Container image to use for the driver.
-    This is usually of the form <code>example.com/repo/spark-driver:v1.0.0</code>.
-    This configuration is required and must be provided by the user.
+    Container image to use for the Spark application.
+    This is usually of the form <code>example.com/repo/spark:v1.0.0</code>.
+    This configuration is required and must be provided by the user, unless explicit
+    images are provided for each different container type.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.kubernetes.driver.container.image</code></td>
+  <td><code>(value of spark.kubernetes.container.image)</code></td>
+  <td>
+    Custom container image to use for the driver.
   </td>
 </tr>
 <tr>
   <td><code>spark.kubernetes.executor.container.image</code></td>
-  <td><code>(none)</code></td>
+  <td><code>(value of spark.kubernetes.container.image)</code></td>
   <td>
-    Container image to use for the executors.
-    This is usually of the form <code>example.com/repo/spark-executor:v1.0.0</code>.
-    This configuration is required and must be provided by the user.
+    Custom container image to use for executors.
   </td>
 </tr>
 <tr>
@@ -643,9 +639,9 @@ specific to Spark on Kubernetes.
 </tr>
 <tr>
   <td><code>spark.kubernetes.initContainer.image</code></td>
-  <td>(none)</td>
+  <td><code>(value of spark.kubernetes.container.image)</code></td>
   <td>
-    Container image for the <a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/">init-container</a> of the driver and executors for downloading dependencies. This is usually of the form <code>example.com/repo/spark-init:v1.0.0</code>. This configuration is optional and must be provided by the user if any non-container local dependency is used and must be downloaded remotely.
+    Custom container image for the init container of both driver and executors.
   </td>
 </tr>
 <tr>

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala

Lines changed: 11 additions & 6 deletions
@@ -29,17 +29,23 @@ private[spark] object Config extends Logging {
     .stringConf
     .createWithDefault("default")
 
+  val CONTAINER_IMAGE =
+    ConfigBuilder("spark.kubernetes.container.image")
+      .doc("Container image to use for Spark containers. Individual container types " +
+        "(e.g. driver or executor) can also be configured to use different images if desired, " +
+        "by setting the container-specific image name.")
+      .stringConf
+      .createOptional
+
   val DRIVER_CONTAINER_IMAGE =
     ConfigBuilder("spark.kubernetes.driver.container.image")
       .doc("Container image to use for the driver.")
-      .stringConf
-      .createOptional
+      .fallbackConf(CONTAINER_IMAGE)
 
   val EXECUTOR_CONTAINER_IMAGE =
     ConfigBuilder("spark.kubernetes.executor.container.image")
      .doc("Container image to use for the executors.")
-      .stringConf
-      .createOptional
+      .fallbackConf(CONTAINER_IMAGE)
 
   val CONTAINER_IMAGE_PULL_POLICY =
     ConfigBuilder("spark.kubernetes.container.image.pullPolicy")
@@ -148,8 +154,7 @@ private[spark] object Config extends Logging {
   val INIT_CONTAINER_IMAGE =
     ConfigBuilder("spark.kubernetes.initContainer.image")
       .doc("Image for the driver and executor's init-container for downloading dependencies.")
-      .stringConf
-      .createOptional
+      .fallbackConf(CONTAINER_IMAGE)
 
   val INIT_CONTAINER_MOUNT_TIMEOUT =
     ConfigBuilder("spark.kubernetes.mountDependencies.timeout")

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala

Lines changed: 1 addition & 2 deletions
@@ -60,10 +60,9 @@ private[spark] object Constants {
   val ENV_APPLICATION_ID = "SPARK_APPLICATION_ID"
   val ENV_EXECUTOR_ID = "SPARK_EXECUTOR_ID"
   val ENV_EXECUTOR_POD_IP = "SPARK_EXECUTOR_POD_IP"
-  val ENV_EXECUTOR_EXTRA_CLASSPATH = "SPARK_EXECUTOR_EXTRA_CLASSPATH"
   val ENV_MOUNTED_CLASSPATH = "SPARK_MOUNTED_CLASSPATH"
   val ENV_JAVA_OPT_PREFIX = "SPARK_JAVA_OPT_"
-  val ENV_SUBMIT_EXTRA_CLASSPATH = "SPARK_SUBMIT_EXTRA_CLASSPATH"
+  val ENV_CLASSPATH = "SPARK_CLASSPATH"
   val ENV_DRIVER_MAIN_CLASS = "SPARK_DRIVER_CLASS"
   val ENV_DRIVER_ARGS = "SPARK_DRIVER_ARGS"
   val ENV_DRIVER_JAVA_OPTS = "SPARK_DRIVER_JAVA_OPTS"

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/InitContainerBootstrap.scala

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ private[spark] class InitContainerBootstrap(
       .withMountPath(INIT_CONTAINER_PROPERTIES_FILE_DIR)
       .endVolumeMount()
       .addToVolumeMounts(sharedVolumeMounts: _*)
+      .addToArgs("init")
       .addToArgs(INIT_CONTAINER_PROPERTIES_FILE_PATH)
       .build()
 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala

Lines changed: 2 additions & 1 deletion
@@ -66,7 +66,7 @@ private[spark] class BasicDriverConfigurationStep(
   override def configureDriver(driverSpec: KubernetesDriverSpec): KubernetesDriverSpec = {
     val driverExtraClasspathEnv = driverExtraClasspath.map { classPath =>
       new EnvVarBuilder()
-        .withName(ENV_SUBMIT_EXTRA_CLASSPATH)
+        .withName(ENV_CLASSPATH)
         .withValue(classPath)
         .build()
     }
@@ -133,6 +133,7 @@ private[spark] class BasicDriverConfigurationStep(
         .addToLimits("memory", driverMemoryLimitQuantity)
         .addToLimits(maybeCpuLimitQuantity.toMap.asJava)
         .endResources()
+      .addToArgs("driver")
       .build()
 
     val baseDriverPod = new PodBuilder(driverSpec.driverPod)

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala

Lines changed: 2 additions & 1 deletion
@@ -126,7 +126,7 @@ private[spark] class ExecutorPodFactory(
       .build()
     val executorExtraClasspathEnv = executorExtraClasspath.map { cp =>
       new EnvVarBuilder()
-        .withName(ENV_EXECUTOR_EXTRA_CLASSPATH)
+        .withName(ENV_CLASSPATH)
         .withValue(cp)
         .build()
     }
@@ -178,6 +178,7 @@ private[spark] class ExecutorPodFactory(
       .endResources()
       .addAllToEnv(executorEnv.asJava)
       .withPorts(requiredPorts.asJava)
+      .addToArgs("executor")
       .build()
 
     val executorPod = new PodBuilder()
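
One way to verify the new argument on a live cluster (pod name is a placeholder):

    # The executor container's args should now begin with "executor":
    kubectl get pod <executor-pod-name> -o jsonpath='{.spec.containers[0].args}'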

resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala

Lines changed: 5 additions & 7 deletions
@@ -34,8 +34,7 @@ class DriverConfigOrchestratorSuite extends SparkFunSuite {
   private val SECRET_MOUNT_PATH = "/etc/secrets/driver"
 
   test("Base submission steps with a main app resource.") {
-    val sparkConf = new SparkConf(false)
-      .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
+    val sparkConf = new SparkConf(false).set(CONTAINER_IMAGE, DRIVER_IMAGE)
     val mainAppResource = JavaMainAppResource("local:///var/apps/jars/main.jar")
     val orchestrator = new DriverConfigOrchestrator(
       APP_ID,
@@ -55,8 +54,7 @@ class DriverConfigOrchestratorSuite extends SparkFunSuite {
   }
 
   test("Base submission steps without a main app resource.") {
-    val sparkConf = new SparkConf(false)
-      .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
+    val sparkConf = new SparkConf(false).set(CONTAINER_IMAGE, DRIVER_IMAGE)
     val orchestrator = new DriverConfigOrchestrator(
       APP_ID,
       LAUNCH_TIME,
@@ -75,8 +73,8 @@ class DriverConfigOrchestratorSuite extends SparkFunSuite {
 
   test("Submission steps with an init-container.") {
     val sparkConf = new SparkConf(false)
-      .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
-      .set(INIT_CONTAINER_IMAGE, IC_IMAGE)
+      .set(CONTAINER_IMAGE, DRIVER_IMAGE)
+      .set(INIT_CONTAINER_IMAGE.key, IC_IMAGE)
       .set("spark.jars", "hdfs://localhost:9000/var/apps/jars/jar1.jar")
     val mainAppResource = JavaMainAppResource("local:///var/apps/jars/main.jar")
     val orchestrator = new DriverConfigOrchestrator(
@@ -98,7 +96,7 @@ class DriverConfigOrchestratorSuite extends SparkFunSuite {
 
   test("Submission steps with driver secrets to mount") {
     val sparkConf = new SparkConf(false)
-      .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
+      .set(CONTAINER_IMAGE, DRIVER_IMAGE)
       .set(s"$KUBERNETES_DRIVER_SECRETS_PREFIX$SECRET_FOO", SECRET_MOUNT_PATH)
       .set(s"$KUBERNETES_DRIVER_SECRETS_PREFIX$SECRET_BAR", SECRET_MOUNT_PATH)
     val mainAppResource = JavaMainAppResource("local:///var/apps/jars/main.jar")

resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStepSuite.scala

Lines changed: 2 additions & 2 deletions
@@ -47,7 +47,7 @@ class BasicDriverConfigurationStepSuite extends SparkFunSuite {
       .set(KUBERNETES_DRIVER_LIMIT_CORES, "4")
       .set(org.apache.spark.internal.config.DRIVER_MEMORY.key, "256M")
       .set(org.apache.spark.internal.config.DRIVER_MEMORY_OVERHEAD, 200L)
-      .set(DRIVER_CONTAINER_IMAGE, "spark-driver:latest")
+      .set(CONTAINER_IMAGE, "spark-driver:latest")
       .set(s"$KUBERNETES_DRIVER_ANNOTATION_PREFIX$CUSTOM_ANNOTATION_KEY", CUSTOM_ANNOTATION_VALUE)
       .set(s"$KUBERNETES_DRIVER_ENV_KEY$DRIVER_CUSTOM_ENV_KEY1", "customDriverEnv1")
       .set(s"$KUBERNETES_DRIVER_ENV_KEY$DRIVER_CUSTOM_ENV_KEY2", "customDriverEnv2")
@@ -79,7 +79,7 @@ class BasicDriverConfigurationStepSuite extends SparkFunSuite {
       .asScala
       .map(env => (env.getName, env.getValue))
       .toMap
-    assert(envs(ENV_SUBMIT_EXTRA_CLASSPATH) === "/opt/spark/spark-examples.jar")
+    assert(envs(ENV_CLASSPATH) === "/opt/spark/spark-examples.jar")
     assert(envs(ENV_DRIVER_MEMORY) === "256M")
     assert(envs(ENV_DRIVER_MAIN_CLASS) === MAIN_CLASS)
     assert(envs(ENV_DRIVER_ARGS) === "arg1 arg2 \"arg 3\"")

resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/steps/initcontainer/InitContainerConfigOrchestratorSuite.scala

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@ class InitContainerConfigOrchestratorSuite extends SparkFunSuite {
 
   test("including basic configuration step") {
     val sparkConf = new SparkConf(true)
-      .set(INIT_CONTAINER_IMAGE, DOCKER_IMAGE)
+      .set(CONTAINER_IMAGE, DOCKER_IMAGE)
       .set(s"$KUBERNETES_DRIVER_LABEL_PREFIX$CUSTOM_LABEL_KEY", CUSTOM_LABEL_VALUE)
 
     val orchestrator = new InitContainerConfigOrchestrator(
@@ -59,7 +59,7 @@ class InitContainerConfigOrchestratorSuite extends SparkFunSuite {
 
   test("including step to mount user-specified secrets") {
     val sparkConf = new SparkConf(false)
-      .set(INIT_CONTAINER_IMAGE, DOCKER_IMAGE)
+      .set(CONTAINER_IMAGE, DOCKER_IMAGE)
       .set(s"$KUBERNETES_DRIVER_SECRETS_PREFIX$SECRET_FOO", SECRET_MOUNT_PATH)
       .set(s"$KUBERNETES_DRIVER_SECRETS_PREFIX$SECRET_BAR", SECRET_MOUNT_PATH)
