
Commit 0428368

Committed by Marcelo Vanzin
[SPARK-22960][K8S] Make build-push-docker-images.sh more dev-friendly.
- Make it possible to build images from a git clone.
- Make it easy to use minikube to test things.

Also fixed what seemed like a bug: the base image wasn't getting the tag
provided in the command line. Adding the tag allows users to use multiple
Spark builds in the same kubernetes cluster.

Tested by deploying images on minikube and running spark-submit from a dev
environment; also by building the images with different tags and verifying
"docker images" in minikube.

Author: Marcelo Vanzin <[email protected]>

Closes #20154 from vanzin/SPARK-22960.
1 parent e288fc8 commit 0428368
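As a rough sketch of the dev workflow this change enables (the Maven build step is an assumption and not part of this patch; the -m and -t options come from the updated script below), images can be built from a git clone straight into minikube's Docker daemon and verified there:

  ./build/mvn -Pkubernetes -DskipTests package        # assumed prior step: build Spark with K8s support
  ./sbin/build-push-docker-images.sh -m -t testing build
  eval $(minikube docker-env)                         # reuse minikube's Docker daemon
  docker images | grep spark                          # verify the tagged images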

6 files changed: +117, -28 lines

docs/running-on-kubernetes.md

Lines changed: 6 additions & 3 deletions
@@ -16,6 +16,9 @@ Kubernetes scheduler that has been added to Spark.
 you may setup a test cluster on your local machine using
 [minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
 * We recommend using the latest release of minikube with the DNS addon enabled.
+* Be aware that the default minikube configuration is not enough for running Spark applications.
+  We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single
+  executor.
 * You must have appropriate permissions to list, create, edit and delete
 [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
 by running `kubectl auth can-i <list|create|edit|delete> pods`.

@@ -197,7 +200,7 @@ kubectl port-forward <driver-pod-name> 4040:4040
 
 Then, the Spark driver UI can be accessed on `http://localhost:4040`.
 
-### Debugging 
+### Debugging
 
 There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
 connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there

@@ -215,8 +218,8 @@ If the pod has encountered a runtime error, the status can be probed further usi
 kubectl logs <spark-driver-pod>
 ```
 
-Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark 
-application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
+Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
+application, including all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
 the Spark application.
 
 ## Kubernetes Features
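The documentation change above recommends 3 CPUs and 4g of memory for minikube. As a minimal sketch using minikube's own flags (not part of this patch; flag units may vary by minikube version), that could be:

  minikube start --cpus 3 --memory 4096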

resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.

resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.

resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # If this docker file is being used in the context of building your images from a Spark distribution, the docker build
 # command should be invoked from the top level directory of the Spark distribution. E.g.:
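The driver, executor and init-container Dockerfiles above all receive the same change: the base image is now an ARG instead of the hard-coded spark-base, so the tag chosen on the command line propagates to every image. The script passes this argument automatically; as a hand-run sketch (the "dev" tag is a placeholder), building one child image from a git clone might look like:

  docker build --build-arg base_image=spark-base:dev -t spark-driver:dev \
    -f resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile .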

resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile

Lines changed: 5 additions & 2 deletions
@@ -17,6 +17,9 @@
 
 FROM openjdk:8-alpine
 
+ARG spark_jars
+ARG img_path
+
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.
 # If this docker file is being used in the context of building your images from a Spark

@@ -34,11 +37,11 @@ RUN set -ex && \
     ln -sv /bin/bash /bin/sh && \
     chgrp root /etc/passwd && chmod ug+rw /etc/passwd
 
-COPY jars /opt/spark/jars
+COPY ${spark_jars} /opt/spark/jars
 COPY bin /opt/spark/bin
 COPY sbin /opt/spark/sbin
 COPY conf /opt/spark/conf
-COPY kubernetes/dockerfiles/spark-base/entrypoint.sh /opt/
+COPY ${img_path}/spark-base/entrypoint.sh /opt/
 
 ENV SPARK_HOME /opt/spark
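These two build args are normally supplied by build-push-docker-images.sh, which picks the jars path and Dockerfile path depending on whether it runs from a distribution or a git clone. As a hand-run sketch from a clone (the Scala version and the "dev" tag are placeholders; the script derives the former from SPARK_SCALA_VERSION):

  docker build \
    --build-arg spark_jars=assembly/target/scala-2.11/jars \
    --build-arg img_path=resource-managers/kubernetes/docker/src/main/dockerfiles \
    -t spark-base:dev \
    -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile .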

sbin/build-push-docker-images.sh

Lines changed: 100 additions & 20 deletions
@@ -19,51 +19,131 @@
 # This script builds and pushes docker images when run from a release of Spark
 # with Kubernetes support.
 
-declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
-                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
-                  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )
+function error {
+  echo "$@" 1>&2
+  exit 1
+}
+
+# Detect whether this is a git clone or a Spark distribution and adjust paths
+# accordingly.
+if [ -z "${SPARK_HOME}" ]; then
+  SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
+fi
+. "${SPARK_HOME}/bin/load-spark-env.sh"
+
+if [ -f "$SPARK_HOME/RELEASE" ]; then
+  IMG_PATH="kubernetes/dockerfiles"
+  SPARK_JARS="jars"
+else
+  IMG_PATH="resource-managers/kubernetes/docker/src/main/dockerfiles"
+  SPARK_JARS="assembly/target/scala-$SPARK_SCALA_VERSION/jars"
+fi
+
+if [ ! -d "$IMG_PATH" ]; then
+  error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
+fi
+
+declare -A path=( [spark-driver]="$IMG_PATH/driver/Dockerfile" \
+                  [spark-executor]="$IMG_PATH/executor/Dockerfile" \
+                  [spark-init]="$IMG_PATH/init-container/Dockerfile" )
+
+function image_ref {
+  local image="$1"
+  local add_repo="${2:-1}"
+  if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
+    image="$REPO/$image"
+  fi
+  if [ -n "$TAG" ]; then
+    image="$image:$TAG"
+  fi
+  echo "$image"
+}
 
 function build {
-  docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
+  local base_image="$(image_ref spark-base 0)"
+  docker build --build-arg "spark_jars=$SPARK_JARS" \
+    --build-arg "img_path=$IMG_PATH" \
+    -t "$base_image" \
+    -f "$IMG_PATH/spark-base/Dockerfile" .
   for image in "${!path[@]}"; do
-    docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+    docker build --build-arg "base_image=$base_image" -t "$(image_ref $image)" -f ${path[$image]} .
   done
 }
 
-
 function push {
   for image in "${!path[@]}"; do
-    docker push ${REPO}/$image:${TAG}
+    docker push "$(image_ref $image)"
   done
 }
 
 function usage {
-  echo "This script must be run from a runnable distribution of Apache Spark."
-  echo "Usage: ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build"
-  echo "       ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push"
-  echo "for example: ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push"
+  cat <<EOF
+Usage: $0 [options] [command]
+Builds or pushes the built-in Spark Docker images.
+
+Commands:
+  build       Build images.
+  push        Push images to a registry. Requires a repository address to be provided, both
+              when building and when pushing the images.
+
+Options:
+  -r repo     Repository address.
+  -t tag      Tag to apply to built images, or to identify images to be pushed.
+  -m          Use minikube's Docker daemon.
+
+Using minikube when building images will do so directly into minikube's Docker daemon.
+There is no need to push the images into minikube in that case, they'll be automatically
+available when running applications inside the minikube cluster.
+
+Check the following documentation for more information on using the minikube Docker daemon:
+
+  https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon
+
+Examples:
+  - Build images in minikube with tag "testing"
+    $0 -m -t testing build
+
+  - Build and push images with tag "v2.3.0" to docker.io/myrepo
+    $0 -r docker.io/myrepo -t v2.3.0 build
+    $0 -r docker.io/myrepo -t v2.3.0 push
+EOF
 }
 
 if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
   usage
   exit 0
 fi
 
-while getopts r:t: option
+REPO=
+TAG=
+while getopts mr:t: option
 do
   case "${option}"
   in
   r) REPO=${OPTARG};;
   t) TAG=${OPTARG};;
+  m)
+    if ! which minikube 1>/dev/null; then
+      error "Cannot find minikube."
+    fi
+    eval $(minikube docker-env)
+    ;;
   esac
 done
 
-if [ -z "$REPO" ] || [ -z "$TAG" ]; then
+case "${@: -1}" in
+  build)
+    build
+    ;;
+  push)
+    if [ -z "$REPO" ]; then
+      usage
+      exit 1
+    fi
+    push
+    ;;
+  *)
     usage
-else
-  case "${@: -1}" in
-    build) build;;
-    push) push;;
-    *) usage;;
-  esac
-fi
+    exit 1
+    ;;
+esac
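With the new image_ref helper, -r and -t determine the final image names: the repository prefix is applied to the three child images, while the base image only receives the tag, which is what allows multiple Spark builds to coexist in one cluster. For example, following the script's own usage text:

  ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 build
  # builds spark-base:v2.3.0 plus docker.io/myrepo/spark-driver:v2.3.0,
  # docker.io/myrepo/spark-executor:v2.3.0 and docker.io/myrepo/spark-init:v2.3.0
  ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push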
