Commit 2ce48f9

Author: Marcelo Vanzin
[SPARK-22960][k8s] Make build-push-docker-images.sh more dev-friendly.
- Make it possible to build images from a git clone.
- Make it easy to use minikube to test things.

Also fixed what seemed like a bug: the base image wasn't getting the tag provided in the command line. Adding the tag allows users to use multiple Spark builds in the same kubernetes cluster.

Tested by deploying images on minikube and running spark-submit from a dev environment; also by building the images with different tags and verifying "docker images" in minikube.
1 parent 9a2b65a commit 2ce48f9
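
For illustration, the dev workflow this change enables might look like the following; the `-m` flag and the `build`/`push` commands come from the updated script below, while the tag and repository names are hypothetical:

```
# From a git clone, build images directly into minikube's docker daemon:
./sbin/build-push-docker-images.sh -m -t dev build

# Or build tagged images and push them to a repository:
./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 build
./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push
```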

File tree

6 files changed (+103 lines added, -28 lines removed)


docs/running-on-kubernetes.md

Lines changed: 5 additions & 3 deletions
@@ -16,6 +16,8 @@ Kubernetes scheduler that has been added to Spark.
 you may setup a test cluster on your local machine using
 [minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
 * We recommend using the latest release of minikube with the DNS addon enabled.
+* Be aware that the default minikube configuration is not enough for running Spark applications.
+  You will need to increase the available memory and number of CPUs.
 * You must have appropriate permissions to list, create, edit and delete
 [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
 by running `kubectl auth can-i <list|create|edit|delete> pods`.
@@ -197,7 +199,7 @@ kubectl port-forward <driver-pod-name> 4040:4040
 
 Then, the Spark driver UI can be accessed on `http://localhost:4040`.
 
-### Debugging
+### Debugging
 
 There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
 connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
@@ -215,8 +217,8 @@ If the pod has encountered a runtime error, the status can be probed further usi
 kubectl logs <spark-driver-pod>
 ```
 
-Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
-application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
+Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
+application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
 the Spark application.
 
 ## Kubernetes Features
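
The new note about minikube resources is actionable at cluster start time; a minimal sketch, assuming minikube's standard `--memory`/`--cpus` flags (the 8192 MB / 4 CPU values are illustrative, not from the commit):

```
# Start minikube with more resources than the default; adjust to your workload.
minikube start --memory 8192 --cpus 4
```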

resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.
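
An `ARG` declared before `FROM` is consumed by the `FROM` instruction itself, which is how the script injects the tagged base image; the executor and init-container Dockerfiles receive the identical change. A sketch of the equivalent manual build (the tag is hypothetical; the path follows the distribution layout used by the script):

```
# Build the driver image against an explicitly tagged base image.
docker build --build-arg base_image=spark-base:v2.3.0 \
  -t spark-driver:v2.3.0 \
  -f kubernetes/dockerfiles/driver/Dockerfile .
```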

resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.

resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # If this docker file is being used in the context of building your images from a Spark distribution, the docker build
 # command should be invoked from the top level directory of the Spark distribution. E.g.:

resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile

Lines changed: 5 additions & 2 deletions
@@ -17,6 +17,9 @@
 
 FROM openjdk:8-alpine
 
+ARG spark_jars
+ARG img_path
+
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.
 # If this docker file is being used in the context of building your images from a Spark
@@ -34,11 +37,11 @@ RUN set -ex && \
     ln -sv /bin/bash /bin/sh && \
     chgrp root /etc/passwd && chmod ug+rw /etc/passwd
 
-COPY jars /opt/spark/jars
+COPY ${spark_jars} /opt/spark/jars
 COPY bin /opt/spark/bin
 COPY sbin /opt/spark/sbin
 COPY conf /opt/spark/conf
-COPY kubernetes/dockerfiles/spark-base/entrypoint.sh /opt/
+COPY ${img_path}/spark-base/entrypoint.sh /opt/
 
 ENV SPARK_HOME /opt/spark
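
The `spark_jars` and `img_path` build args let one Dockerfile serve both source layouts; a sketch of the manual invocation from a git clone, using the values the script computes for that case (the Scala version is illustrative):

```
# Values mirror the non-RELEASE branch of build-push-docker-images.sh.
docker build \
  --build-arg spark_jars=assembly/target/scala-2.11/jars \
  --build-arg img_path=resource-managers/kubernetes/docker/src/main/dockerfiles \
  -t spark-base \
  -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile .
```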

sbin/build-push-docker-images.sh

Lines changed: 87 additions & 20 deletions
@@ -19,51 +19,118 @@
 # This script builds and pushes docker images when run from a release of Spark
 # with Kubernetes support.
 
-declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
-                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
-                  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )
+function error {
+  echo "$@" 1>&2
+  exit 1
+}
+
+# Detect whether this is a git clone or a Spark distribution and adjust paths
+# accordingly.
+if [ -z "${SPARK_HOME}" ]; then
+  SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
+fi
+. "${SPARK_HOME}/bin/load-spark-env.sh"
+
+if [ -f "$SPARK_HOME/RELEASE" ]; then
+  IMG_PATH="kubernetes/dockerfiles"
+  SPARK_JARS="jars"
+else
+  IMG_PATH="resource-managers/kubernetes/docker/src/main/dockerfiles"
+  SPARK_JARS="assembly/target/scala-$SPARK_SCALA_VERSION/jars"
+fi
+
+if [ ! -d "$IMG_PATH" ]; then
+  error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
+fi
+
+declare -A path=( [spark-driver]="$IMG_PATH/driver/Dockerfile" \
+                  [spark-executor]="$IMG_PATH/executor/Dockerfile" \
+                  [spark-init]="$IMG_PATH/init-container/Dockerfile" )
+
+function image_ref {
+  local image="$1"
+  local add_repo="${2:-1}"
+  if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
+    image="$REPO/$image"
+  fi
+  if [ -n "$TAG" ]; then
+    image="$image:$TAG"
+  fi
+  echo "$image"
+}
 
 function build {
-  docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
+  local base_image="$(image_ref spark-base 0)"
+  docker build --build-arg "spark_jars=$SPARK_JARS" \
+    --build-arg "img_path=$IMG_PATH" \
+    -t "$base_image" \
+    -f "$IMG_PATH/spark-base/Dockerfile" .
   for image in "${!path[@]}"; do
-    docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+    docker build --build-arg "base_image=$base_image" -t "$(image_ref $image)" -f ${path[$image]} .
   done
 }
 
-
 function push {
   for image in "${!path[@]}"; do
-    docker push ${REPO}/$image:${TAG}
+    docker push "$(image_ref $image)"
   done
 }
 
 function usage {
-  echo "This script must be run from a runnable distribution of Apache Spark."
-  echo "Usage: ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build"
-  echo "       ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push"
-  echo "for example: ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 push"
+  cat <<EOF
+Usage: $0 [options] [command]
+Builds or pushes the built-in Spark docker images.
+
+Commands:
+  build       Build docker images.
+  push        Push images to a registry. Requires a repository address to be provided, both
+              when building and when pushing the images.
+
+Options:
+  -r repo     Repository address.
+  -t tag      Tag to apply to built images, or to identify images to be pushed.
+  -m          Use minikube environment when invoking docker.
+
+Example:
+  $0 -r docker.io/myrepo -t v2.3.0 push
+EOF
 }
 
 if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
   usage
   exit 0
 fi
 
-while getopts r:t: option
+REPO=
+TAG=
+while getopts mr:t: option
 do
  case "${option}"
  in
  r) REPO=${OPTARG};;
  t) TAG=${OPTARG};;
+ m)
+   if ! which minikube 1>/dev/null; then
+     error "Cannot find minikube."
+   fi
+   eval $(minikube docker-env)
+   ;;
 esac
 done
 
-if [ -z "$REPO" ] || [ -z "$TAG" ]; then
+case "${@: -1}" in
+  build)
+    build
+    ;;
+  push)
+    if [ -z "$REPO" ]; then
+      usage
+      exit 1
+    fi
+    push
+    ;;
+  *)
     usage
-else
-  case "${@: -1}" in
-    build) build;;
-    push) push;;
-    *) usage;;
-  esac
-fi
+    exit 1
+    ;;
+esac
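
To make the fixed bug concrete: `image_ref` now appends the tag to the base image as well, so several Spark builds can coexist in one cluster. A minimal sketch of the names it produces, assuming the function above is sourced (repo and tag values are illustrative):

```
REPO=docker.io/myrepo
TAG=v2.3.0
image_ref spark-driver   # -> docker.io/myrepo/spark-driver:v2.3.0
image_ref spark-base 0   # -> spark-base:v2.3.0 (repo prefix skipped for the local base image)
```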
