[SPARK-22994][k8s] Use a single image for all Spark containers. #20192
Conversation
This change allows a user to submit a Spark application on Kubernetes by providing a single image, instead of one image for each type of container. The image's entry point now takes an extra argument that identifies the process being started. The configuration still allows the user to provide different images for each container type if they so desire. On top of that, the entry point was simplified a bit to share more code; mainly, the same env variable is used to propagate the user-defined classpath to the different containers. Aside from being modified to match the new behavior, the 'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh' to more closely match its purpose; the old name was a little awkward, and now also not entirely correct, since there is a single image. It was also moved to 'bin' since it's not necessarily an admin tool. Docs and scripts have been updated to match the new behavior.
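For reference, building and pushing with the renamed tool looks roughly like this (a minimal sketch; the repository name and tag are placeholders):

```bash
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 build
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 push
```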
+1 - users with custom docker images can override the classpath by putting different contents in the

Test build #85814 has finished for PR 20192 at commit
felixcheung left a comment:
LG, mostly minor comments
      COPY conf /opt/spark/conf
    - COPY ${img_path}/spark-base/entrypoint.sh /opt/
    + COPY ${img_path}/spark/entrypoint.sh /opt/
      COPY examples /opt/spark/examples
a lot of examples depend on data/, should that be included too?
Didn't know about that directory, but sounds like it should be added.
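(For illustration, the suggested addition would be a one-line sketch like the following; the destination path mirrors the other COPY lines and is an assumption:)

```dockerfile
COPY data /opt/spark/data
```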
    SPARK_K8S_CMD="$1"
    if [ -z "$SPARK_K8S_CMD" ]; then
      echo "No command to execute has been provided." 1>&2
Starting the container without a command could be very useful for an in-cluster client, or just for ad hoc testing outside of k8s.
You can do that with docker container create --entrypoint blah, right? Otherwise you have to add code here to specify what command to run when no arguments are provided. I'd rather have a proper error, since the entry point is tightly coupled with the submission code.
We can revisit when we have proper client support.
Overriding the entrypoint won't do if I want everything else set (e.g. SPARK_CLASSPATH).
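(For context, the entrypoint override mentioned above uses standard docker flags; the image name here is a placeholder:)

```bash
# Start an ad hoc shell in the image, bypassing the Spark entrypoint entirely.
docker run --rm -it --entrypoint /bin/bash myrepo/spark:v2.3.0
```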
    env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
    readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
    if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
    if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
Has the different classpath addition for different roles been taken out?
SPARK_EXECUTOR_EXTRA_CLASSPATH
SPARK_SUBMIT_EXTRA_CLASSPATH
The difference is handled in the submission code; SPARK_CLASSPATH is set to the appropriate value.
To clarify, by that I mean we no longer have the ability to customize different classpaths for the executor and driver.
For reference, see spark.driver.extraClassPath vs spark.executor.extraClassPath.
Yes you do.
The submission code sets SPARK_CLASSPATH to spark.driver.extraClassPath in the driver case, and to spark.executor.extraClassPath in the executor case.
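(To illustrate, a submission like the following still yields different classpaths per role; the master URL, image name, and jar paths are placeholders:)

```bash
bin/spark-submit \
  --master k8s://https://example.com:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=myrepo/spark:v2.3.0 \
  --conf spark.driver.extraClassPath='/opt/extra-jars/driver/*' \
  --conf spark.executor.extraClassPath='/opt/extra-jars/executor/*' \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```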
    ConfigBuilder("spark.kubernetes.container.image")
      .doc("Container image to use for Spark containers. Individual container types " +
        "(e.g. driver or executor) can also be configured to use different images if desired, " +
        "by setting the container-specific image name.")
nit: container-specific => container-type-specific
and add "e.g. spark.kubernetes.driver.container.image"?
Why would I mention just one specific way of overriding this?
I also have half a mind to just remove this since this documentation is not visible anywhere...
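(For the record, the kind of override being discussed is a pair of spark-submit flags like these; both config keys exist in the k8s config, and the image names are placeholders:)

```bash
# Fragment of a spark-submit invocation: all containers share the default
# image, except the driver, which uses a different one.
  --conf spark.kubernetes.container.image=myrepo/spark:v2.3.0 \
  --conf spark.kubernetes.driver.container.image=myrepo/spark-debug:v2.3.0 \
```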
    init)
      CMD=(
        "/opt/spark/bin/spark-class"
shouldn't this be under SPARK_HOME?
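(i.e. something like the following case arm; the init-container class name matches Spark 2.3, but treat this as a sketch rather than the PR's final code:)

```bash
init)
  CMD=(
    "${SPARK_HOME}/bin/spark-class"
    "org.apache.spark.deploy.k8s.SparkPodInitContainer"
    "$@"
  )
  ;;
```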
Our integration tests should be changed to accommodate this modification and test it, and we should also add some new tests utilizing the newly added option.
liyinan926 left a comment:
Overall LGTM with a few minor comments.
bin/docker-image-tool.sh
          when building and when pushing the images.
    build Build image.
    push  Push a pre-built image to a registry. Requires a repository address to be provided,
          both when building and when pushing the image.
It's better to state explicitly for build and push individually.
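(One possible per-command wording, sketched as the script's usage function; the exact text is illustrative, not the final help output:)

```bash
usage() {
  cat <<EOF
Commands:
  build   Build image. Requires a repository address (-r) if the image is to be
          pushed to a different registry.
  push    Push a pre-built image to a registry. Requires a repository address (-r).
EOF
}
```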
    - .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
    - .set(INIT_CONTAINER_IMAGE, IC_IMAGE)
    + .set(CONTAINER_IMAGE, DRIVER_IMAGE)
    + .set(INIT_CONTAINER_IMAGE.key, IC_IMAGE)
Do you still need to set this?
Yes, the test is checking different values for the default and init container images.
bin/docker-image-tool.sh
    - for image in "${!path[@]}"; do
    -   docker build -t "$(image_ref $image)" -f ${path[$image]} .
    - done
    + # Detect whether this is a git clone or a Spark distribution and adjust paths
"paths" => "values of the following variables?.
docs/running-on-kubernetes.md
    Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
    be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
    frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
It should be called out that a single Dockerfile is shipped with Spark.
docs/running-on-kubernetes.md
    Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
    be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
    frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
    and built for your usage.
I think it's worth having a dedicated sub-section here on how to use custom images, e.g., one named Customizing Container Images.
Separate change. I don't even know what you'd write there. The whole "custom image" thing needs to be properly specified first - what exactly is the contract between the submission code and the images, for example.
I agree that we don't have a solid story around customizing images here. But I do think we need something that clearly tells people we do support using custom images if they want to, and the properties they should use to configure them, ideally with a submission command example. It just doesn't need to be opinionated on things like the contract you mentioned.
Still, that sounds like something that should be added in a separate change. I'm not changing the customizability of images in this change.
And not having a contract means people will have no idea of how to customize images, so you can't even write proper documentation for that.
OK, I am fine with adding this in a future change.
I wrote this in a comment above, but there needs to be a proper definition of how to customize these docker images. There needs to be a contract between the submission code, the entry point, and how things are laid out inside the image, and I don't see that specified anywhere. However that's done, I would also suggest that env variables be avoided.
@vanzin, do you have some time to modify the integration tests as well? The change LGTM, but a clean run on minikube would give us a lot more confidence. Until the integration tests get checked into this repo and running in the PRB (@ssuchter is working on this), we think the best way to keep them in sync is to ensure that PRs get a manual clean run against the suite.
LGTM.
I can try to look, but really you guys should be putting that code into the Spark repo. I don't see a task under SPARK-18278 for adding the integration tests.
Thanks @vanzin. I was waiting on the spark-dev thread on integration testing to conclude. It does look like checking the tests in is something we should do - adding a task to track it. We're also stabilizing the testing at the moment, so I'm thinking we'll target that for post-2.3. It would be great to get an architecture review from the Spark community on it, as it exists today, to get some feedback going.
Great, thanks @vanzin. We'll probably need to add a test case using the new option as well - I can take care of that.
Test build #85870 has finished for PR 20192 at commit
If there's no more feedback I'll merge this later today.
Good to merge here?
felixcheung left a comment:
LGTM
Merging to master / 2.3.
This change allows a user to submit a Spark application on kubernetes
by providing a single image, instead of one image for each type
of container. The image's entry point now takes an extra argument that
identifies the process that is being started.

The configuration still allows the user to provide different images
for each container type if they so desire.

On top of that, the entry point was simplified a bit to share more
code; mainly, the same env variable is used to propagate the user-defined
classpath to the different containers.

Aside from being modified to match the new behavior, the
'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh'
to more closely match its purpose; the old name was a little awkward
and now also not entirely correct, since there is a single image. It
was also moved to 'bin' since it's not necessarily an admin tool.

Docs have been updated to match the new behavior.

Tested locally with minikube.

Author: Marcelo Vanzin <[email protected]>

Closes #20192 from vanzin/SPARK-22994.

(cherry picked from commit 0b2eefb)
Signed-off-by: Marcelo Vanzin <[email protected]>