
Conversation

@vanzin
Contributor

@vanzin vanzin commented Jan 9, 2018

This change allows a user to submit a Spark application on Kubernetes
while providing only a single image, instead of one image for each type
of container. The image's entry point now takes an extra argument that
identifies the process that is being started.

The configuration still allows the user to provide different images
for each container type if they so desire.

On top of that, the entry point was simplified a bit to share more
code; mainly, the same env variable is used to propagate the user-defined
classpath to the different containers.

Aside from being modified to match the new behavior, the
'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh'
to more closely match its purpose; the old name was a little awkward
and now also not entirely correct, since there is a single image. It
was also moved to 'bin' since it's not necessarily an admin tool.

Docs and scripts have been updated to match the new behavior.

Tested locally with minikube.
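A minimal submission using the new single-image configuration could look like the sketch below; the registry, tag, and example jar path are placeholders rather than values from this PR, and the per-type properties remain available as overrides.

# Hypothetical submission: one image serves the driver, executor and init containers.
bin/spark-submit \
  --master k8s://https://192.168.99.100:8443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=myrepo/spark:v2.3.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
# Per-container-type overrides are still honored, e.g.:
#   --conf spark.kubernetes.driver.container.image=myrepo/spark-driver:v2.3.0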

@mccheah
Contributor

mccheah commented Jan 9, 2018

+1 - users with custom docker images can override the classpath by putting different contents in the jars directory or by using a custom SPARK_CLASSPATH environment variable in the child image.
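A hedged sketch of that kind of customization (the base image name, jar directory, and classpath value are assumptions, not part of this PR):

# Build a derived image that adds jars and a custom SPARK_CLASSPATH.
cat > Dockerfile.custom <<'EOF'
FROM myrepo/spark:v2.3.0
COPY extra-jars/ /opt/spark/extra-jars/
ENV SPARK_CLASSPATH /opt/spark/extra-jars/*
EOF
docker build -t myrepo/spark-custom:v2.3.0 -f Dockerfile.custom .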

@SparkQA

SparkQA commented Jan 9, 2018

Test build #85814 has finished for PR 20192 at commit bfad831.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@felixcheung felixcheung left a comment

LG, mostly minor comments

COPY conf /opt/spark/conf
COPY ${img_path}/spark-base/entrypoint.sh /opt/
COPY ${img_path}/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
Member

a lot of examples depend on data/, should that be included too?

Contributor Author

Didn't know about that directory, but sounds like it should be added.


SPARK_K8S_CMD="$1"
if [ -z "$SPARK_K8S_CMD" ]; then
echo "No command to execute has been provided." 1>&2
Member

starting the container without a command could be very useful for an in-cluster client, or just for ad hoc testing outside of k8s

Contributor Author

You can do that with docker container create --entrypoint blah, right? Otherwise you have to add code here to specify what command to run when no arguments are provided. I'd rather have a proper error, since the entry point is tightly coupled with the submission code.

Member

we can revisit when we have proper client support.
overriding the entrypoint won't do if I want everything else set (e.g. SPARK_CLASSPATH)
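For the ad hoc testing case, the entry point can already be bypassed with standard Docker flags; a minimal sketch, with the image name as a placeholder:

# Start a shell in the image without going through the Spark entrypoint.
docker run -it --rm --entrypoint /bin/bash myrepo/spark:v2.3.0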

env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
Member

the different classpath additions for the different roles have been taken out?
SPARK_EXECUTOR_EXTRA_CLASSPATH
SPARK_SUBMIT_EXTRA_CLASSPATH

Contributor Author

The difference is handled in the submission code; SPARK_CLASSPATH is set to the appropriate value.

Member

to clarify, by that I mean we no longer have the ability to customize different classpaths for the executor and the driver.
for reference, see spark.driver.extraClassPath vs spark.executor.extraClassPath

Contributor Author

Yes you do.

The submission code sets SPARK_CLASSPATH to spark.driver.extraClassPath in the driver case, and to spark.executor.extraClassPath in the executor case.
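So a submission like the hedged sketch below (image, jar, and classpath values are placeholders) still yields a different classpath per role; each property is translated into the SPARK_CLASSPATH seen by the corresponding container:

# spark.driver.extraClassPath and spark.executor.extraClassPath remain per-role.
bin/spark-submit \
  --master k8s://https://192.168.99.100:8443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=myrepo/spark:v2.3.0 \
  --conf spark.driver.extraClassPath=/opt/extra/driver-only.jar \
  --conf spark.executor.extraClassPath=/opt/extra/executor-only.jar \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar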

ConfigBuilder("spark.kubernetes.container.image")
.doc("Container image to use for Spark containers. Individual container types " +
"(e.g. driver or executor) can also be configured to use different images if desired, " +
"by setting the container-specific image name.")
Member

nit: container-specific => container-type-specific
and add "e.g. spark.kubernetes.driver.container.image"?

Contributor Author

Why would I mention just one specific way of overriding this?

I also have half a mind to just remove this since this documentation is not visible anywhere...


init)
CMD=(
"/opt/spark/bin/spark-class"
Member

shouldn't this be under SPARK_HOME?
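A rough sketch of that suggestion, with the defaulting line and the reduced case body as assumptions rather than the PR's actual code:

# Resolve spark-class via SPARK_HOME instead of a hard-coded /opt/spark path.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
case "$SPARK_K8S_CMD" in
  init)
    CMD=("${SPARK_HOME}/bin/spark-class" "$@")
    ;;
esac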

@foxish
Contributor

foxish commented Jan 9, 2018

Our integration tests should be changed to accommodate this modification and test it, and we should also add some new tests utilizing the newly added option.

Contributor

@liyinan926 liyinan926 left a comment

Overall LGTM with a few minor comments.

when building and when pushing the images.
build Build image.
push Push a pre-built image to a registry. Requires a repository address to be provided,
both when building and when pushing the image.
Contributor

It's better to state explicitly for build and push individually.
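For reference, typical invocations of the renamed tool look roughly like this; the repository and tag are placeholders:

# Build the single Spark image locally, then push it to a registry.
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 build
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 push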

.set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
.set(INIT_CONTAINER_IMAGE, IC_IMAGE)
.set(CONTAINER_IMAGE, DRIVER_IMAGE)
.set(INIT_CONTAINER_IMAGE.key, IC_IMAGE)
Contributor

Do you still need to set this?

Contributor Author

Yes, the test is checking different values for the default and init container images.

for image in "${!path[@]}"; do
docker build -t "$(image_ref $image)" -f ${path[$image]} .
done
# Detect whether this is a git clone or a Spark distribution and adjust paths
Contributor

"paths" => "values of the following variables?.


Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
Contributor

It should be called out that a single Dockerfile is shipped with Spark.

Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
and built for your usage.
Contributor

I think it's worth having a dedicated sub-section here on how to use custom images, e.g. named Customizing Container Images.

Contributor Author

Separate change. I don't even know what you'd write there. The whole "custom image" thing needs to be properly specified first - what exactly is the contract between the submission code and the images, for example.

Contributor

@liyinan926 liyinan926 Jan 9, 2018

I agree that we don't have a solid story around customizing images here. But I do think that we need something clearly telling people that we do support using custom images if they want to, and the properties they should use to configure custom images, ideally with a submission command example. It just doesn't need to be opinionated on things like the contract you mentioned.

Contributor Author

Still, that sounds like something that should be added in a separate change. I'm not changing the customizability of images in this change.

And not having a contract means people will have no idea of how to customize images, so you can't even write proper documentation for that.

Contributor

OK, I am fine with adding this in a future change.

@vanzin
Contributor Author

vanzin commented Jan 9, 2018

users with custom docker images can override the classpath by

I wrote this in a comment above, but there needs to be a proper definition of how to customize these docker images. There needs to be a contract between the submission code, the entry point, and how stuff is laid out inside the image, and I don't see that specified anywhere.

However that's done, I also would suggest that env variables be avoided.

@foxish
Contributor

foxish commented Jan 9, 2018

@vanzin, do you have some time to modify the integration tests as well? The change LGTM, but a clean run on minikube would give us a lot more confidence. Until the integration tests get checked in to this repo and are running in the PRB (@ssuchter is working on this), we think that the best way to keep them in sync is to ensure that PRs get a manual clean run against the suite.

@liyinan926
Contributor

LGTM.

@vanzin
Contributor Author

vanzin commented Jan 9, 2018

do you have some time to modify the integration tests as well

I can try to look, but really you guys should be putting that code into the Spark repo. I don't see a task under SPARK-18278 for adding the integration tests.

@foxish
Contributor

foxish commented Jan 9, 2018

Thanks @vanzin. I was waiting on the spark-dev thread on integration testing to conclude. It does look like checking the tests in is something we should do - adding a task to track it. We're also stabilizing the testing atm - so I'm thinking we'll target that for post-2.3. It would be great to get an architecture review from the Spark community on it, as it exists today, to get some feedback going.

@vanzin
Contributor Author

vanzin commented Jan 9, 2018

$ mvn clean integration-test -Dspark-distro-tgz=/work/apache/spark/spark-2.3.0-SNAPSHOT-bin-2.7.3.tgz -DextraScalaTestArgs="-Dspark.kubernetes.test.master=k8s://https://192.168.99.100:8443 -Dspark.docker.test.skipBuildImages=true -Dspark.docker.test.driverImage=spark:master -Dspark.docker.test.executorImage=spark:master -Dspark.docker.test.initContainerImage=spark:master -Dspark.docker.test.persistMinikube=true"
...
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 101 milliseconds.
Run starting. Expected test count is: 9
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi using the remote example jar.
- Run SparkPi with custom driver pod name, labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Run SparkPi using the remote example jar with a test secret mounted into the driver and executor pods
- Run PageRank using remote data file
Run completed in 3 minutes, 18 seconds.

@foxish
Contributor

foxish commented Jan 9, 2018

Great, thanks @vanzin. We'll probably need to add a test case using the new option as well - I can take care of that.
Thanks for the change.

@SparkQA

SparkQA commented Jan 9, 2018

Test build #85870 has finished for PR 20192 at commit e771ed9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor Author

vanzin commented Jan 10, 2018

If there's no more feedback I'll merge this later today.

@mccheah
Contributor

mccheah commented Jan 11, 2018

Good to merge here?

Member

@felixcheung felixcheung left a comment

LGTM

@vanzin
Contributor Author

vanzin commented Jan 11, 2018

Merging to master / 2.3.

asfgit pushed a commit that referenced this pull request Jan 11, 2018

Author: Marcelo Vanzin <[email protected]>

Closes #20192 from vanzin/SPARK-22994.

(cherry picked from commit 0b2eefb)
Signed-off-by: Marcelo Vanzin <[email protected]>
@asfgit asfgit closed this in 0b2eefb Jan 11, 2018
@vanzin vanzin deleted the SPARK-22994 branch January 16, 2018 19:49