From 1ea00618f322651e186e84ce80d9b0c206b8d733 Mon Sep 17 00:00:00 2001
From: mcheah
Date: Fri, 13 Jan 2017 14:01:16 -0800
Subject: [PATCH 1/3] Development workflow documentation for the current state of the world.

---
 resource-managers/kubernetes/README.md | 45 ++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 resource-managers/kubernetes/README.md

diff --git a/resource-managers/kubernetes/README.md b/resource-managers/kubernetes/README.md
new file mode 100644
index 0000000000000..c891058572963
--- /dev/null
+++ b/resource-managers/kubernetes/README.md
@@ -0,0 +1,45 @@
+---
+layout: global
+title: Spark on Kubernetes Development
+---
+
+[Kubernetes](https://kubernetes.io/) is a framework for easily deploying, scaling, and managing containerized
+applications. It would be useful for a user to run their Spark jobs on a Kubernetes cluster alongside their
+other Kubernetes-managed applications. This submodule is an initial implementation of allowing Kubernetes to be a
+supported cluster manager for Spark, along with Mesos, Hadoop YARN, and Standalone. This document provides a summary of
+important matters to keep in mind when developing this feature.
+
+# Building Spark with Kubernetes Support
+
+To build Spark with Kubernetes support, use the `kubernetes` profile when invoking Maven. For example, to simply compile
+the Kubernetes core implementation module along with its dependencies:
+
+    build/mvn compile -Pkubernetes -pl resource-managers/kubernetes/core -am
+
+To build a distribution of Spark with Kubernetes support, use the `dev/make-distribution.sh` script, and add the
+`kubernetes` profile as part of the build arguments. Any other build arguments can be specified as one would expect when
+building Spark normally. For example, to build Spark against Hadoop 2.7 and Kubernetes:
+
+    dev/make-distribution.sh --tgz -Phadoop2.7 -Pkubernetes
+
+# Kubernetes Code Modules
+
+Below is a list of the submodules for this cluster manager and what they do.
+
+* `core`: Implementation of the Kubernetes cluster manager support.
+* `integration-tests`: Integration tests for the project.
+* `docker-minimal-bundle`: Base Dockerfiles for the driver and the executors. The Dockerfiles are used for integration
+  tests as well as being provided in packaged distributions of Spark.
+* `integration-tests-spark-jobs`: Spark jobs that are only used in integration tests.
+* `integration-tests-spark-jobs-helpers`: Dependencies for the spark jobs used in integration tests. These dependencies
+  are separated out to facilitate testing the shipping of jars to drivers running on Kubernetes clusters.
+
+# Running the Kubernetes Integration Tests
+
+Note that the integration test framework is currently being heavily revised and is subject to change.
+
+Running any of the integration tests requires including `kubernetes-integration-tests` profile in the build command. In
+order to prepare the environment for running the integration tests, the `pre-integration-test` step must be run in Maven
+on the `resource-managers/kubernetes/integration-tests` module:
+
+    build/mvn pre-integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am

From d86bcfe96eca4313801cc92a17583f9cfefdac52 Mon Sep 17 00:00:00 2001
From: mcheah
Date: Fri, 13 Jan 2017 14:40:51 -0800
Subject: [PATCH 2/3] Address comments.

---
 resource-managers/kubernetes/README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/resource-managers/kubernetes/README.md b/resource-managers/kubernetes/README.md
index c891058572963..78b3328981126 100644
--- a/resource-managers/kubernetes/README.md
+++ b/resource-managers/kubernetes/README.md
@@ -43,3 +43,11 @@ order to prepare the environment for running the integration tests, the `pre-int
 on the `resource-managers/kubernetes/integration-tests` module:
 
     build/mvn pre-integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am
+
+Afterwards, the integration tests can be executed with Maven or your IDE. Note that when running tests from an IDE, the
+`pre-integration-test` phase must be run every time the core Kubernetes code changes. When running tests from the
+command line, the `pre-integration-test` phase should automatically be invoked if the `integration-test` phase is run.
+
+# Usage Guide
+
+See the [usage guide](../../docs/running-on-kubernetes.md) for more information.

From f2cab3f7c8b096c9e77906eb8547f8248f9e2d0a Mon Sep 17 00:00:00 2001
From: mcheah
Date: Fri, 13 Jan 2017 14:54:06 -0800
Subject: [PATCH 3/3] Clarified code change and added ticket link

---
 resource-managers/kubernetes/README.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/resource-managers/kubernetes/README.md b/resource-managers/kubernetes/README.md
index 78b3328981126..3c11efa38d5af 100644
--- a/resource-managers/kubernetes/README.md
+++ b/resource-managers/kubernetes/README.md
@@ -5,7 +5,10 @@ title: Spark on Kubernetes Development
 
 [Kubernetes](https://kubernetes.io/) is a framework for easily deploying, scaling, and managing containerized
 applications. It would be useful for a user to run their Spark jobs on a Kubernetes cluster alongside their
-other Kubernetes-managed applications. This submodule is an initial implementation of allowing Kubernetes to be a
+other Kubernetes-managed applications. For more about the motivations for adding this feature, see the umbrella JIRA
+ticket that tracks this project: [SPARK-18278](https://issues.apache.org/jira/browse/SPARK-18278).
+
+This submodule is an initial implementation of allowing Kubernetes to be a
 supported cluster manager for Spark, along with Mesos, Hadoop YARN, and Standalone. This document provides a summary of
 important matters to keep in mind when developing this feature.
 
@@ -45,7 +48,7 @@ on the `resource-managers/kubernetes/integration-tests` module:
     build/mvn pre-integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am
 
 Afterwards, the integration tests can be executed with Maven or your IDE. Note that when running tests from an IDE, the
-`pre-integration-test` phase must be run every time the core Kubernetes code changes. When running tests from the
+`pre-integration-test` phase must be run every time the Spark main code changes. When running tests from the
 command line, the `pre-integration-test` phase should automatically be invoked if the `integration-test` phase is run.
 
 # Usage Guide
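
For reviewers who want to try the workflow this README describes, the commands can be strung together roughly as in the sketch below. It is not part of the patch. The `compile` and `pre-integration-test` invocations are copied from the README; the `integration-test` invocation is an assumption that reuses the same profiles and module flags, and the `run` helper, the function names, and the `K8S_DEV_RUN` variable are hypothetical. By default the script only prints each command.

```shell
#!/bin/sh
# Sketch only: prints the Maven commands from the README. Set K8S_DEV_RUN=1
# and run from the Spark repository root to actually execute them.
set -eu

K8S_MODULE="resource-managers/kubernetes"

# Hypothetical helper: echo the command, and execute it only when K8S_DEV_RUN=1.
run() {
  printf '%s\n' "+ $*"
  if [ "${K8S_DEV_RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Compile the Kubernetes core module along with its dependencies (from the README).
compile_core() {
  run build/mvn compile -Pkubernetes -pl "$K8S_MODULE/core" -am
}

# Prepare the environment for the integration tests (from the README).
prepare_integration_tests() {
  run build/mvn pre-integration-test -Pkubernetes -Pkubernetes-integration-tests \
    -pl "$K8S_MODULE/integration-tests" -am
}

# Assumption: the integration-test phase takes the same profiles and module flags.
# Per the README, this phase should invoke pre-integration-test automatically.
run_integration_tests() {
  run build/mvn integration-test -Pkubernetes -Pkubernetes-integration-tests \
    -pl "$K8S_MODULE/integration-tests" -am
}

compile_core
prepare_integration_tests
run_integration_tests
```

When running tests from an IDE instead, only the `prepare_integration_tests` step would be needed, and it would have to be repeated whenever the Spark main code changes.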