diff --git a/resource-managers/kubernetes/README.md b/resource-managers/kubernetes/README.md
new file mode 100644
index 0000000000000..3c11efa38d5af
--- /dev/null
+++ b/resource-managers/kubernetes/README.md
@@ -0,0 +1,56 @@
+---
+layout: global
+title: Spark on Kubernetes Development
+---
+
+[Kubernetes](https://kubernetes.io/) is a framework for easily deploying, scaling, and managing containerized
+applications. It would be useful for a user to run their Spark jobs on a Kubernetes cluster alongside their
+other Kubernetes-managed applications. For more about the motivations for adding this feature, see the umbrella JIRA
+ticket that tracks this project: [SPARK-18278](https://issues.apache.org/jira/browse/SPARK-18278).
+
+This submodule is an initial implementation of Kubernetes as a
+supported cluster manager for Spark, alongside Mesos, Hadoop YARN, and Standalone. This document provides a summary of
+important matters to keep in mind when developing this feature.
+
+# Building Spark with Kubernetes Support
+
+To build Spark with Kubernetes support, use the `kubernetes` profile when invoking Maven. For example, to compile only
+the Kubernetes core implementation module along with its dependencies:
+
+    build/mvn compile -Pkubernetes -pl resource-managers/kubernetes/core -am
+
+To build a distribution of Spark with Kubernetes support, use the `dev/make-distribution.sh` script, and add the
+`kubernetes` profile as part of the build arguments. Any other build arguments can be specified as they would be when
+building Spark normally. For example, to build Spark against Hadoop 2.7 and Kubernetes:
+
+    dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes
+
+# Kubernetes Code Modules
+
+Below is a list of the submodules for this cluster manager and what they do.
+
+* `core`: Implementation of the Kubernetes cluster manager support.
+* `integration-tests`: Integration tests for the project.
+* `docker-minimal-bundle`: Base Dockerfiles for the driver and the executors. The Dockerfiles are used for integration
+  tests as well as being provided in packaged distributions of Spark.
+* `integration-tests-spark-jobs`: Spark jobs that are only used in integration tests.
+* `integration-tests-spark-jobs-helpers`: Dependencies for the Spark jobs used in integration tests. These dependencies
+  are separated out to facilitate testing the shipping of jars to drivers running on Kubernetes clusters.
+
+# Running the Kubernetes Integration Tests
+
+Note that the integration test framework is currently being heavily revised and is subject to change.
+
+Running any of the integration tests requires including the `kubernetes-integration-tests` profile in the build command.
+To prepare the environment for running the integration tests, the `pre-integration-test` phase must be run in Maven
+on the `resource-managers/kubernetes/integration-tests` module:
+
+    build/mvn pre-integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am
+
+Afterwards, the integration tests can be executed with Maven or your IDE. Note that when running tests from an IDE, the
+`pre-integration-test` phase must be run every time the Spark main code changes. When running tests from the
+command line, the `pre-integration-test` phase is invoked automatically if the `integration-test` phase is run.
+
+# Usage Guide
+
+See the [usage guide](../../docs/running-on-kubernetes.md) for more information.