Development workflow documentation for the current state of the world. #20
---
layout: global
title: Spark on Kubernetes Development
---

[Kubernetes](https://kubernetes.io/) is a framework for easily deploying, scaling, and managing containerized
applications. It would be useful for a user to run their Spark jobs on a Kubernetes cluster alongside their
other Kubernetes-managed applications. For more about the motivations for adding this feature, see the umbrella JIRA
ticket that tracks this project: [SPARK-18278](https://issues.apache.org/jira/browse/SPARK-18278).

This submodule is an initial implementation of Kubernetes support as a cluster manager for Spark, alongside
Mesos, Hadoop YARN, and Standalone mode. This document summarizes the most important points to keep in mind when
developing this feature.

# Building Spark with Kubernetes Support

To build Spark with Kubernetes support, use the `kubernetes` profile when invoking Maven. For example, to compile
just the Kubernetes core implementation module along with its dependencies:

    build/mvn compile -Pkubernetes -pl resource-managers/kubernetes/core -am

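In the same vein, assuming standard Maven phase semantics (this particular invocation isn't spelled out in this
document), the module's unit tests can be run by swapping the phase:

    # Assumed invocation: run the unit tests of the core module and its
    # dependencies, reusing the same flags as the compile command above.
    build/mvn test -Pkubernetes -pl resource-managers/kubernetes/core -am
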
To build a distribution of Spark with Kubernetes support, use the `dev/make-distribution.sh` script and add the
`kubernetes` profile to the build arguments. Any other build arguments can be specified as they would be when
building Spark normally. For example, to build Spark against Hadoop 2.7 and Kubernetes:

    dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes

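As a quick sanity check, assuming `make-distribution.sh` keeps its usual behavior of writing a `spark-*-bin-*.tgz`
tarball into the project root (the exact name depends on the Spark version and any `--name` argument), you can list
the archive to confirm the Kubernetes jars were bundled:

    # Assumed tarball naming pattern; adjust to the actual file produced.
    tar -tzf spark-*-bin-*.tgz | grep kubernetes
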
# Kubernetes Code Modules

Below is a list of the submodules for this cluster manager and what they do.

* `core`: Implementation of the Kubernetes cluster manager support.
* `integration-tests`: Integration tests for the project.
* `docker-minimal-bundle`: Base Dockerfiles for the driver and the executors. The Dockerfiles are used for integration
  tests as well as being provided in packaged distributions of Spark.
* `integration-tests-spark-jobs`: Spark jobs that are only used in integration tests.
* `integration-tests-spark-jobs-helpers`: Dependencies for the Spark jobs used in integration tests. These dependencies
  are separated out to facilitate testing the shipping of jars to drivers running on Kubernetes clusters.

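Putting the paths together (a sketch inferred from the `-pl` arguments above rather than a verbatim listing of the
source tree), the modules sit side by side under `resource-managers/kubernetes/`:

    resource-managers/kubernetes/
    ├── core/                                  # cluster manager implementation
    ├── docker-minimal-bundle/                 # base driver/executor Dockerfiles
    ├── integration-tests/                     # integration test suites
    ├── integration-tests-spark-jobs/          # Spark jobs used only in tests
    └── integration-tests-spark-jobs-helpers/  # dependencies for those test jobs
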
# Running the Kubernetes Integration Tests

Note that the integration test framework is currently being heavily revised and is subject to change.

Running any of the integration tests requires including the `kubernetes-integration-tests` profile in the build
command. To prepare the environment for running the integration tests, the `pre-integration-test` step must be run
in Maven on the `resource-managers/kubernetes/integration-tests` module:

    build/mvn pre-integration-test -Pkubernetes -Pkubernetes-integration-tests -pl resource-managers/kubernetes/integration-tests -am

once this `pre-integration-test` profile is run, how do I run the tests themselves? just the k8s ones

Author

This depends, so I wasn't sure how to include it. If just running from the command line, there should be just a Maven
command for it that follows the general Maven semantics, but needing to specify each of the suites means that if we
add, rename, or delete suites, we would have to adjust the docs accordingly. If running in IntelliJ, the
`pre-integration-test` step has to be re-run before running any test whenever the code changes. From there it's just
using JUnit to run the test classes.

Maybe then just add a message afterwards saying something like "now run your tests normally, either on the command
line with Maven or through an IDE like IntelliJ".

Author

Done

Afterwards, the integration tests can be executed with Maven or from your IDE. Note that when running tests from an
IDE, the `pre-integration-test` phase must be re-run every time the Spark main code changes. When running tests from
the command line, the `pre-integration-test` phase should automatically be invoked if the `integration-test` phase
is run.

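For example, a command-line run could look like the following sketch. The first command relies on general Maven
semantics rather than an invocation documented here, and the suite name in the second command is purely hypothetical:

    # Run all Kubernetes integration tests; the integration-test phase
    # triggers pre-integration-test automatically (general Maven semantics).
    build/mvn integration-test -Pkubernetes -Pkubernetes-integration-tests \
      -pl resource-managers/kubernetes/integration-tests -am

    # Run a single suite via the scalatest-maven-plugin; the suite name
    # below is hypothetical.
    build/mvn integration-test -Pkubernetes -Pkubernetes-integration-tests \
      -pl resource-managers/kubernetes/integration-tests -am \
      -DwildcardSuites=org.apache.spark.deploy.kubernetes.integrationtest.KubernetesSuite
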
# Usage Guide

See the [usage guide](../../docs/running-on-kubernetes.md) for more information.

at the bottom of the page we should link to the user docs for how to use it as a natural continuation from the dev
setup docs

+1

Done