From 677b10cabc18e94c656ef4bf11beec099b863d62 Mon Sep 17 00:00:00 2001
From: Andrew Ash
Date: Wed, 25 Jan 2017 16:57:14 -0800
Subject: [PATCH 1/3] Create README to better describe project purpose

---
 README.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/README.md b/README.md
index d0eca1ddea283..e0d494fbe4647 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,35 @@
+# Apache Spark On Kubernetes
+
+This repository, located at https://github.com/apache-spark-on-k8s/spark, contains a fork of Apache Spark that enables running Spark jobs natively on a Kubernetes cluster.
+
+## What is this?
+
+This is a collaboratively maintained project working on [SPARK-18278](https://issues.apache.org/jira/browse/SPARK-18278). The goal is to bring native support for Spark to use Kubernetes as a cluster manager, in a fully supported way on par with the Spark Standalone, Mesos, and Apache YARN cluster managers.
+
+## Why does this fork exist?
+
+Adding native integration for a new cluster manager is a large undertaking. If poorly executed, it could introduce bugs into Spark when run on other cluster managers, cause release blockers slowing down the overall Spark project, or require hotfixes which divert attention away from development towards managing additional releases. Any work this core in Spark needs to be done carefully to minimize the risk of those negative externalities.
+
+At the same time, an increasing number of people from varying companies and organizations desire to work together to natively run Spark on Kubernetes. The group needs a code repository, communication forum, issue tracking, and continuous integration, all in order to work together effectively on an open source product.
+
+We've been asked by an Apache Spark Committer to work outside of the Apache infrastructure for a short period of time to allow this feature to be hardened and improved without creating risk for Apache Spark. The aim is to rapidly bring it to the point where it can be brought into the mainline Apache Spark repository for continued development within the Apache umbrella. If all goes well, this should be a short-lived fork rather than a long-lived one.
+
+## Who are we?
+
+This is a collaborative effort by several folks from different companies who are interested in seeing this feature be successful. Companies active in this project include (alphabetically):
+
+- Google
+- Haiwen
+- Hyperpilot
+- Intel
+- Palantir
+- Pepperdata
+- Red Hat
+
+--------------------
+
+(original README below)
+
 # Apache Spark
 
 Spark is a fast and general cluster computing system for Big Data. It provides

From fb2900160e86427d0851cafd67eca6824e1a8d0a Mon Sep 17 00:00:00 2001
From: Andrew Ash
Date: Wed, 25 Jan 2017 17:06:13 -0800
Subject: [PATCH 2/3] Add links to usage guide and dev docs

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index e0d494fbe4647..fce02d5d70f68 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,12 @@ This repository, located at https://github.com/apache-spark-on-k8s/spark, contai
 
 This is a collaboratively maintained project working on [SPARK-18278](https://issues.apache.org/jira/browse/SPARK-18278). The goal is to bring native support for Spark to use Kubernetes as a cluster manager, in a fully supported way on par with the Spark Standalone, Mesos, and Apache YARN cluster managers.
 
+## Getting Started
+
+- [Usage guide](docs/running-on-kubernetes.md) shows how to run the code
+- [Development docs](resource-managers/kubernetes/README.md) shows how to get set up for development
+- Code is primarily located in the [resource-managers/kubernetes](resource-managers/kubernetes) module
+
 ## Why does this fork exist?
 
 Adding native integration for a new cluster manager is a large undertaking. If poorly executed, it could introduce bugs into Spark when run on other cluster managers, cause release blockers slowing down the overall Spark project, or require hotfixes which divert attention away from development towards managing additional releases. Any work this core in Spark needs to be done carefully to minimize the risk of those negative externalities.

From 005194a7b78b084d621814c5037a7eee5b009e4b Mon Sep 17 00:00:00 2001
From: Andrew Ash
Date: Fri, 27 Jan 2017 16:08:25 -0800
Subject: [PATCH 3/3] Minor changes

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index fce02d5d70f68..94af9821b2cef 100644
--- a/README.md
+++ b/README.md
@@ -10,13 +10,13 @@ This is a collaboratively maintained project working on [SPARK-18278](https://is
 - [Usage guide](docs/running-on-kubernetes.md) shows how to run the code
 - [Development docs](resource-managers/kubernetes/README.md) shows how to get set up for development
-- Code is primarily located in the [resource-managers/kubernetes](resource-managers/kubernetes) module
+- Code is primarily located in the [resource-managers/kubernetes](resource-managers/kubernetes) folder
 
 ## Why does this fork exist?
 
-Adding native integration for a new cluster manager is a large undertaking. If poorly executed, it could introduce bugs into Spark when run on other cluster managers, cause release blockers slowing down the overall Spark project, or require hotfixes which divert attention away from development towards managing additional releases. Any work this core in Spark needs to be done carefully to minimize the risk of those negative externalities.
+Adding native integration for a new cluster manager is a large undertaking. If poorly executed, it could introduce bugs into Spark when run on other cluster managers, cause release blockers slowing down the overall Spark project, or require hotfixes which divert attention away from development towards managing additional releases. Any work this deep inside Spark needs to be done carefully to minimize the risk of those negative externalities.
 
-At the same time, an increasing number of people from varying companies and organizations desire to work together to natively run Spark on Kubernetes. The group needs a code repository, communication forum, issue tracking, and continuous integration, all in order to work together effectively on an open source product.
+At the same time, an increasing number of people from various companies and organizations desire to work together to natively run Spark on Kubernetes. The group needs a code repository, communication forum, issue tracking, and continuous integration, all in order to work together effectively on an open source product.
 
 We've been asked by an Apache Spark Committer to work outside of the Apache infrastructure for a short period of time to allow this feature to be hardened and improved without creating risk for Apache Spark. The aim is to rapidly bring it to the point where it can be brought into the mainline Apache Spark repository for continued development within the Apache umbrella. If all goes well, this should be a short-lived fork rather than a long-lived one.