This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Spark on k8s to beta #4

@foxish

Description
This issue tracks everything we need to do to get Spark support to beta, at which point we can start encouraging people to use it.
I'm pulling the items together from the design doc; we can break them down further if needed.

Phase One: Static Allocation MVP (This phase delivers the MVP. It will have significant feature and security gaps, but on completion it should allow running a Spark job on k8s for a narrow use case.)

  • Spark-submit support for cluster mode
  • Static number of executors
  • Only Java + Scala support
  • Providing user code from both the client’s local disk and remote locations
  • Basic unit/integration testing
  • Documentation
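
The cluster-mode submission targeted above can be sketched as a spark-submit invocation against the Kubernetes API server. This is a hedged illustration, not part of the plan: the `k8s://` master scheme, the `spark.kubernetes.container.image` property, and the example jar path are assumptions modeled on how Spark-on-Kubernetes support eventually surfaced, and the API server URL and image name are placeholders.

```shell
# Hypothetical static-allocation, cluster-mode submission (flag names
# assumed from the eventual Spark-on-K8s configuration surface).
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=example.com/spark:latest \
  local:///opt/spark/examples/jars/spark-examples.jar
```

Note the static `spark.executor.instances` setting: in this phase the executor count is fixed for the lifetime of the job, which is what makes dynamic allocation a separate phase.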

== Alpha Release ==

Phase Two: Dynamic Allocation

  • Shuffle Service Finalization
  • Dynamic allocation support
  • External shuffle service prototypes, both with the sidecar approach and the daemon set approach. Assess the two implementations, and decide between them.
  • Resource staging server for hosting local files
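
For context, dynamic allocation in standard Spark is driven by a small set of properties and depends on an external shuffle service, which is exactly why the sidecar-vs-DaemonSet question above has to be settled first. A minimal sketch using Spark's standard property names (the values are illustrative only):

```properties
# Standard Spark dynamic-allocation settings; requires an external
# shuffle service so executors can be removed without losing shuffle
# data. How that service is deployed on k8s (sidecar or DaemonSet)
# is the open question tracked above.
spark.dynamicAllocation.enabled       true
spark.shuffle.service.enabled         true
spark.dynamicAllocation.minExecutors  1
spark.dynamicAllocation.maxExecutors  10
```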

Phase Three: Complete Core Spark Features

  • Use K8s secrets to secure external shuffle service communication
  • “Decent security” for data processed in Spark
  • Hooks in the scheduler back-end for the kube layer to request scale-up or scale-down, aka “custom controller support” (do we still need this?)
  • Shuffle data protected at rest from neighbor processes
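
Securing shuffle-service communication with K8s secrets would presumably mean mounting a secret holding the auth material into the driver and executor pods. The sketch below is an assumption: `spark.authenticate` is a standard Spark property, but the `spark.kubernetes.*.secrets.*` property pattern and the secret name/mount path are modeled on how secret mounting eventually surfaced, not on this design doc.

```properties
# Hypothetical: enable Spark's internal authentication and mount a
# k8s secret (here named "spark-secret") into driver and executor
# pods so they share the auth key. Property pattern assumed, not
# confirmed by this issue.
spark.authenticate                              true
spark.kubernetes.driver.secrets.spark-secret    /etc/secrets
spark.kubernetes.executor.secrets.spark-secret  /etc/secrets
```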

Beyond Beta

Phase Four: Future K8s Features

  • Job Management UI (similar to YARN’s ResourceManager scheduler view)
  • Support for remaining language bindings (Python, R)
  • Integration with k8s Third Party Resources
  • Isolation (malicious jobs can’t DoS neighbor jobs)
  • Fair sharing / queueing mechanism
  • Protection against disk exhaustion
  • Protection against deadlock where drivers consume all cluster resources, leaving none for executors
  • Spark-shell / client mode
