Spark on k8s to beta #4
Closed
This issue tracks all the items we need to do to get Spark-on-Kubernetes support to beta, at which point we can start encouraging people to use it.
I'm putting together the items from the design doc. We can break these down further if there is a need.
Phase One: Static Allocation MVP (This phase delivers the MVP. It will have significant feature and security gaps, but once complete it should allow running a Spark job on k8s for a narrow use case.)
- Spark-submit support for cluster mode (see the submission sketch after this list)
- Static number of executors
- Only Java + Scala support
- Providing user code from both the client’s local disk and remote locations
- Basic unit/integration testing
- Documentation
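As a rough illustration of the Phase One target, a cluster-mode submission against a Kubernetes master could look something like the sketch below. The `k8s://` master URL, `--deploy-mode cluster`, and `spark.executor.instances` are standard pieces; the image-related property names, namespace, and jar path are placeholders and assumptions, not the fork's final configuration surface.

```
# Hypothetical cluster-mode submission with a static number of executors.
# Image property names and paths below are assumptions for illustration only.
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<kubernetes-api-server>:<port> \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.driver.docker.image=<driver-image> \
  --conf spark.kubernetes.executor.docker.image=<executor-image> \
  local:///opt/spark/examples/jars/spark-examples.jar
```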
== Alpha Release ==
Phase Two: Dynamic Allocation
- Shuffle Service Finalization
- Dynamic allocation support (see the configuration sketch after this list)
- External shuffle service prototypes, using both the sidecar approach and the DaemonSet approach; assess the two implementations and decide between them
- Resource staging server for hosting local files
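For the dynamic allocation item, the end state is essentially Spark's standard dynamic allocation properties wired up to whichever external shuffle service deployment is chosen. A minimal sketch of the extra configuration is below; `spark.dynamicAllocation.*` and `spark.shuffle.service.enabled` are standard Spark properties, while the key that points executors at the in-cluster shuffle service pods is an assumption and will depend on the sidecar vs. DaemonSet decision.

```
# spark-defaults.conf additions for dynamic allocation (sketch).
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   20
spark.shuffle.service.enabled          true
# Assumed key for selecting the in-cluster shuffle service pods; the real
# property depends on which prototype (sidecar vs. DaemonSet) is chosen.
spark.kubernetes.shuffle.labels        app=spark-shuffle-service
```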
Phase Three: Complete Core Spark Features
- Use K8s secrets to secure external shuffle service communication (see the sketch after this list)
- “Decent security” for data processed in Spark
- Hooks in the scheduler backend for the kube layer to request scale-up or scale-down, aka “custom controller support” (do we still need this?)
- Shuffle data protected at rest from neighbor processes
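For the secrets item, one plausible shape (a sketch, not a decided design) is to hold the shared authentication secret in a Kubernetes Secret and have the driver, executors, and shuffle service use it for Spark's existing SASL authentication (`spark.authenticate` / `spark.authenticate.secret`). The secret and key names below are made up; how the secret is actually surfaced to the pods is the open design question.

```
# Create a Kubernetes Secret holding a shared Spark auth secret (sketch).
# The secret and key names are hypothetical.
kubectl create secret generic spark-shuffle-auth \
  --from-literal=spark-auth-secret="$(openssl rand -hex 32)"

# The pods would then enable Spark's standard SASL auth, e.g.
#   spark.authenticate=true
# with spark.authenticate.secret sourced from the mounted secret.
```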
Beyond Beta
Phase Four: Future K8s Features
- Job Management UI (similar to YARN’s ResourceManager scheduler view)
- Support for remaining language bindings (Python, R)
- Integration with k8s Third Party Resources
- Isolation (malicious jobs can’t DoS neighbor jobs)
- Fair sharing / queueing mechanism
- Protection against disk exhaustion
- Protection against deadlock when drivers consume all cluster resources and no executors can be scheduled
- Spark-shell / client mode