This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Spark on k8s to beta #4

@foxish

Description
This issue tracks everything we need to do to get Spark support to beta, at which point we can start encouraging people to use it.
I'm pulling the items together from the design doc; we can break them down further if needed.

Phase One: Static Allocation MVP (This phase delivers the MVP. It will have significant feature and security gaps, but on completion it should allow running a Spark job on k8s for a narrow use case.)

  • Spark-submit support for cluster mode
  • Static number of executors
  • Only Java + Scala support
  • Providing user code from both the client’s local disk and remote locations
  • Basic unit/integration testing
  • Documentation
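
The cluster-mode submission targeted above can be sketched as a spark-submit invocation against the Kubernetes API server. This is a hedged illustration, not part of the plan: the `k8s://` master scheme, the `spark.kubernetes.container.image` property, and the example jar path are assumptions modeled on how Spark-on-Kubernetes support eventually surfaced, and the API server URL and image name are placeholders.

```shell
# Hypothetical static-allocation, cluster-mode submission (flag names
# assumed from the eventual Spark-on-K8s configuration surface).
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=example.com/spark:latest \
  local:///opt/spark/examples/jars/spark-examples.jar
```

Note the static `spark.executor.instances` setting: in this phase the executor count is fixed for the lifetime of the job, which is what makes dynamic allocation a separate phase.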

== Alpha Release ==

Phase Two: Dynamic Allocation

  • Shuffle Service Finalization
  • Dynamic allocation support
  • External shuffle service prototypes, both with the sidecar approach and the daemon set approach. Assess the two implementations, and decide between them.
  • Resource staging server for hosting local files
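
For context, dynamic allocation in standard Spark is driven by a small set of properties and depends on an external shuffle service, which is exactly why the sidecar-vs-DaemonSet question above has to be settled first. A minimal sketch using Spark's standard property names (the values are illustrative only):

```properties
# Standard Spark dynamic-allocation settings; requires an external
# shuffle service so executors can be removed without losing shuffle
# data. How that service is deployed on k8s (sidecar or DaemonSet)
# is the open question tracked above.
spark.dynamicAllocation.enabled       true
spark.shuffle.service.enabled         true
spark.dynamicAllocation.minExecutors  1
spark.dynamicAllocation.maxExecutors  10
```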

Phase Three: Complete Core Spark Features

  • Use K8s secrets to secure external shuffle service communication
  • “Decent security” for data processed in Spark
  • Hooks in the scheduler back-end for the kube layer to request scale-up or scale-down, aka “custom controller support” (do we still need this?)
  • Shuffle data protected at rest from neighbor processes
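
Securing shuffle-service communication with K8s secrets would presumably mean mounting a secret holding the auth material into the driver and executor pods. The sketch below is an assumption: `spark.authenticate` is a standard Spark property, but the `spark.kubernetes.*.secrets.*` property pattern and the secret name/mount path are modeled on how secret mounting eventually surfaced, not on this design doc.

```properties
# Hypothetical: enable Spark's internal authentication and mount a
# k8s secret (here named "spark-secret") into driver and executor
# pods so they share the auth key. Property pattern assumed, not
# confirmed by this issue.
spark.authenticate                              true
spark.kubernetes.driver.secrets.spark-secret    /etc/secrets
spark.kubernetes.executor.secrets.spark-secret  /etc/secrets
```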

Beyond Beta

Phase Four: Future K8s Features

  • Job Management UI (similar to YARN’s ResourceManager scheduler view)
  • Support for remaining language bindings (Python, R)
  • Integration with k8s Third Party Resources
  • Isolation (malicious jobs can’t DoS neighbor jobs)
  • Fair sharing / queueing mechanism
  • Protection against disk exhaustion
  • Protection against deadlock where drivers consume all cluster resources, leaving none for executors
  • Spark-shell / client mode
