This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Migrate temporary storage for Spark jobs to EmptyDir #439

@kimoonkim

Description

Currently, Spark jobs store temporary files in dirs inside the driver and executor pods. For instance, the work dirs for the driver and executors live inside the pods, and the internal shuffle service per executor also uses in-pod dirs.

These in-pod dirs live in the Docker storage backend, which can be slow due to its copy-on-write overhead. Many storage backends implement block-level CoW, so each small write copies an entire block, and the overhead can become very high when files are updated by many small writes. Docker's storage driver documentation recommends avoiding the writable layer for such workloads:

Ideally, very little data is written to a container’s writable layer, and you use Docker volumes to write data.

We should use EmptyDir volumes for temporary storage to avoid this overhead.
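As a sketch of the direction, a pod spec could mount an emptyDir volume at the Spark scratch directory so temporary writes bypass the container's writable layer. The pod name, image, and mount path below are illustrative assumptions, not the actual implementation; SPARK_LOCAL_DIRS is Spark's standard way to point scratch space at a directory:

```yaml
# Illustrative sketch only: mount an emptyDir volume for Spark scratch space
# so temporary files bypass the Docker storage backend's CoW writable layer.
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor-example    # hypothetical name
spec:
  containers:
  - name: executor
    image: spark-executor:latest  # hypothetical image
    env:
    - name: SPARK_LOCAL_DIRS      # Spark env var selecting scratch dirs
      value: /tmp/spark-local
    volumeMounts:
    - name: spark-local-dir
      mountPath: /tmp/spark-local
  volumes:
  - name: spark-local-dir
    emptyDir: {}                  # node-local scratch, removed with the pod
```

An emptyDir volume is backed by node storage (or tmpfs with `medium: Memory`) rather than the image's CoW layers, so many small writes hit the filesystem directly instead of triggering block copies.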
