-
Notifications
You must be signed in to change notification settings - Fork 117
Mount emptyDir volumes for temporary directories on executors in static allocation mode (rebased) #522
Mount emptyDir volumes for temporary directories on executors in static allocation mode (rebased) #522
Conversation
|
I created this PR so that I didn't have to overwrite the original branch from #486. The rebase was pretty tricky so I want to keep the old history around just in case. |
|
Tests are broken because ExecutorPodFactory doesn't consider the mounted empty dirs. I'll work on fixing that. |
|
@mccheah ready to review? Any particular parts to focus on? |
|
still haven't fixed the tests yet |
|
@ash211 @kimoonkim ready for review again. I changed the architecture a bit to make testing easier. |
|
(killed the integration test run on the intermediate commit to save time) |
| // to the directory that the shuffle service is aware of. It would be better for | ||
| // these directories to be separate so that the lifetime of the non-shuffle scratch | ||
| // space is tied to an emptyDir instead of the hostPath. This requires a change in | ||
| // core Spark as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was trying to think through if there are any security concerns here, but I don't think there are any additional ones. Any data in the broadcast/cache files would with high probability also be accessible in shuffle files, so there's no new data exposed. We have other methods to secure the shuffle dirs between peer executors on the same node
|
Are there plans to merge this to spark master? |
…ic allocation mode (rebased) (apache-spark-on-k8s#522) * Use emptyDir volume mounts for executor local directories. * Mount local dirs in the driver. Remove shuffle dir configuration. * Arrange imports * Fix style and integration tests. * Add TODO note for volume types to change. * Add unit test and extra documentation. * Fix existing unit tests and add tests for empty dir volumes * Remove extraneous constant (cherry picked from commit 49932d6) Conflicts: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
…ic allocation mode (rebased) (apache-spark-on-k8s#522) * Use emptyDir volume mounts for executor local directories. * Mount local dirs in the driver. Remove shuffle dir configuration. * Arrange imports * Fix style and integration tests. * Add TODO note for volume types to change. * Add unit test and extra documentation. * Fix existing unit tests and add tests for empty dir volumes * Remove extraneous constant
Rebased version of #486.
Closes #439.
This is extremely important for performance, especially in shuffle-heavy computations where the executors perform a large amount of disk I/O. We only provision these volumes in static allocation mode without using the shuffle service because using a shuffle service requires mounting hostPath volumes, instead.