Skip to content
This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@mccheah
Copy link

@mccheah mccheah commented Oct 11, 2017

Rebased version of #486.

Closes #439.

This is extremely important for performance, especially in shuffle-heavy computations where the executors perform a large amount of disk I/O. We only provision these volumes in static allocation mode without using the shuffle service because using a shuffle service requires mounting hostPath volumes, instead.

@mccheah
Copy link
Author

mccheah commented Oct 11, 2017

I created this PR so that I didn't have to overwrite the original branch from #486. The rebase was pretty tricky so I want to keep the old history around just in case.

@mccheah
Copy link
Author

mccheah commented Oct 11, 2017

Tests are broken because ExecutorPodFactory doesn't consider the mounted empty dirs. I'll work on fixing that.

@ash211
Copy link

ash211 commented Oct 13, 2017

@mccheah ready to review? Any particular parts to focus on?

@mccheah
Copy link
Author

mccheah commented Oct 13, 2017

still haven't fixed the tests yet

@mccheah
Copy link
Author

mccheah commented Oct 16, 2017

@ash211 @kimoonkim ready for review again. I changed the architecture a bit to make testing easier.

@ash211
Copy link

ash211 commented Oct 16, 2017

(killed the integration test run on the intermediate commit to save time)

// to the directory that the shuffle service is aware of. It would be better for
// these directories to be separate so that the lifetime of the non-shuffle scratch
// space is tied to an emptyDir instead of the hostPath. This requires a change in
// core Spark as well.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was trying to think through if there are any security concerns here, but I don't think there are any additional ones. Any data in the broadcast/cache files would with high probability also be accessible in shuffle files, so there's no new data exposed. We have other methods to secure the shuffle dirs between peer executors on the same node

@ash211 ash211 merged commit 49932d6 into branch-2.2-kubernetes Oct 16, 2017
@ash211 ash211 deleted the use-empty-dir-static-allocation-shuffles-rebased branch October 16, 2017 23:07
@amadav
Copy link

amadav commented Jul 5, 2018

Are there plans to merge this to spark master?

ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019
…ic allocation mode (rebased) (apache-spark-on-k8s#522)

* Use emptyDir volume mounts for executor local directories.

* Mount local dirs in the driver. Remove shuffle dir configuration.

* Arrange imports

* Fix style and integration tests.

* Add TODO note for volume types to change.

* Add unit test and extra documentation.

* Fix existing unit tests and add tests for empty dir volumes

* Remove extraneous constant

(cherry picked from commit 49932d6)

 Conflicts:
	resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
…ic allocation mode (rebased) (apache-spark-on-k8s#522)

* Use emptyDir volume mounts for executor local directories.

* Mount local dirs in the driver. Remove shuffle dir configuration.

* Arrange imports

* Fix style and integration tests.

* Add TODO note for volume types to change.

* Add unit test and extra documentation.

* Fix existing unit tests and add tests for empty dir volumes

* Remove extraneous constant
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants