
@madanadit

What changes were proposed in this pull request?

This PR introduces new configs spark.kubernetes.driver.volumes and spark.kubernetes.executor.volumes, taking values of the format documented here.

The use case is to enable short-circuit writes to distributed storage on k8s. The Alluxio File System uses domain sockets to enable short-circuit writes from the client to worker memory when the two are co-located on the same host machine. A directory on the host, let's say /tmp/domain, is mounted on the Alluxio worker container as well as the Alluxio client (= Spark executor) container. The worker creates a domain socket /tmp/domain/d, and if the client container mounts the same directory, it can write directly to the Alluxio worker without passing through the network stack. The end result is faster data access when data is local.
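
For concreteness, here is a minimal sketch of what such a configuration could look like for the Alluxio use case above, expressed as SparkConf settings in Scala. The volume name alluxio-domain and the exact key layout are illustrative assumptions, not necessarily this PR's final format; the authoritative format is the one linked above.

import org.apache.spark.SparkConf

// Hypothetical example: mount the host directory /tmp/domain into each
// executor as a hostPath volume named "alluxio-domain".
val conf = new SparkConf()
  .set("spark.kubernetes.executor.volumes.hostPath.alluxio-domain.mount.path", "/tmp/domain")
  .set("spark.kubernetes.executor.volumes.hostPath.alluxio-domain.mount.readOnly", "false")
  .set("spark.kubernetes.executor.volumes.hostPath.alluxio-domain.options.path", "/tmp/domain")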

How was this patch tested?

Manual testing on a k8s v1.8 cluster. Unit tests added to DriverPodFactorySuite and ExecutorPodFactorySuite.

This PR replaces #21032.

@madanadit
Author

@liyinan926 @foxish Please take a look

@foxish
Contributor

foxish commented Apr 18, 2018

jenkins, ok to test

@foxish
Contributor

foxish commented Apr 18, 2018

@madanadit, thanks for following up with the PR and for your design doc. I had a few comments on the doc for further discussion, but in general, I think this is a good direction.

@SparkQA

SparkQA commented Apr 18, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2398/

@SparkQA

SparkQA commented Apr 18, 2018

Test build #89529 has finished for PR 21095 at commit 7c1be8a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 18, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2398/

@madanadit
Author

Thanks @foxish for your feedback. As a first-time contributor to Spark, I would like to limit the scope of the changes in this PR. Let me know when you're ready to review again.

@liyinan926
Contributor

My general comment is that the code is centered around parsing and setting up hostPath volumes even though the proposal is generic. I would suggest making the code generic as well. In particular, I would like to see an implementation that can parse all volume-related properties and add all discovered volumes in one shot.
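
To illustrate the suggestion, here is a minimal sketch (not this PR's actual code) of a one-shot parser for all volume-related properties. The VolumeSpec type and the key layout [type].[name].mount.path / [type].[name].options.* are assumptions made for the example:

import org.apache.spark.SparkConf

// Sketch only: generically parse every volume property under a prefix
// such as "spark.kubernetes.executor.volumes." in a single pass.
case class VolumeSpec(
    volumeType: String,
    volumeName: String,
    mountPath: Option[String],
    options: Map[String, String])

def parseVolumes(conf: SparkConf, prefix: String): Seq[VolumeSpec] = {
  // Keys under the prefix look like [type].[name].mount.path or [type].[name].options.<key>.
  conf.getAllWithPrefix(prefix)
    .flatMap { case (key, value) =>
      key.split('.') match {
        case Array(vType, vName, rest @ _*) =>
          Some(((vType, vName), rest.mkString(".") -> value))
        case _ => None
      }
    }
    .groupBy(_._1)
    .map { case ((vType, vName), grouped) =>
      val kv = grouped.map(_._2).toMap
      VolumeSpec(
        volumeType = vType,
        volumeName = vName,
        mountPath = kv.get("mount.path"),
        options = kv.collect {
          case (k, v) if k.startsWith("options.") => k.stripPrefix("options.") -> v
        })
    }
    .toSeq
}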

@madanadit
Author

@liyinan926 Thanks for the comment. I refactored it to address your concern. The code is simple enough to be easily refactored, if need be, by whoever implements the next volume type.

Note that the goal of this PR is limited to hostPath volumes. Even though the implementation here is general enough to be extended, I will not attempt to add other volume types in this PR.

@SparkQA

SparkQA commented Apr 20, 2018

Test build #89601 has finished for PR 21095 at commit f482dfc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 20, 2018

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2450/

@SparkQA

SparkQA commented Apr 20, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2450/

@liyinan926
Contributor

Thanks @madanadit! I will take a look over the weekend or early next week.

@madanadit
Author

Hey @liyinan926, do you think you'll have time to take a look this week?

@foxish
Contributor

foxish commented Apr 26, 2018

@madanadit, sorry about the long turnaround time here. I made a cursory pass over it, and it looks good. Before getting deeper into the code, I think we need more exhaustive unit testing for non-hostPath volume types, and a couple of e2e tests. Once those are done, I'm also happy to do a more detailed review here.

@liyinan926
Contributor

@madanadit sorry I didn't get a chance to look into this. I will take a detailed look once @foxish's comments on testing are addressed.

@madanadit
Author

Hi @foxish, I don't see why the two testing concerns should block reviewing this PR.

  1. This PR does not attempt to address non-hostPath volumes, so neither the implementation nor unit tests for those are included.
  2. I agree that e2e tests make sense. However, they should not block this PR in any way; e2e tests can also be added once the PR is reviewed.

@foxish
Contributor

foxish commented May 1, 2018

The testing is not a blocker for the review. When I said "tests for non-hostpath type volumes", I meant that we want to cover more than just hostPath mounts with the initial PR, because otherwise we might end up with something too specific. Sorry if that wasn't clear from my comment. I think doing emptyDir mounts alongside hostPath (with as much code sharing as possible) would be a good idea, since emptyDir is the most common volume type. With those two, I think we can push this forward and have the e2e tests come along side by side.
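
As a sketch of the kind of code sharing this could enable, assuming a small sealed hierarchy over volume types (the type and field names are illustrative, not this PR's code):

import io.fabric8.kubernetes.api.model.{HostPathVolumeSource, Quantity, Volume, VolumeBuilder}

// Sketch only: hostPath and emptyDir differ solely in the volume source,
// so everything else (naming, mounting) can be shared.
sealed trait SparkVolumeConf { def volumeName: String }
case class HostPathVolumeConf(volumeName: String, hostPath: String) extends SparkVolumeConf
case class EmptyDirVolumeConf(
    volumeName: String,
    medium: Option[String],
    sizeLimit: Option[String]) extends SparkVolumeConf

def toVolume(conf: SparkVolumeConf): Volume = conf match {
  case HostPathVolumeConf(name, path) =>
    new VolumeBuilder()
      .withName(name)
      .withHostPath(new HostPathVolumeSource(path))
      .build()
  case EmptyDirVolumeConf(name, medium, sizeLimit) =>
    new VolumeBuilder()
      .withName(name)
      .withNewEmptyDir()
        .withMedium(medium.orNull)
        .withSizeLimit(sizeLimit.map(new Quantity(_)).orNull)
      .endEmptyDir()
      .build()
}

The container-side volume mount is identical for both types, so only the construction of the Volume itself needs to branch.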

* @param prefix the prefix for volume configuration
* @return a tuple of (pod with the volume(s) added, container with mount(s) added)
*/
def addVolumes(

The Scaladoc should not mention hostPath as this function is not hostPath exclusively.

}
// Populate spec
volumes.foreach {
case (name, spec) =>

The case line should be merged into the previous line according to the Spark code convention, e.g., volumes.foreach { case (name, spec) =>.

volumes.foreach {
case (name, spec) =>
properties.foreach {
k =>

Ditto.

val podBuilder = new PodBuilder(pod).editOrNewSpec()
val containerBuilder = new ContainerBuilder(container)
volumes foreach {
case (name, spec) =>

Ditto.

if (spec.optionsSpec.contains(KUBERNETES_VOLUMES_PATH_KEY)) {
hostPath = Some(spec.optionsSpec(KUBERNETES_VOLUMES_PATH_KEY))
}
if (hostPath.isDefined && spec.mountPath.isDefined) {

Looks like this if block can be combined with the previous if block, e.g., if (spec.mountPath.isDefined && spec.optionsSpec.contains(KUBERNETES_VOLUMES_PATH_KEY)) {...}.

.withHostPath(new HostPathVolumeSource(hostPath.get))
.withName(name)
.build())
val volumeBuilder = new VolumeMountBuilder()

s/volumeBuilder/mountBuilder.

* @param prefix the given property name prefix
* @return a Map storing with volume name as key and spec as value
*/
def parseHostPathVolumesWithPrefix(

Looks like you don't really need this function as it's just a wrapper of parseVolumesWithPrefix .

.build()
SparkPod(driverPod, driverContainer)

val (driverPodWithVolumes, driverContainerVolumes) =

Instead of putting the logic of volume mounting in BasicDriverFeatureStep and BasicExecutorFeatureStep, we should add a new step for mounting volumes, similarly to how we handle secrets, e.g., MountVolumesFeatureStep where the logic of addVolumes should be. This feature step can be used for both the driver and executors.
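
For reference, a rough sketch of what such a step could look like, modeled on how secret mounting works. SparkPod here mirrors Spark's internal (pod, container) pair; the real feature-step trait and config plumbing are simplified away, and only hostPath is shown:

import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, HostPathVolumeSource, Pod, PodBuilder, VolumeBuilder, VolumeMountBuilder}

// Sketch only: a standalone step that applies every configured volume
// to both the pod spec and the container.
case class SparkPod(pod: Pod, container: Container)
case class VolumeMountSpec(volumeName: String, hostPath: String, mountPath: String)

class MountVolumesFeatureStep(volumes: Seq[VolumeMountSpec]) {
  def configurePod(spark: SparkPod): SparkPod = {
    volumes.foldLeft(spark) { case (SparkPod(pod, container), spec) =>
      // Register the volume on the pod spec.
      val podWithVolume = new PodBuilder(pod)
        .editOrNewSpec()
          .addToVolumes(new VolumeBuilder()
            .withName(spec.volumeName)
            .withHostPath(new HostPathVolumeSource(spec.hostPath))
            .build())
          .endSpec()
        .build()
      // Mount the volume into the container (driver or executor alike).
      val containerWithMount = new ContainerBuilder(container)
        .addToVolumeMounts(new VolumeMountBuilder()
          .withName(spec.volumeName)
          .withMountPath(spec.mountPath)
          .build())
        .build()
      SparkPod(podWithVolume, containerWithMount)
    }
  }
}

Because the step only consumes a SparkPod and returns a new one, the same instance can be plugged into both the driver and executor construction paths.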

@andrusha
Contributor

andrusha commented May 7, 2018

@madanadit @liyinan926 @foxish I addressed the comments here and added PVC support in #21260. However, I'm unsure if this is the right way to go; please check the PR.

@liyinan926
Contributor

@andrusha Is the purpose of #21260 to replace both this PR and #21238?

@andrusha
Contributor

andrusha commented May 8, 2018

@liyinan926 that's the idea. We should have a single step to mount all kinds of volumes including hostPath and emptyDir. The only problem is the configuration awkwardness.

@madanadit
Author

madanadit commented May 8, 2018

Thanks @liyinan926 for the review and @andrusha for addressing the comments and moving the work along. I'll close this PR once #21260 merges.

@AmplabJenkins

Can one of the admins verify this patch?

asfgit pushed a commit that referenced this pull request Jul 11, 2018
This PR continues #21095 and intersects with #21238. I've added volume mounts as a separate step and added PersistentVolumeClaim support.

There is a fundamental problem with how we pass the options through Spark conf to fabric8. For each volume type and all possible volume options, we would have to implement custom code to map config values to fabric8 calls. This will result in a big body of code we would have to support, and it means that Spark will always be somewhat out of sync with k8s.

I think there needs to be a discussion on how to proceed correctly (e.g., use PodPreset instead).

----

Due to the complications of provisioning and managing actual resources this PR addresses only volume mounting of already present resources.

----
- [x] emptyDir support
- [x] Testing
- [x] Documentation
- [x] KubernetesVolumeUtils tests

Author: Andrew Korzhuev <[email protected]>
Author: madanadit <[email protected]>

Closes #21260 from andrusha/k8s-vol.
@liyinan926
Contributor

@madanadit can you close this as #21260 has been merged?
