Download remotely-located resources on driver and executor startup via init-container #251
Conversation
…iner in executors.
    returnValue
  }
}
// TODO
The Client architecture has changed too much, so it's probably better to re-implement the tests from the ground up.
Still missing tests, but ready for review on the concept @aash @foxish @erikerlandson. Apologies for the PR being so large - the old architecture had to be replaced with something more flexible given the new use cases:

rerun unit tests please

rerun unit tests please

Test failure is from #255
…es' into init-containers-on-executors-with-remote-files
rerun integration tests please

@foxish @ash211 @erikerlandson @kimoonkim I'm coming closer to finishing tests for this PR. The main ones I have left are for the main provider class and for the main Client code that wires everything together. In light of #263 and the fact that this particular PR is difficult to review because of its size, I've updated the description of the PR walking through the changes that were made and my general reasoning behind this code. I hope this helps the review process.
/**
 * Bootstraps an init-container that downloads dependencies to be used by a main container.
 * Note that this primarily assumes that the init-container's configuration is being provided
 * by a Config Map that was installed by some other component; that is, the implementation
nit: ConfigMap
| "/mnt/secrets/spark-init" | ||
| private[spark] val INIT_CONTAINER_SUBMITTED_JARS_SECRET_KEY = | ||
| "downloadSubmittedJarsSecret" | ||
| private[spark] val INIT_CONTAINER_SUBMITTED_FILES_SECRET_KEY = |
Why do we have separate secrets for files and jars? Do we envision people sharing jars but not files?
They're only separate secret keys. The reason is simply for architectural ease. We want to deploy the files bundle and the jars bundle in different directories, so we just upload two bundles for each resource type and download them to different directories.
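For illustration, here's a minimal sketch of that layout: one Kubernetes Secret holding both bundles under the two keys, built with the fabric8 client. The secret name and payloads here are hypothetical, not the PR's exact values.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

import io.fabric8.kubernetes.api.model.SecretBuilder

object InitContainerSecretSketch {
  private val encoder = Base64.getEncoder

  // One Secret, two keys: the init-container mounts this Secret at a single
  // path and reads each bundle's download secret from its own file, so the
  // jars bundle and files bundle can be unpacked into different directories.
  val initContainerSecret = new SecretBuilder()
    .withNewMetadata()
      .withName("spark-init-secret") // hypothetical name
      .endMetadata()
    .addToData("downloadSubmittedJarsSecret",
      encoder.encodeToString("jars-bundle-secret".getBytes(StandardCharsets.UTF_8)))
    .addToData("downloadSubmittedFilesSecret",
      encoder.encodeToString("files-bundle-secret".getBytes(StandardCharsets.UTF_8)))
    .build()
}
```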
downloadTimeoutMinutes: Long,
initContainerConfigMapName: String,
initContainerConfigMapKey: String,
submittedDependencyPlugin: Option[SubmittedDependencyInitContainerVolumesPlugin])
SubmittedDependencyInitContainerVolumesPlugin is used solely for passing the server secret? Then we should probably call it something else? Maybe InitContainerSecretsPlugin?
val podWithInitContainer = initContainerBootstrap.bootstrapInitContainerAndVolumes(
  driverContainer.getName, basePod)

val nonDriverPodKubernetesResources = Seq(initContainerConfigMap.configMap) ++
rename to DriverOwnedResources?
  def configureSparkConfForExecutorInitContainer(originalSparkConf: SparkConf): SparkConf
}

private[spark] class ExecutorInitContainerConfigurationImpl(
Can this mechanism be shared with the driver pod's init container? Why do we need a separate mechanism for just the executor-init-container?
It's different because instead of building a config map and secret bundle from scratch, the executors should re-use the ones that the submission client built to start the driver.
We do share the init-container bootstrap logic.
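A rough sketch of that re-use, with hypothetical property keys (not necessarily the exact keys in this PR):

```scala
import org.apache.spark.SparkConf

// Rather than building a new ConfigMap and secret bundle for the executors,
// record the coordinates of the ones the submission client already created,
// so the executor pod builder can attach the same init-container inputs.
class ExecutorInitContainerConfigurationSketch(
    initContainerConfigMapName: String,
    initContainerConfigMapKey: String) {

  def configureSparkConfForExecutorInitContainer(originalSparkConf: SparkConf): SparkConf = {
    originalSparkConf.clone()
      .set("spark.kubernetes.initcontainer.executor.configmapname", initContainerConfigMapName)
      .set("spark.kubernetes.initcontainer.executor.configmapkey", initContainerConfigMapKey)
  }
}
```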
 */
private[spark] object PropertiesConfigMapFromScalaMapBuilder {

  def buildConfigMap(
We must note somewhere in the docs that the user must also have permission to create ConfigMaps in order to use the init-containers and file submission.
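For context, a sketch of what such a builder plausibly does, assumed from the name rather than taken from the PR's code: serialize a Scala map as a Java properties payload stored under a single ConfigMap key.

```scala
import java.io.StringWriter
import java.util.Properties

import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder}

object PropertiesConfigMapSketch {
  def buildConfigMap(name: String, key: String, config: Map[String, String]): ConfigMap = {
    // Copy the Scala map into java.util.Properties and serialize it in the
    // standard .properties format.
    val properties = new Properties()
    config.foreach { case (k, v) => properties.setProperty(k, v) }
    val writer = new StringWriter()
    properties.store(writer, s"Properties for $name") // header comment in the serialized output

    // Store the serialized properties under a single key in the ConfigMap.
    new ConfigMapBuilder()
      .withNewMetadata().withName(name).endMetadata()
      .addToData(key, writer.toString)
      .build()
  }
}
```

Creating this object against the API server is the step that requires the ConfigMap-create permission mentioned above.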
}

/**
 * Process that fetches files from a resource staging server and/or arbi trary remote locations.
typo: arbi trary -> arbitrary
  assert(initContainer.getArgs.asScala === List(INIT_CONTAINER_PROPERTIES_FILE_PATH))
}

test("Running without submitted dependencies adds volume mounts to main container.") {
What do the init-containers do when there is no submittedDependencyPlugin?
The init-container can still fetch files from hdfs, http, etc.
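As a minimal sketch of that no-submitted-dependencies path (plain http(s) only; the actual init-container presumably goes through Spark's own fetch utilities and also understands schemes like hdfs://):

```scala
import java.io.File
import java.net.URL
import java.nio.file.{Files, Paths, StandardCopyOption}

object RemoteFileDownloadSketch {
  // Download a single remote URI into the shared volume that the main
  // container will later read from. Names here are illustrative.
  def downloadRemoteFile(uri: String, downloadDir: File): File = {
    val fileName = Paths.get(new URL(uri).getPath).getFileName.toString
    val target = new File(downloadDir, fileName)
    val in = new URL(uri).openStream()
    try {
      Files.copy(in, target.toPath, StandardCopyOption.REPLACE_EXISTING)
    } finally {
      in.close()
    }
    target
  }
}
```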
@mccheah Thanks for the summary in the description. It really helped with reviewing the PR.

rerun unit tests please
…es' into init-containers-on-executors-with-remote-files
private[spark] val INIT_CONTAINER_REMOTE_FILES =
  ConfigBuilder("spark.kubernetes.initcontainer.remoteFiles")
    .doc("Comma-separated list of file URIs to download in the init-container. This is inferred" +
inferred -> calculated
private[spark] val INIT_CONTAINER_REMOTE_JARS =
  ConfigBuilder("spark.kubernetes.initcontainer.remoteJars")
    .doc("Comma-separated list of jar URIs to download in the init-container. This is inferred" +
inferred -> calculated
ConfigBuilder("spark.kubernetes.mountdependencies.mountTimeout")
  .doc("Timeout before aborting the attempt to download and unpack local dependencies from" +
  " the dependency staging server when initializing the driver pod.")
  " remote locations and the resource etaging server when initializing the driver and" +
typo: etaging -> staging
private[spark] val ENV_DRIVER_MEMORY = "SPARK_DRIVER_MEMORY"
private[spark] val ENV_UPLOADED_JARS_DIR = "SPARK_UPLOADED_JARS_DIR"
private[spark] val ENV_SUBMIT_EXTRA_CLASSPATH = "SPARK_SUBMIT_EXTRA_CLASSPATH"
private[spark] val ENV_EXECUTOR_EXTRA_CLASSPATH = "SPARK_SUBMIT_EXTRA_CLASSPATH"
these two are the same string value, is that intentional?
They're set in different contexts so it's fine if they're the same.
  .provideSubmittedDependenciesSecretBuilder(
    maybeSubmittedResourceIdentifiers.map(_.secrets()))
val maybeSubmittedDependenciesSecret = maybeSecretBuilder.map(_.buildInitContainerSecret())
val initContainerConfigMapBuilder = initContainerComponentsProvider
inline initContainerConfigMapBuilder if you're not using it later -- fewer identifiers means less conceptual overhead
 * remote dependencies. The config map includes the remote jars and files to download,
 * as well as details to fetch files from a resource staging server, if applicable.
 */
def buildInitContainerConfigMap(): SingleKeyConfigMap
just `build`? I don't think this trait is implemented anywhere with multiple inheritance where the name `build` would conflict with another method
val initContainerConfigMapBuilder = initContainerComponentsProvider
  .provideInitContainerConfigMapBuilder(maybeSubmittedResourceIdentifiers.map(_.ids()))
val initContainerConfigMap = initContainerConfigMapBuilder.buildInitContainerConfigMap()
val initContainerBootstrap = initContainerComponentsProvider.provideInitContainerBootstrap()
inline initContainerBootstrap
sparkConf: SparkConf,
kubernetesAppId: String,
sparkJars: Seq[String],
sparkFiles: Seq[String])
aren't sparkJars and sparkFiles already in sparkConf? is this duplicate info that could be out of sync in the constructor parameters?
It's in the SparkConf but we don't want to calculate the Seq contents in two places, since these values are also used in Client.
Client only uses it for validation that duplicate file names are not present, however. It's unclear if this validation should be done elsewhere.
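A hedged sketch of that validation (illustrative, not the PR's exact code): group all URIs by base file name and reject any collision, since colliding names would land on the same path in the download directory.

```scala
import java.io.File
import java.net.URI

object DuplicateFileNameValidationSketch {
  // Fail fast if two URIs in spark.jars / spark.files resolve to the same
  // base file name, because both would be written to the same target file.
  def validateNoDuplicateFileNames(allUris: Seq[String]): Unit = {
    allUris
      .groupBy(uri => new File(URI.create(uri).getPath).getName)
      .foreach { case (name, uris) =>
        require(uris.size == 1,
          s"Cannot have multiple files with the same name $name; found: ${uris.mkString(", ")}")
      }
  }
}
```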
ash211
left a comment
Looks good! We might find more things down the road, but I think this is ready for merge.
We've got a full implementation of V2 submission with the staging server and driver/executor init containers in here, plus some unit test coverage (non-exhaustive) and an integration test to make sure the whole thing works.
I'm comfortable merging!
LGTM

rerun unit tests please
…es' into init-containers-on-executors-with-remote-files
ash211
left a comment
Fixing those merge conflicts looks good, let's merge when this build passes.
cc @foxish for any last comments
…a init-container (#251)

* Download remotely-located resources on driver startup. Use init-container in executors.
* FIx owner reference slightly
* Clean up config
* Don't rely too heavily on conventions that can change
* Fix flaky test
* Tidy up file resolver
* Whitespace arrangement
* Indentation change
* Fix more indentation
* Consolidate init container component providers
* Minor method signature and comment changes
* Rename class for consistency
* Resolve conflicts
* Fix flaky test
* Add some tests and some refactoring.
* Make naming consistent for Staged -> Submitted
* Add unit test for the submission client.
* Refine expectations
* Rename variables and fix typos
* Address more comments. Remove redundant SingleKeyConfigMap.
* Minor test adjustments.
* add another test
* Fix conflicts.
…-on-k8s#251) The recent org.apache.orc:orc-mapreduce:1.4.0 addition introduced a conflict that we must resolve in the BOM
Combines #240 and #249 while not including #246. This approach is different from #240 in that it uses a single init-container to fetch all dependencies at once.
This approach takes the various responsibilities of configuring the driver and executors to use init-containers for downloading dependencies, and breaks the configuration of the init-container on the driver into multiple steps. The step that actually configures the pod is re-used for configuring the executor pods to use the same init-container to fetch dependencies as well.
There are numerous steps required for the init-container to behave properly. Previously, all of the steps were handled in the same class, the `MountedDependencyUploader`. However, it became difficult to extend that class to be reusable for the executors while also having it not only handle uploads from the user's local machine, but also configure the init-container to download files from remote locations. Therefore this PR takes a much more modular approach that enumerates the steps in specific classes. The steps taken to set up the init-container, as well as to use the submitted dependencies in the driver, are as follows:
`spark.jars` and `spark.files`, taking into account that submitted files will be downloaded from the resource staging server. This differs slightly from step 5, as remote locations are preserved in the resulting configuration.

All of these steps are executed by the submission client, starting from here. Tests of the `Client` should mock out each of the steps accordingly. Originally, each of the step classes had its own associated provider class to assist testing, but in the end the boilerplate was reduced by using a single provider class that configures everything related to the init-containers. The single provider class also guarantees that steps which must be configured with consistent values are wired with the same values from the same location. For example, the localized files resolution and the init-container config map must both reference the same download path for the jars, so the components provider can use the same value when configuring both of these steps.

Finally, the executors are also configured to use init-containers to download dependencies. The bulk of the code used here is shared with the logic that configures the driver.
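To make that wiring guarantee concrete, here is an illustrative sketch (hypothetical names, not this PR's actual classes) of how a single components provider keeps the jars download path consistent between the localized-files resolver and the init-container properties:

```scala
import java.io.File
import java.net.URI

// One provider owns shared values like the jars download path, so every step
// that needs them is wired from the same place and cannot drift out of sync.
class InitContainerComponentsProviderSketch(
    jarsDownloadPath: String,
    remoteJars: Seq[String]) {

  // Step A: resolve where the driver will find each jar after the download.
  def provideLocalizedJarsResolver(): Seq[String] =
    remoteJars.map { uri =>
      val name = new File(URI.create(uri).getPath).getName
      s"$jarsDownloadPath/$name"
    }

  // Step B: properties handed to the init-container; the download directory
  // is the exact same value used by the resolver above.
  def provideInitContainerProperties(): Map[String, String] =
    Map(
      "jars.downloadDir" -> jarsDownloadPath,
      "jars.toDownload" -> remoteJars.mkString(","))
}
```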