This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@mccheah mccheah commented Apr 28, 2017

Combines #240 and #249 while not including #246. This approach is different from #240 in that it uses a single init-container to fetch all dependencies at once.

This approach divides the various responsibilities of configuring the driver and executors to use init-containers for downloading dependencies. The configuration of the driver's init-container is broken up into multiple steps, and the step that actually configures the pod is reused to configure the executor pods to fetch dependencies with the same init-container.

There are numerous steps required for the init-container to behave properly. Previously, all of these steps were handled in a single class, the MountedDependencyUploader. However, it became difficult to extend that class so that it was reusable for the executors and could not only handle uploads from the user's local machine but also configure the init-container to download files from remote locations.

Therefore this PR aims to take a much more modular approach that enumerates the steps in specific classes. The steps that are taken to set up the init-container as well as use the submitted dependencies in the driver are as follows:

  1. If the user provided a resource staging server, upload local dependencies to it.
  2. If dependencies were uploaded in step 1, create a Kubernetes secret that contains the tokens required for the driver to retrieve the uploaded dependencies in the init-container.
  3. Create a ConfigMap that enumerates properties the init-container can use to fetch dependencies accordingly. The config map includes the locations of remote files to download as well as instructions to contact the resource staging server if local files were submitted in steps 1 and 2.
  4. Add the init-container to the driver pod and mount volumes corresponding to the created config map in step 3 and, if applicable, the secret created in step 2.
  5. Configure the driver's classpath taking into account that remote and submitted dependencies will be downloaded by the init-container.
  6. Configure the value of spark.jars and spark.files taking into account that submitted files will be downloaded from the resource staging server. This differs slightly from step 5 as remote locations are preserved in the resulting configuration.
  7. Configure the application to instruct executors to use the config map built in step 3 to fetch dependencies in their own init-containers.
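The seven steps above can be sketched as a chain of small step classes, each transforming an accumulating driver specification. The trait, class, and field names below are illustrative rather than the actual classes in this PR; only the `spark.kubernetes.initcontainer.remoteJars` key is taken from the real configuration discussed later in this review.

```scala
// Illustrative sketch of the modular submission steps described above.
// Each step transforms an accumulating spec; names are hypothetical.
case class DriverSpec(
    initContainerProperties: Map[String, String],
    driverClasspath: Seq[String])

trait SubmissionStep {
  def configure(spec: DriverSpec): DriverSpec
}

// Roughly step 3: record remote jars for the init-container to download.
class RemoteFilesStep(remoteJars: Seq[String]) extends SubmissionStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(initContainerProperties =
      spec.initContainerProperties +
        ("spark.kubernetes.initcontainer.remoteJars" -> remoteJars.mkString(",")))
}

// Roughly step 5: extend the driver classpath with the download directory.
class DriverClasspathStep(jarsDownloadPath: String) extends SubmissionStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(driverClasspath = spec.driverClasspath :+ (jarsDownloadPath + "/*"))
}

val steps: Seq[SubmissionStep] = Seq(
  new RemoteFilesStep(Seq("https://example.com/dep.jar")),
  new DriverClasspathStep("/var/spark-data/spark-jars"))

// The submission client folds the steps over an initially empty spec.
val finalSpec = steps.foldLeft(DriverSpec(Map.empty, Nil)) {
  (spec, step) => step.configure(spec)
}
```

Because each step only touches its own slice of the spec, individual steps can be unit-tested and mocked independently, which is the motivation described above.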

All of these steps are executed by the submission client, starting from here. Tests of the Client should mock out each of the steps accordingly. Originally, each of the step classes had their own associated provider class to assist testing, but in the end the boilerplate was reduced by using a single provider class that configures everything related to the init-containers. The single provider class could also ensure that steps that needed to be configured with consistent values could be guaranteed to be wired with the same values from the same location. For example, the localized files resolution and the init-container config map must both reference the same download path for the jars, so the components provider can accordingly use the same value when configuring both of these steps.
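For instance, a single components provider can hand the same jars download path to both the config-map construction and the classpath resolution, sketched here with hypothetical names, keys, and paths:

```scala
// Hypothetical provider that wires the same download path into every
// step that needs it, so the init-container's config map and the
// driver's classpath resolution can never disagree.
class InitContainerComponentsProvider(jarsDownloadPath: String) {
  // Used when building the init-container's ConfigMap properties.
  def provideConfigMapProperties(): Map[String, String] =
    Map("jars.download.dir" -> jarsDownloadPath)

  // Used when resolving the driver's classpath entries.
  def provideDriverClasspathEntry(): String = jarsDownloadPath + "/*"
}

val provider = new InitContainerComponentsProvider("/var/spark-data/spark-jars")
```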

Finally, the executors are also configured to use init-containers to download dependencies. The bulk of the code used here is shared with the logic that configures the driver.

returnValue
}
}
// TODO
Author

The Client architecture has changed too much, so it's probably better to re-implement the tests from the ground up.

mccheah commented Apr 28, 2017

Still missing tests, but ready for review on the concept @ash211 @foxish @erikerlandson. Apologies for the PR being so large - the old architecture had to be replaced with something more flexible given the new use cases:

  • The need to use the same init-container but dynamically insert configuration to fetch resource staging server dependencies, and
  • Re-use the init-container attachment logic for the executors.

mccheah commented Apr 28, 2017

rerun unit tests please

mccheah commented Apr 29, 2017

rerun unit tests please

mccheah commented May 1, 2017

Test failure is from #255

mccheah commented May 11, 2017

rerun integration tests please

mccheah commented May 13, 2017

@foxish @ash211 @erikerlandson @kimoonkim

I'm coming closer to finishing tests for this PR. The main ones I have left are for the main provider class and for the main Client code that wires everything together.

In light of #263 and the fact that this particular PR is difficult to review because of its size, I've updated the description of the PR walking through the changes that were made and my general reasoning behind this code. I hope this helps the review process.

/**
* Bootstraps an init-container that downloads dependencies to be used by a main container.
* Note that this primarily assumes that the init-container's configuration is being provided
* by a Config Map that was installed by some other component; that is, the implementation
Member

nit: ConfigMap

"/mnt/secrets/spark-init"
private[spark] val INIT_CONTAINER_SUBMITTED_JARS_SECRET_KEY =
"downloadSubmittedJarsSecret"
private[spark] val INIT_CONTAINER_SUBMITTED_FILES_SECRET_KEY =
Member

Why do we have separate secrets for files and jars? Do we envision people sharing jars but not files?

Author

They're only separate secret keys. The reason is simply architectural ease: we want to deploy the files bundle and the jars bundle into different directories, so we upload one bundle per resource type and download each to its own directory.
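A rough sketch of that layout, using the jars key quoted above (the files key name, token values, and directory paths are illustrative guesses, not taken from the PR):

```scala
// Sketch: a single Kubernetes secret carries both download tokens under
// separate keys, so the init-container can place the jars bundle and the
// files bundle into different directories.
val secretData = Map(
  "downloadSubmittedJarsSecret" -> "jars-bundle-token",
  "downloadSubmittedFilesSecret" -> "files-bundle-token")

// Each key maps to its own download directory inside the init-container.
val downloadDirs = Map(
  "downloadSubmittedJarsSecret" -> "/var/spark-data/spark-jars",
  "downloadSubmittedFilesSecret" -> "/var/spark-data/spark-files")
```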

downloadTimeoutMinutes: Long,
initContainerConfigMapName: String,
initContainerConfigMapKey: String,
submittedDependencyPlugin: Option[SubmittedDependencyInitContainerVolumesPlugin])
Member

SubmittedDependencyInitContainerVolumesPlugin is used solely for passing the server secret? Then we should probably call it something else? Maybe InitContainerSecretsPlugin?

val podWithInitContainer = initContainerBootstrap.bootstrapInitContainerAndVolumes(
driverContainer.getName, basePod)

val nonDriverPodKubernetesResources = Seq(initContainerConfigMap.configMap) ++
Member

rename to DriverOwnedResources?

def configureSparkConfForExecutorInitContainer(originalSparkConf: SparkConf): SparkConf
}

private[spark] class ExecutorInitContainerConfigurationImpl(
Member

Can this mechanism be shared with the driver pod's init container? Why do we need a separate mechanism for just the executor-init-container?

Author

It's different because instead of building a config map and secret bundle from scratch, the executors should re-use the ones that the submission client built to start the driver.
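A minimal sketch of that reuse, with illustrative property names rather than the actual configuration keys:

```scala
// Sketch of how executor pods are pointed at the config map that the
// submission client already created for the driver, instead of building
// a new one from scratch. Property names here are hypothetical; a plain
// Map stands in for SparkConf to keep the example self-contained.
def configureSparkConfForExecutorInitContainer(
    configMapName: String,
    configMapKey: String,
    originalConf: Map[String, String]): Map[String, String] =
  originalConf ++ Map(
    "spark.kubernetes.initcontainer.executor.configmapname" -> configMapName,
    "spark.kubernetes.initcontainer.executor.configmapkey" -> configMapKey)

val executorConf = configureSparkConfForExecutorInitContainer(
  "spark-init-config-map", "spark-init.properties",
  Map("spark.app.name" -> "demo"))
```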

Author

We do share the init-container bootstrap logic.

*/
private[spark] object PropertiesConfigMapFromScalaMapBuilder {

def buildConfigMap(
Member

We must specify somewhere in the docs that the user must also be able to create configmaps in order to use the init-containers and file submission
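If this lands in the docs, the required permission might be expressed as a Role like the following (a sketch only; the name and namespace are placeholders):

```yaml
# Minimal RBAC Role granting the ConfigMap access that init-container
# file submission would require. Names are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-submitter
  namespace: default
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create", "get"]
```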

}

/**
* Process that fetches files from a resource staging server and/or arbi trary remote locations.
Member

typo: arbi trary -> arbitrary

assert(initContainer.getArgs.asScala === List(INIT_CONTAINER_PROPERTIES_FILE_PATH))
}

test("Running without submitted dependencies adds volume mounts to main container.") {
Member

What do the init-containers do when there is no submittedDependencyPlugin?

Author

The init-container can still fetch files from hdfs, http, etc.

foxish commented May 16, 2017

@mccheah Thanks for the summary in the description. It really helped review the PR.

foxish commented May 16, 2017

rerun unit tests please


private[spark] val INIT_CONTAINER_REMOTE_FILES =
ConfigBuilder("spark.kubernetes.initcontainer.remoteFiles")
.doc("Comma-separated list of file URIs to download in the init-container. This is inferred" +

inferred -> calculated


private[spark] val INIT_CONTAINER_REMOTE_JARS =
ConfigBuilder("spark.kubernetes.initcontainer.remoteJars")
.doc("Comma-separated list of jar URIs to download in the init-container. This is inferred" +

inferred -> calculated

ConfigBuilder("spark.kubernetes.mountdependencies.mountTimeout")
.doc("Timeout before aborting the attempt to download and unpack local dependencies from" +
" the dependency staging server when initializing the driver pod.")
" remote locations and the resource etaging server when initializing the driver and" +

typo: etaging -> staging

private[spark] val ENV_DRIVER_MEMORY = "SPARK_DRIVER_MEMORY"
private[spark] val ENV_UPLOADED_JARS_DIR = "SPARK_UPLOADED_JARS_DIR"
private[spark] val ENV_SUBMIT_EXTRA_CLASSPATH = "SPARK_SUBMIT_EXTRA_CLASSPATH"
private[spark] val ENV_EXECUTOR_EXTRA_CLASSPATH = "SPARK_SUBMIT_EXTRA_CLASSPATH"

these two are the same string value, is that intentional?

Author

They're set in different contexts so it's fine if they're the same.

.provideSubmittedDependenciesSecretBuilder(
maybeSubmittedResourceIdentifiers.map(_.secrets()))
val maybeSubmittedDependenciesSecret = maybeSecretBuilder.map(_.buildInitContainerSecret())
val initContainerConfigMapBuilder = initContainerComponentsProvider

inline initContainerConfigMapBuilder if you're not using it later -- fewer identifiers is less conceptual overhead

* remote dependencies. The config map includes the remote jars and files to download,
* as well as details to fetch files from a resource staging server, if applicable.
*/
def buildInitContainerConfigMap(): SingleKeyConfigMap

just build ? I don't think this trait is implemented anywhere with multiple inheritance where the name build would conflict with another method

val initContainerConfigMapBuilder = initContainerComponentsProvider
.provideInitContainerConfigMapBuilder(maybeSubmittedResourceIdentifiers.map(_.ids()))
val initContainerConfigMap = initContainerConfigMapBuilder.buildInitContainerConfigMap()
val initContainerBootstrap = initContainerComponentsProvider.provideInitContainerBootstrap()

inline initContainerBootstrap

sparkConf: SparkConf,
kubernetesAppId: String,
sparkJars: Seq[String],
sparkFiles: Seq[String])

aren't sparkJars and sparkFiles already in sparkConf? is this duplicate info that could be out of sync in the constructor parameters?

Author


It's in the SparkConf but we don't want to calculate the Seq contents in two places, since these values are also used in Client.

Author


Client only uses it for validation that duplicate file names are not present, however. It's unclear if this validation should be done elsewhere.
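That validation can be sketched as follows; the helper name and error message are hypothetical, not the actual Client code:

```scala
// Hypothetical sketch of the duplicate-file-name validation mentioned
// above: two URIs with the same base name would collide when downloaded
// into a single directory.
def requireNoDuplicateFileNames(uris: Seq[String]): Unit = {
  val baseNames = uris.map(_.split("/").last)
  require(baseNames.distinct.size == baseNames.size,
    s"Duplicate file names found among: ${uris.mkString(", ")}")
}

// Distinct base names pass; a.jar appearing twice fails.
val ok = scala.util.Try(
  requireNoDuplicateFileNames(Seq("hdfs://ns/a.jar", "http://host/b.jar")))
val dup = scala.util.Try(
  requireNoDuplicateFileNames(Seq("hdfs://ns/a.jar", "http://host/a.jar")))
```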

@ash211 left a comment


Looks good! We might find more things down the way, but I think this is ready for merge.

We've got a full implementation of V2 submission with the staging server and driver/executor init containers in here, plus some unit test coverage (non-exhaustive) and an integration test to make sure the whole thing works.

I'm comfortable merging!

foxish commented May 16, 2017

LGTM

ash211 commented May 17, 2017

rerun unit tests please

@ash211 ash211 changed the title Download remotely-located resources on driver startup. Use init-container on executors. Download remotely-located resources on driver and executor startup via init-container May 17, 2017
@foxish foxish mentioned this pull request May 17, 2017
@ash211 left a comment


Fixing those merge conflicts looks good, let's merge when this build passes.

cc @foxish for any last comments

@ash211 ash211 merged commit f005268 into branch-2.1-kubernetes May 17, 2017
foxish pushed a commit that referenced this pull request Jul 24, 2017
…a init-container (#251)

* Download remotely-located resources on driver startup. Use init-container in executors.

* FIx owner reference slightly

* Clean up config

* Don't rely too heavily on conventions that can change

* Fix flaky test

* Tidy up file resolver

* Whitespace arrangement

* Indentation change

* Fix more indentation

* Consolidate init container component providers

* Minor method signature and comment changes

* Rename class for consistency

* Resolve conflicts

* Fix flaky test

* Add some tests and some refactoring.

* Make naming consistent for Staged -> Submitted

* Add unit test for the submission client.

* Refine expectations

* Rename variables and fix typos

* Address more comments. Remove redundant SingleKeyConfigMap.

* Minor test adjustments.

* add another test

* Fix conflicts.
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019
…-on-k8s#251)

The recent org.apache.orc:orc-mapreduce:1.4.0 addition introduced a conflict
that we must resolve in the BOM
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019