Download remotely-located resources on driver and executor startup via init-container #251
Conversation
…iner in executors.
    returnValue
  }
}
// TODO
The Client architecture has changed too much, so it's probably better to re-implement the tests from the ground up.
Still missing tests, but ready for review on the concept @aash @foxish @erikerlandson. Apologies for the PR being so large - the old architecture had to be replaced with something more flexible given the new use cases:

rerun unit tests please

rerun unit tests please

Test failure is from #255
…es' into init-containers-on-executors-with-remote-files
rerun integration tests please

@foxish @ash211 @erikerlandson @kimoonkim I'm coming closer to finishing tests for this PR. The main ones I have left are for the main provider class and for the main Client code that wires everything together. In light of #263 and the fact that this particular PR is difficult to review because of its size, I've updated the description of the PR walking through the changes that were made and my general reasoning behind this code. I hope this helps the review process.
/**
 * Bootstraps an init-container that downloads dependencies to be used by a main container.
 * Note that this primarily assumes that the init-container's configuration is being provided
 * by a Config Map that was installed by some other component; that is, the implementation
nit: ConfigMap
| "/mnt/secrets/spark-init" | ||
| private[spark] val INIT_CONTAINER_SUBMITTED_JARS_SECRET_KEY = | ||
| "downloadSubmittedJarsSecret" | ||
| private[spark] val INIT_CONTAINER_SUBMITTED_FILES_SECRET_KEY = |
Why do we have separate secrets for files and jars? Do we envision people sharing jars but not files?
They're only separate secret keys. The reason is simply for architectural ease. We want to deploy the files bundle and the jars bundle in different directories, so we just upload two bundles for each resource type and download them to different directories.
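For illustration, here's a minimal sketch of that layout: one Kubernetes Secret holding both bundles under the two keys, built with the fabric8 client. The secret name and payloads here are hypothetical, not the PR's exact values.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

import io.fabric8.kubernetes.api.model.SecretBuilder

object InitContainerSecretSketch {
  private val encoder = Base64.getEncoder

  // One Secret, two keys: the init-container mounts this Secret at a single
  // path and reads each bundle's download secret from its own file, so the
  // jars bundle and files bundle can be unpacked into different directories.
  val initContainerSecret = new SecretBuilder()
    .withNewMetadata()
      .withName("spark-init-secret") // hypothetical name
      .endMetadata()
    .addToData("downloadSubmittedJarsSecret",
      encoder.encodeToString("jars-bundle-secret".getBytes(StandardCharsets.UTF_8)))
    .addToData("downloadSubmittedFilesSecret",
      encoder.encodeToString("files-bundle-secret".getBytes(StandardCharsets.UTF_8)))
    .build()
}
```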
downloadTimeoutMinutes: Long,
initContainerConfigMapName: String,
initContainerConfigMapKey: String,
submittedDependencyPlugin: Option[SubmittedDependencyInitContainerVolumesPlugin])
SubmittedDependencyInitContainerVolumesPlugin is used solely for passing the server secret? Then we should probably call it something else? Maybe InitContainerSecretsPlugin?
val podWithInitContainer = initContainerBootstrap.bootstrapInitContainerAndVolumes(
  driverContainer.getName, basePod)

val nonDriverPodKubernetesResources = Seq(initContainerConfigMap.configMap) ++
rename to DriverOwnedResources?
  def configureSparkConfForExecutorInitContainer(originalSparkConf: SparkConf): SparkConf
}

private[spark] class ExecutorInitContainerConfigurationImpl(
Can this mechanism be shared with the driver pod's init container? Why do we need a separate mechanism for just the executor-init-container?
It's different because instead of building a config map and secret bundle from scratch, the executors should re-use the ones that the submission client built to start the driver.
We do share the init-container bootstrap logic.
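A rough sketch of that re-use, with hypothetical property keys (not necessarily the exact keys in this PR):

```scala
import org.apache.spark.SparkConf

// Rather than building a new ConfigMap and secret bundle for the executors,
// record the coordinates of the ones the submission client already created,
// so the executor pod builder can attach the same init-container inputs.
class ExecutorInitContainerConfigurationSketch(
    initContainerConfigMapName: String,
    initContainerConfigMapKey: String) {

  def configureSparkConfForExecutorInitContainer(originalSparkConf: SparkConf): SparkConf = {
    originalSparkConf.clone()
      .set("spark.kubernetes.initcontainer.executor.configmapname", initContainerConfigMapName)
      .set("spark.kubernetes.initcontainer.executor.configmapkey", initContainerConfigMapKey)
  }
}
```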
 */
private[spark] object PropertiesConfigMapFromScalaMapBuilder {

  def buildConfigMap(
We must note somewhere in the docs that the user must also have permission to create ConfigMaps in order to use the init-containers and file submission.
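For context, a sketch of what such a builder plausibly does, assumed from the name rather than taken from the PR's code: serialize a Scala map as a Java properties payload stored under a single ConfigMap key.

```scala
import java.io.StringWriter
import java.util.Properties

import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder}

object PropertiesConfigMapSketch {
  def buildConfigMap(name: String, key: String, config: Map[String, String]): ConfigMap = {
    // Copy the Scala map into java.util.Properties and serialize it in the
    // standard .properties format.
    val properties = new Properties()
    config.foreach { case (k, v) => properties.setProperty(k, v) }
    val writer = new StringWriter()
    properties.store(writer, s"Properties for $name") // header comment in the serialized output

    // Store the serialized properties under a single key in the ConfigMap.
    new ConfigMapBuilder()
      .withNewMetadata().withName(name).endMetadata()
      .addToData(key, writer.toString)
      .build()
  }
}
```

Creating this object against the API server is the step that requires the ConfigMap-create permission mentioned above.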
}

/**
 * Process that fetches files from a resource staging server and/or arbi trary remote locations.
typo: arbi trary -> arbitrary
  assert(initContainer.getArgs.asScala === List(INIT_CONTAINER_PROPERTIES_FILE_PATH))
}

test("Running without submitted dependencies adds volume mounts to main container.") {
What do the init-containers do when there is no submittedDependencyPlugin?
The init-container can still fetch files from hdfs, http, etc.
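As a minimal sketch of that no-submitted-dependencies path (plain http(s) only; the actual init-container presumably goes through Spark's own fetch utilities and also understands schemes like hdfs://):

```scala
import java.io.File
import java.net.URL
import java.nio.file.{Files, Paths, StandardCopyOption}

object RemoteFileDownloadSketch {
  // Download a single remote URI into the shared volume that the main
  // container will later read from. Names here are illustrative.
  def downloadRemoteFile(uri: String, downloadDir: File): File = {
    val fileName = Paths.get(new URL(uri).getPath).getFileName.toString
    val target = new File(downloadDir, fileName)
    val in = new URL(uri).openStream()
    try {
      Files.copy(in, target.toPath, StandardCopyOption.REPLACE_EXISTING)
    } finally {
      in.close()
    }
    target
  }
}
```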
@mccheah Thanks for the summary in the description. It really helped with reviewing the PR.

rerun unit tests please
…es' into init-containers-on-executors-with-remote-files
private[spark] val INIT_CONTAINER_REMOTE_FILES =
  ConfigBuilder("spark.kubernetes.initcontainer.remoteFiles")
    .doc("Comma-separated list of file URIs to download in the init-container. This is inferred" +
inferred -> calculated
private[spark] val INIT_CONTAINER_REMOTE_JARS =
  ConfigBuilder("spark.kubernetes.initcontainer.remoteJars")
    .doc("Comma-separated list of jar URIs to download in the init-container. This is inferred" +
inferred -> calculated
ConfigBuilder("spark.kubernetes.mountdependencies.mountTimeout")
  .doc("Timeout before aborting the attempt to download and unpack local dependencies from" +
  " the dependency staging server when initializing the driver pod.")
  " remote locations and the resource etaging server when initializing the driver and" +
typo: etaging -> staging
private[spark] val ENV_DRIVER_MEMORY = "SPARK_DRIVER_MEMORY"
private[spark] val ENV_UPLOADED_JARS_DIR = "SPARK_UPLOADED_JARS_DIR"
private[spark] val ENV_SUBMIT_EXTRA_CLASSPATH = "SPARK_SUBMIT_EXTRA_CLASSPATH"
private[spark] val ENV_EXECUTOR_EXTRA_CLASSPATH = "SPARK_SUBMIT_EXTRA_CLASSPATH"
these two are the same string value, is that intentional?
They're set in different contexts so it's fine if they're the same.
  .provideSubmittedDependenciesSecretBuilder(
    maybeSubmittedResourceIdentifiers.map(_.secrets()))
val maybeSubmittedDependenciesSecret = maybeSecretBuilder.map(_.buildInitContainerSecret())
val initContainerConfigMapBuilder = initContainerComponentsProvider
inline initContainerConfigMapBuilder if you're not using it later -- fewer identifiers means less conceptual overhead
 * remote dependencies. The config map includes the remote jars and files to download,
 * as well as details to fetch files from a resource staging server, if applicable.
 */
def buildInitContainerConfigMap(): SingleKeyConfigMap
just `build`? I don't think this trait is implemented anywhere with multiple inheritance where the name `build` would conflict with another method
val initContainerConfigMapBuilder = initContainerComponentsProvider
  .provideInitContainerConfigMapBuilder(maybeSubmittedResourceIdentifiers.map(_.ids()))
val initContainerConfigMap = initContainerConfigMapBuilder.buildInitContainerConfigMap()
val initContainerBootstrap = initContainerComponentsProvider.provideInitContainerBootstrap()
inline initContainerBootstrap
sparkConf: SparkConf,
kubernetesAppId: String,
sparkJars: Seq[String],
sparkFiles: Seq[String])
aren't sparkJars and sparkFiles already in sparkConf? is this duplicate info that could be out of sync in the constructor parameters?
It's in the SparkConf but we don't want to calculate the Seq contents in two places, since these values are also used in Client.
Client only uses it for validation that duplicate file names are not present, however. It's unclear if this validation should be done elsewhere.
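A hedged sketch of that validation (illustrative, not the PR's exact code): group all URIs by base file name and reject any collision, since colliding names would land on the same path in the download directory.

```scala
import java.io.File
import java.net.URI

object DuplicateFileNameValidationSketch {
  // Fail fast if two URIs in spark.jars / spark.files resolve to the same
  // base file name, because both would be written to the same target file.
  def validateNoDuplicateFileNames(allUris: Seq[String]): Unit = {
    allUris
      .groupBy(uri => new File(URI.create(uri).getPath).getName)
      .foreach { case (name, uris) =>
        require(uris.size == 1,
          s"Cannot have multiple files with the same name $name; found: ${uris.mkString(", ")}")
      }
  }
}
```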
ash211
left a comment
Looks good! We might find more things down the road, but I think this is ready for merge.
We've got a full implementation of V2 submission with the staging server and driver/executor init containers in here, plus some unit test coverage (non-exhaustive) and an integration test to make sure the whole thing works.
I'm comfortable merging!
LGTM

rerun unit tests please
…es' into init-containers-on-executors-with-remote-files
ash211
left a comment
Fixing those merge conflicts looks good, let's merge when this build passes.
cc @foxish for any last comments
…a init-container (#251)

* Download remotely-located resources on driver startup. Use init-container in executors.
* FIx owner reference slightly
* Clean up config
* Don't rely too heavily on conventions that can change
* Fix flaky test
* Tidy up file resolver
* Whitespace arrangement
* Indentation change
* Fix more indentation
* Consolidate init container component providers
* Minor method signature and comment changes
* Rename class for consistency
* Resolve conflicts
* Fix flaky test
* Add some tests and some refactoring.
* Make naming consistent for Staged -> Submitted
* Add unit test for the submission client.
* Refine expectations
* Rename variables and fix typos
* Address more comments. Remove redundant SingleKeyConfigMap.
* Minor test adjustments.
* add another test
* Fix conflicts.
…-on-k8s#251) The recent org.apache.orc:orc-mapreduce:1.4.0 addition introduced a conflict that we must resolve in the BOM
Combines #240 and #249 while not including #246. This approach is different from #240 in that it uses a single init-container to fetch all dependencies at once.
This approach takes the various responsibilities of configuring the driver and executors to use init-containers for downloading dependencies, and breaks the configuration of the init-container on the driver into multiple steps. The step that actually configures the pod is re-used for configuring the executor pods to use the same init-container to fetch dependencies as well.
There are numerous steps required for the init-container to behave properly. Previously, all of the steps were handled in the same class, the `MountedDependencyUploader`. However, it became difficult to extend that class to be reusable for the executors while also having it not only handle uploads from the user's local machine, but also configure the init-container to download files from remote locations. Therefore this PR takes a much more modular approach that enumerates the steps in specific classes. The steps taken to set up the init-container, as well as to use the submitted dependencies in the driver, are as follows:
`spark.jars` and `spark.files`, taking into account that submitted files will be downloaded from the resource staging server. This differs slightly from step 5, as remote locations are preserved in the resulting configuration.

All of these steps are executed by the submission client, starting from here. Tests of the `Client` should mock out each of the steps accordingly. Originally, each of the step classes had its own associated provider class to assist testing, but in the end the boilerplate was reduced by using a single provider class that configures everything related to the init-containers. The single provider class also guarantees that steps which must be configured with consistent values are wired with the same values from the same location. For example, the localized files resolution and the init-container config map must both reference the same download path for the jars, so the components provider can use the same value when configuring both of these steps.

Finally, the executors are also configured to use init-containers to download dependencies. The bulk of the code used here is shared with the logic that configures the driver.
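To make that wiring guarantee concrete, here is an illustrative sketch (hypothetical names, not this PR's actual classes) of how a single components provider keeps the jars download path consistent between the localized-files resolver and the init-container properties:

```scala
import java.io.File
import java.net.URI

// One provider owns shared values like the jars download path, so every step
// that needs them is wired from the same place and cannot drift out of sync.
class InitContainerComponentsProviderSketch(
    jarsDownloadPath: String,
    remoteJars: Seq[String]) {

  // Step A: resolve where the driver will find each jar after the download.
  def provideLocalizedJarsResolver(): Seq[String] =
    remoteJars.map { uri =>
      val name = new File(URI.create(uri).getPath).getName
      s"$jarsDownloadPath/$name"
    }

  // Step B: properties handed to the init-container; the download directory
  // is the exact same value used by the resolver above.
  def provideInitContainerProperties(): Map[String, String] =
    Map(
      "jars.downloadDir" -> jarsDownloadPath,
      "jars.toDownload" -> remoteJars.mkString(","))
}
```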