Use a secret to mount small files in driver and executors. #437
Conversation
Allows bypassing the resource staging server in a few scenarios.
Still missing unit tests on the submission client side.
smallFilesSecretMountPath: String,
mountSmallFilesBootstrap: MountSmallFilesBootstrap) extends DriverConfigurationStep {

private val MAX_SECRET_BUNDLE_SIZE_BYTES = 10000
should this be a config option also?
We had discussed enforcing a maximum size for files transmitted this way, and falling back to the resource staging server for anything larger. The concern was that without such a limit, large secrets could hurt Kubernetes performance.
Hard-coding the value here is probably better, lest anyone erroneously set it to something larger than it should be.
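For context, here is a minimal sketch of the kind of hard-coded check being discussed. The constant matches the diff excerpt above; the method name and error message are illustrative assumptions, not the PR's exact code:

```scala
import java.io.File

import org.apache.spark.SparkException

// Hypothetical enforcement of the hard-coded limit: reject the submission outright if
// the raw sizes of all submitter-local files exceed what we are willing to put in a secret.
private val MAX_SECRET_BUNDLE_SIZE_BYTES = 10000

private def validateTotalSize(localFiles: Seq[File]): Unit = {
  val totalSizeBytes = localFiles.map(_.length()).sum
  if (totalSizeBytes > MAX_SECRET_BUNDLE_SIZE_BYTES) {
    throw new SparkException(
      s"Total size of submitted local files is $totalSizeBytes bytes; at most" +
        s" $MAX_SECRET_BUNDLE_SIZE_BYTES bytes can be mounted via a secret. Use the" +
        " resource staging server for larger files.")
  }
}
```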
if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
if ! [ -z ${SPARK_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
if ! [ -z ${SPARK_MOUNTED_FILES_DIR} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && \
if ! [ -z ${SPARK_MOUNTED_FILES_FROM_SECRET_DIR} ]; then cp -R "$SPARK_MOUNTED_FILES_FROM_SECRET_DIR/." .; fi && \
Could this cp overwrite other things in the directory? Possibly we should use whatever the opposite of force is, so it fails on name collisions.
We already check in the submission client that one doesn't add multiple files with the same name.
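For reference, a rough sketch of the kind of duplicate-name check being referred to here; the method name and message are assumptions, and the PR's actual check may differ:

```scala
import java.io.File

import org.apache.spark.util.Utils

// Hypothetical duplicate-name check in the submission client: all submitted files land in
// the same directory, so two URIs with the same base name would collide.
private def requireDistinctFileNames(fileUris: Seq[String]): Unit = {
  val fileNames = fileUris.map(uri => new File(Utils.resolveURI(uri).getPath).getName)
  val duplicateNames = fileNames.diff(fileNames.distinct).distinct
  require(duplicateNames.isEmpty,
    s"Cannot add multiple files with the same name: ${duplicateNames.mkString(", ")}")
}
```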
I'm thinking of a conflict with other things: say a file from Spark core, a jar name, or the log4j / spark-metrics / fair-scheduler files if they're baked in, etc.
Actually that's still problematic if they add a file named "jars" for example, which would overwrite the jars directory. So we should add the -n option. Similarly for the line above.
Actually the -n flag isn't supported by all Linux distributions, such as the one serving as the base image for all of the images we provide. As far as I can tell there is no way to immediately error on an overwrite, based on the available options:
BusyBox v1.26.2 (2017-08-03 13:08:12 GMT) multi-call binary.
Usage: cp [OPTIONS] SOURCE... DEST
Copy SOURCE(s) to DEST
-a Same as -dpR
-R,-r Recurse
-d,-P Preserve symlinks (default if -R)
-L Follow all symlinks
-H Follow symlinks on command line
-p Preserve file attributes if possible
-f Overwrite
-i Prompt before overwrite
-l,-s Create (sym)links
-u Copy only newer files
We can use the -i flag but that forces a prompt instead of crashing immediately.
Some other options:
1. Make the working directory for the base image be somewhere other than the SPARK_HOME directory, such that it's guaranteed to just be an empty directory and thus adding contents to it won't overwrite anything.
2. Make the files in SPARK_HOME immutable.
3. Have a script do the copying such that it intelligently throws an error if it attempts to overwrite something (a rough sketch of this idea is below).
Of these three options, I like (1) the most, though (3) is acceptable. (2) is very difficult to achieve.
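To illustrate option (3), here is a rough Scala sketch of a copy that fails on collisions instead of overwriting; the actual entrypoint is a shell script, so this is only meant to show the fail-on-collision behavior, and the names are made up:

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}

import scala.collection.JavaConverters._

// Hypothetical collision-checked copy (simplified: top-level entries only). Files.copy
// without REPLACE_EXISTING throws FileAlreadyExistsException when the target exists.
private def copyWithoutOverwriting(sourceDir: Path, targetDir: Path): Unit = {
  Files.list(sourceDir).iterator().asScala.foreach { source =>
    val target = targetDir.resolve(source.getFileName)
    try {
      Files.copy(source, target)
    } catch {
      case e: FileAlreadyExistsException =>
        throw new IllegalStateException(
          s"Refusing to overwrite existing file or directory: $target", e)
    }
  }
}
```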
Do we have enough control to remove write privileges on all existing files? That ought to cause an error on an attempted overwrite.
This is pretty hard to do actually. I think regardless of what the permissions are set in the image, the file system at runtime is set up such that root can always overwrite and modify the data in the sandbox. This is from limited experimentation and trying to use things like chmod and chattr, none of which seemed to accomplish what we're looking for.
Hmm ok well this is not a new problem (existed before for SPARK_MOUNTED_FILES_DIR) so let's deal with this separately and keep using the existing practice for now
// Then, indicate to the outer block that the init-container should not handle
// those local files simply by filtering them out.
val sparkFilesWithoutLocal = KubernetesFileUtils.getNonSubmitterLocalFiles(sparkFiles)
val smallFilesSecretName = s"$kubernetesAppId-submitted-files"
might be worth wrapping that variable to s"${kubernetesAppId}-submitted-files" for clarity on what's expanded
private def areAnyFilesNonContainerLocal(files: Seq[String]): Boolean = {
  files.exists { uri =>
    Option(Utils.resolveURI(uri).getScheme).getOrElse("file") != "local"
aren't there helper methods somewhere for this?
This might belong in KubernetesFileUtils, but it has no use other than here, so I just made it a private method.
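For reference, the helper as excerpted above, completed; the closing braces are cut off in the diff excerpt, and the body here is taken from it verbatim:

```scala
import org.apache.spark.util.Utils

// Returns true if any of the given URIs is not already on the container's local disk,
// i.e. its scheme is anything other than "local".
private def areAnyFilesNonContainerLocal(files: Seq[String]): Boolean = {
  files.exists { uri =>
    Option(Utils.resolveURI(uri).getScheme).getOrElse("file") != "local"
  }
}
```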
  initContainerSecretMountPath)
}
// Only set up the bootstrap if they've provided both the config map key and the config map
// name. Note that we generally expect both to have been set from spark-submit V2, but for
is this V2 comment stale now?
override def configureDriver(driverSpec: KubernetesDriverSpec): KubernetesDriverSpec = {
  val localFiles = KubernetesFileUtils.getOnlySubmitterLocalFiles(sparkFiles).map(new File(_))
  val totalSizeBytes = localFiles.map(_.length()).sum
per the weekly sync discussion this should be size when base64 encoded
Actually we decided the opposite.
oh sorry, missed that. It's a small ratio difference anyway so doesn't matter much either way
Yeah, the reasoning here is that it's clearer to message "Your files are X bytes, max is Y bytes" as opposed to "Your base64-encoded files are X bytes, max base64-encoded is Y bytes". Users can check the sizes of raw files far more easily than base64-encoded ones.
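To make the distinction concrete, here is a rough sketch of how the raw files would end up base64-encoded in the secret's data map; the helper name is made up, but the builder calls mirror the diff excerpts elsewhere in this PR:

```scala
import java.io.File

import scala.collection.JavaConverters._

import com.google.common.io.{BaseEncoding, Files => GuavaFiles}
import io.fabric8.kubernetes.api.model.{Secret, SecretBuilder}

// Hypothetical helper: read each submitter-local file, base64-encode its raw bytes,
// and key the secret's data by the file name. The size check happens on the raw bytes;
// only the secret payload itself is base64-encoded.
private def buildSmallFilesSecret(secretName: String, localFiles: Seq[File]): Secret = {
  val localFileBase64Contents = localFiles.map { file =>
    file.getName -> BaseEncoding.base64().encode(GuavaFiles.toByteArray(file))
  }.toMap
  new SecretBuilder()
    .withNewMetadata()
      .withName(secretName)
      .endMetadata()
    .withData(localFileBase64Contents.asJava)
    .build()
}
```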
    .build()
  }
}.getOrElse(executorPod)
val (withMaybeSmallFIlesMountedPod, withMaybeSmallFilesMountedContainer) =
nit: FIles -> Files
  EXECUTOR_SUBMITTED_SMALL_FILES_SECRET_MOUNT_PATH) ===
  Some(MOUNTED_SMALL_FILES_SECRET_MOUNT_PATH))
}
want another test that the 10kb threshold is applied properly I think
ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark
WORKDIR /opt/spark/work-dir
In a sense this is a backcompat break for people baking their own images who expected /opt/spark to be the cwd.
We're trading that break for protection against a submitted file named "jars" clashing with the directory of the same name -- is that the right tradeoff?
This won't affect any of the application images we provide, I think, so maybe it's not bad in general.
I think this is fine, I'd rather have this than have the potential for overwriting data or getting surprising results in general.
.withNewMetadata()
.withName(smallFilesSecretName)
.endMetadata()
.withData(localFileBase64Contents.asJava)
Do we need an OwnerReference here so the secret is owned by the driver?
It's returned out of this method as part of otherKubernetesResources, which has all the owner references created in Client.scala.
Got it.
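For readers unfamiliar with that pattern, here is a rough sketch of how the driver pod is typically stamped as the owner of the other submission resources so they get garbage-collected with it; the method is illustrative and the PR's Client.scala may differ in details:

```scala
import scala.collection.JavaConverters._

import io.fabric8.kubernetes.api.model.{HasMetadata, OwnerReferenceBuilder, Pod}

// Hypothetical owner-reference stamping: every resource created alongside the driver
// (the small-files secret included) points back at the driver pod, so deleting the
// driver cleans them up too.
private def addDriverOwnerReference(driverPod: Pod, resources: Seq[HasMetadata]): Unit = {
  val driverOwnerReference = new OwnerReferenceBuilder()
    .withName(driverPod.getMetadata.getName)
    .withApiVersion(driverPod.getApiVersion)
    .withUid(driverPod.getMetadata.getUid)
    .withKind(driverPod.getKind)
    .withController(true)
    .build()
  resources.foreach { resource =>
    resource.getMetadata.setOwnerReferences(Seq(driverOwnerReference).asJava)
  }
}
```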
.withName(smallFilesSecretName)
.endMetadata()
.withData(localFileBase64Contents.asJava)
.build()
Should we add the SPARK_APP_ID_LABEL label to the secret?
We haven't added that as an ID label to other secrets created in the submission process (I'm looking at the SecretBuilder call in DriverKubernetesCredentialsStep).
Do you think it should be there? My first inclination is to match existing practice in this PR, and then, if we decide to add labels to secrets, do that in a separate PR.
SGTM.
}

test("Using large files should throw an exception.") {
  val largeTempFileContents = BaseEncoding.base64().encode(new Array[Byte](20000))
Make this 10241 (limit + 1) so that if we ever increase the limit in the future, this test will fail.
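A sketch of what such a boundary test could look like, reusing the hypothetical validateTotalSize helper and constant sketched earlier in this thread (assuming they were reachable from test code); it is an illustration, not the PR's actual test:

```scala
import java.nio.file.Files

import org.apache.spark.{SparkException, SparkFunSuite}

class SmallFilesSizeLimitSuite extends SparkFunSuite {

  test("Files one byte over the limit should throw an exception.") {
    // limit + 1 rather than a hard-coded size, so the test keeps guarding the boundary
    // even if the limit is raised later.
    val largeFile = Files.createTempFile("large-submitted-file", ".txt").toFile
    largeFile.deleteOnExit()
    Files.write(largeFile.toPath, new Array[Byte](MAX_SECRET_BUNDLE_SIZE_BYTES + 1))
    intercept[SparkException] {
      validateTotalSize(Seq(largeFile))
    }
  }
}
```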
Is this good to merge?

LGTM

LGTM.
…ark-on-k8s#437)

* Use a secret to mount small files in driver and executors. Allows bypassing the resource staging server in a few scenarios.
* Fix scalstyle
* Address comments and add tests.
* Lightly brush up formatting.
* Make the working directory empty so that added files don't clobber existing binaries.
* Address comments.
* Drop testing file size to N+1 of the limit

Allows bypassing the resource staging server in a few scenarios.
Closes #393.