Use a secret to mount small files in driver and executors. #437
Conversation
Allows bypassing the resource staging server in a few scenarios.
Still missing unit tests on the submission client side.
smallFilesSecretMountPath: String,
mountSmallFilesBootstrap: MountSmallFilesBootstrap) extends DriverConfigurationStep {

private val MAX_SECRET_BUNDLE_SIZE_BYTES = 10000
should this be a config option also?
We had discussed enforcing a maximum size for files transmitted this way, and falling back to the resource staging server for anything larger. The concern was that without such a limit, large secrets could hurt Kubernetes performance.
Hard-coding the value here is probably better, lest anyone erroneously set it to something larger than it should be.
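For context, here is a minimal sketch of the kind of hard-coded check being discussed. The constant matches the diff excerpt above; the method name and error message are illustrative assumptions, not the PR's exact code:

```scala
import java.io.File

import org.apache.spark.SparkException

// Hypothetical enforcement of the hard-coded limit: reject the submission outright if
// the raw sizes of all submitter-local files exceed what we are willing to put in a secret.
private val MAX_SECRET_BUNDLE_SIZE_BYTES = 10000

private def validateTotalSize(localFiles: Seq[File]): Unit = {
  val totalSizeBytes = localFiles.map(_.length()).sum
  if (totalSizeBytes > MAX_SECRET_BUNDLE_SIZE_BYTES) {
    throw new SparkException(
      s"Total size of submitted local files is $totalSizeBytes bytes; at most" +
        s" $MAX_SECRET_BUNDLE_SIZE_BYTES bytes can be mounted via a secret. Use the" +
        " resource staging server for larger files.")
  }
}
```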
if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
if ! [ -z ${SPARK_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
if ! [ -z ${SPARK_MOUNTED_FILES_DIR} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && \
if ! [ -z ${SPARK_MOUNTED_FILES_FROM_SECRET_DIR} ]; then cp -R "$SPARK_MOUNTED_FILES_FROM_SECRET_DIR/." .; fi && \
Could this cp overwrite other things in the directory? Possibly we should use whatever the opposite of force is, so it fails on name collisions.
We already check in the submission client that one doesn't add multiple files with the same name.
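For reference, a rough sketch of the kind of duplicate-name check being referred to here; the method name and message are assumptions, and the PR's actual check may differ:

```scala
import java.io.File

import org.apache.spark.util.Utils

// Hypothetical duplicate-name check in the submission client: all submitted files land in
// the same directory, so two URIs with the same base name would collide.
private def requireDistinctFileNames(fileUris: Seq[String]): Unit = {
  val fileNames = fileUris.map(uri => new File(Utils.resolveURI(uri).getPath).getName)
  val duplicateNames = fileNames.diff(fileNames.distinct).distinct
  require(duplicateNames.isEmpty,
    s"Cannot add multiple files with the same name: ${duplicateNames.mkString(", ")}")
}
```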
I'm thinking of a conflict with other things: say a file from Spark core, a jar name, or the log4j / spark-metrics / fair-scheduler files if they're baked in, etc.
Actually that's still problematic if they add a file named "jars" for example, which would overwrite the jars directory. So we should add the -n option. Similarly for the line above.
Actually the -n flag isn't supported by all Linux distributions, such as the one serving as the base image for all of the images we provide. As far as I can tell there is no way to immediately error on an overwrite, based on the available options:
BusyBox v1.26.2 (2017-08-03 13:08:12 GMT) multi-call binary.
Usage: cp [OPTIONS] SOURCE... DEST
Copy SOURCE(s) to DEST
-a Same as -dpR
-R,-r Recurse
-d,-P Preserve symlinks (default if -R)
-L Follow all symlinks
-H Follow symlinks on command line
-p Preserve file attributes if possible
-f Overwrite
-i Prompt before overwrite
-l,-s Create (sym)links
-u Copy only newer files
We can use the -i flag but that forces a prompt instead of crashing immediately.
Some other options:
1. Make the working directory for the base image be somewhere other than the SPARK_HOME directory, such that it's guaranteed to just be an empty directory and thus adding contents to it won't overwrite anything.
2. Make the files in SPARK_HOME immutable.
3. Have a script do the copying such that it intelligently throws an error if it attempts to overwrite something (a rough sketch of this idea is below).
Of these three options, I like (1) the most, though (3) is acceptable. (2) is very difficult to achieve.
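To illustrate option (3), here is a rough Scala sketch of a copy that fails on collisions instead of overwriting; the actual entrypoint is a shell script, so this is only meant to show the fail-on-collision behavior, and the names are made up:

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}

import scala.collection.JavaConverters._

// Hypothetical collision-checked copy (simplified: top-level entries only). Files.copy
// without REPLACE_EXISTING throws FileAlreadyExistsException when the target exists.
private def copyWithoutOverwriting(sourceDir: Path, targetDir: Path): Unit = {
  Files.list(sourceDir).iterator().asScala.foreach { source =>
    val target = targetDir.resolve(source.getFileName)
    try {
      Files.copy(source, target)
    } catch {
      case e: FileAlreadyExistsException =>
        throw new IllegalStateException(
          s"Refusing to overwrite existing file or directory: $target", e)
    }
  }
}
```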
Do we have enough control to remove write privileges on all existing files? That ought to cause an error on an attempted overwrite.
This is pretty hard to do actually. I think regardless of what the permissions are set in the image, the file system at runtime is set up such that root can always overwrite and modify the data in the sandbox. This is from limited experimentation and trying to use things like chmod and chattr, none of which seemed to accomplish what we're looking for.
Hmm ok well this is not a new problem (existed before for SPARK_MOUNTED_FILES_DIR) so let's deal with this separately and keep using the existing practice for now
// Then, indicate to the outer block that the init-container should not handle
// those local files simply by filtering them out.
val sparkFilesWithoutLocal = KubernetesFileUtils.getNonSubmitterLocalFiles(sparkFiles)
val smallFilesSecretName = s"$kubernetesAppId-submitted-files"
might be worth wrapping that variable to s"${kubernetesAppId}-submitted-files" for clarity on what's expanded
private def areAnyFilesNonContainerLocal(files: Seq[String]): Boolean = {
  files.exists { uri =>
    Option(Utils.resolveURI(uri).getScheme).getOrElse("file") != "local"
aren't there helper methods somewhere for this?
This might belong in KubernetesFileUtils, but it has no use other than here, so I just made it a private method.
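For reference, the helper as excerpted above, completed; the closing braces are cut off in the diff excerpt, and the body here is taken from it verbatim:

```scala
import org.apache.spark.util.Utils

// Returns true if any of the given URIs is not already on the container's local disk,
// i.e. its scheme is anything other than "local".
private def areAnyFilesNonContainerLocal(files: Seq[String]): Boolean = {
  files.exists { uri =>
    Option(Utils.resolveURI(uri).getScheme).getOrElse("file") != "local"
  }
}
```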
  initContainerSecretMountPath)
}
// Only set up the bootstrap if they've provided both the config map key and the config map
// name. Note that we generally expect both to have been set from spark-submit V2, but for
is this V2 comment stale now?
override def configureDriver(driverSpec: KubernetesDriverSpec): KubernetesDriverSpec = {
  val localFiles = KubernetesFileUtils.getOnlySubmitterLocalFiles(sparkFiles).map(new File(_))
  val totalSizeBytes = localFiles.map(_.length()).sum
per the weekly sync discussion this should be size when base64 encoded
Actually we decided the opposite.
oh sorry, missed that. It's a small ratio difference anyway so doesn't matter much either way
Yeah, the reasoning here is that it's clearer to message "Your files are X bytes, max is Y bytes" as opposed to "Your base64-encoded files are X bytes, max base64-encoded is Y bytes". Users can check the sizes of raw files far more easily than base64-encoded ones.
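To make the distinction concrete, here is a rough sketch of how the raw files would end up base64-encoded in the secret's data map; the helper name is made up, but the builder calls mirror the diff excerpts elsewhere in this PR:

```scala
import java.io.File

import scala.collection.JavaConverters._

import com.google.common.io.{BaseEncoding, Files => GuavaFiles}
import io.fabric8.kubernetes.api.model.{Secret, SecretBuilder}

// Hypothetical helper: read each submitter-local file, base64-encode its raw bytes,
// and key the secret's data by the file name. The size check happens on the raw bytes;
// only the secret payload itself is base64-encoded.
private def buildSmallFilesSecret(secretName: String, localFiles: Seq[File]): Secret = {
  val localFileBase64Contents = localFiles.map { file =>
    file.getName -> BaseEncoding.base64().encode(GuavaFiles.toByteArray(file))
  }.toMap
  new SecretBuilder()
    .withNewMetadata()
      .withName(secretName)
      .endMetadata()
    .withData(localFileBase64Contents.asJava)
    .build()
}
```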
    .build()
  }
}.getOrElse(executorPod)
val (withMaybeSmallFIlesMountedPod, withMaybeSmallFilesMountedContainer) =
nit: FIles -> Files
  EXECUTOR_SUBMITTED_SMALL_FILES_SECRET_MOUNT_PATH) ===
  Some(MOUNTED_SMALL_FILES_SECRET_MOUNT_PATH))
}
want another test that the 10kb threshold is applied properly I think
ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark
WORKDIR /opt/spark/work-dir
In a sense this is a backcompat break for people baking their own images who expected /opt/spark to be the cwd.
We're trading that break for protection against a submitted file named "jars" clashing with the directory of the same name -- is that the right tradeoff?
This won't affect any of the application images we provide, I think, so maybe it's not bad in general.
I think this is fine, I'd rather have this than have the potential for overwriting data or getting surprising results in general.
.withNewMetadata()
.withName(smallFilesSecretName)
.endMetadata()
.withData(localFileBase64Contents.asJava)
Do we need an OwnerReference here so the secret is owned by the driver?
It's returned out of this method as part of otherKubernetesResources, which has all the owner references created in Client.scala.
Got it.
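For readers unfamiliar with that pattern, here is a rough sketch of how the driver pod is typically stamped as the owner of the other submission resources so they get garbage-collected with it; the method is illustrative and the PR's Client.scala may differ in details:

```scala
import scala.collection.JavaConverters._

import io.fabric8.kubernetes.api.model.{HasMetadata, OwnerReferenceBuilder, Pod}

// Hypothetical owner-reference stamping: every resource created alongside the driver
// (the small-files secret included) points back at the driver pod, so deleting the
// driver cleans them up too.
private def addDriverOwnerReference(driverPod: Pod, resources: Seq[HasMetadata]): Unit = {
  val driverOwnerReference = new OwnerReferenceBuilder()
    .withName(driverPod.getMetadata.getName)
    .withApiVersion(driverPod.getApiVersion)
    .withUid(driverPod.getMetadata.getUid)
    .withKind(driverPod.getKind)
    .withController(true)
    .build()
  resources.foreach { resource =>
    resource.getMetadata.setOwnerReferences(Seq(driverOwnerReference).asJava)
  }
}
```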
.withName(smallFilesSecretName)
.endMetadata()
.withData(localFileBase64Contents.asJava)
.build()
Should we add the SPARK_APP_ID_LABEL label to the secret?
We haven't added that as an ID label to other secrets created in the submission process (I'm looking at the SecretBuilder call in DriverKubernetesCredentialsStep).
Do you think it should be there? My first inclination is to match existing practice in this PR, and then, if we decide to add labels to secrets, do that in a separate PR.
SGTM.
}

test("Using large files should throw an exception.") {
  val largeTempFileContents = BaseEncoding.base64().encode(new Array[Byte](20000))
Make this 10241 (limit + 1) so that if we ever increase the limit in the future, this test will fail.
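A sketch of what such a boundary test could look like, reusing the hypothetical validateTotalSize helper and constant sketched earlier in this thread (assuming they were reachable from test code); it is an illustration, not the PR's actual test:

```scala
import java.nio.file.Files

import org.apache.spark.{SparkException, SparkFunSuite}

class SmallFilesSizeLimitSuite extends SparkFunSuite {

  test("Files one byte over the limit should throw an exception.") {
    // limit + 1 rather than a hard-coded size, so the test keeps guarding the boundary
    // even if the limit is raised later.
    val largeFile = Files.createTempFile("large-submitted-file", ".txt").toFile
    largeFile.deleteOnExit()
    Files.write(largeFile.toPath, new Array[Byte](MAX_SECRET_BUNDLE_SIZE_BYTES + 1))
    intercept[SparkException] {
      validateTotalSize(Seq(largeFile))
    }
  }
}
```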
Is this good to merge?

LGTM

LGTM.
…ark-on-k8s#437)

* Use a secret to mount small files in driver and executors. Allows bypassing the resource staging server in a few scenarios.
* Fix scalstyle
* Address comments and add tests.
* Lightly brush up formatting.
* Make the working directory empty so that added files don't clobber existing binaries.
* Address comments.
* Drop testing file size to N+1 of the limit

Allows bypassing the resource staging server in a few scenarios.
Closes #393.