This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@mccheah commented May 23, 2017

Closes #292.

The implementation here unfortunately requires more logic in the Dockerfile's command, which further increases the surface area of the API between the Docker image and the submission client. Nevertheless this seems like the best approach among the sub-optimal alternatives. For example, we could have made the init-container copy the files into its working directory, but this introduces an opaque contract where the init-container has to have the same working directory as the driver; since the working directories are defined by the Docker images, custom drivers may not be able to satisfy this guarantee.

The approach given here at least ensures that if files are added to the given directory by any means, they are consistently copied into the driver's working directory, regardless of the state of the component that mounted those files in the first place.

@ash211 left a comment

Looks good. It seems like we should be able to have the init container write files directly into the working directory of the driver, as configured via an environment variable on the init container. The contract then is that the init container must download the jars and place them in the dir specified by the envvar.

Do you feel that's an opaque contract?

```shell
if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
if ! [ -z ${SPARK_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
if ! [ -z ${SPARK_MOUNTED_FILES_DIR} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && \
```
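The `${VAR+x}` tests above can be tried in isolation. This is a minimal standalone sketch of the same prepend pattern; the variable values and paths are illustrative placeholders, not the actual image layout:

```shell
# Illustrative values only; in the real image these are set by the
# submission client and the Dockerfile.
SPARK_CLASSPATH="/opt/spark/jars/*"
SPARK_MOUNTED_CLASSPATH="/var/spark-data/jars/app.jar"

# ${VAR+x} expands to "x" whenever VAR is set (even to the empty
# string), so `! [ -z ${VAR+x} ]` means "VAR has been set".
if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then
  SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"
fi

echo "$SPARK_CLASSPATH"   # prints /var/spark-data/jars/app.jar:/opt/spark/jars/*
```

The `+x` form matters because a plain `[ -z $VAR ]` cannot distinguish an unset variable from one deliberately set to the empty string.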

Maybe we can use `ln` instead of `cp` to reduce disk churn?

Author

We don't want to symlink the directory itself, but we could create a link to each file in the directory?


Right, could do some unix-foo to make the files appear in the other directory. Something like:

```shell
if ! [ -z ${SPARK_MOUNTED_FILES_DIR} ]; then find "$SPARK_MOUNTED_FILES_DIR/" -type f -exec ln -s {} . \; ; fi && \
```

This is something we can do after this PR, though. Would you rather revisit later if profiling reveals the copy is slow? It should only affect large files. If we push this as-is, I can file an issue to remind us to come back to it.
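The `find`/`ln -s` idea can be exercised in isolation. This sketch uses throwaway temp directories standing in for `$SPARK_MOUNTED_FILES_DIR` and the driver's working directory; the file name is made up for illustration:

```shell
# Hypothetical stand-ins for the mounted files dir and the driver's
# working directory (not the real image layout).
src=$(mktemp -d)
dst=$(mktemp -d)
echo "payload" > "$src/data.txt"

cd "$dst"
# Symlink every regular file from the mounted dir into the working dir;
# no file contents are copied, only link entries are created.
find "$src/" -type f -exec ln -s {} . \;

cat "$dst/data.txt"   # reads through the symlink; prints "payload"
```

Because `$src` is an absolute path, the links `find` creates point at absolute targets, so they resolve regardless of the working directory they are read from.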

Author

Yeah, `spark.files` is usually meant for small files.

@mccheah (Author) commented May 23, 2017

The problem is that the submission client doesn't know what the working directory of the driver is. In our current model the working directory is set in the Dockerfile, and the submission client knows few of its details.

@ash211 commented May 23, 2017

Ah, I thought /opt/spark was visible somewhere outside the Docker container, but I guess not.

Given that, I think this is the right approach.

@ash211 commented May 23, 2017

Planning to merge when the integration test passes.

@ash211 ash211 merged commit 56414f9 into branch-2.1-kubernetes May 23, 2017
@ash211 ash211 deleted the add-files-to-working-directory branch May 23, 2017 23:38
foxish pushed a commit that referenced this pull request Jul 24, 2017
* Added files should be in the working directories.

* Revert unintentional changes

* Fix test
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019