Added files should be in the working directories. #294
Conversation
ash211 left a comment
Looks good. It seems like we should be able to have the init container write files directly into the working directory of the driver, as configured via an environment variable on the init container. The contract then is that the init container must download the jars and place them in the dir specified by the envvar.
Do you feel that's an opaque contract?
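The contract being proposed can be sketched roughly as follows. This is an illustrative sketch only: the directory path and the file name are made up, and `SPARK_MOUNTED_FILES_DIR` is used here simply as the agreed-upon environment variable between the two containers.

```shell
# Both containers agree on a directory via an environment variable.
# (Path and file name below are illustrative, not the real Spark layout.)
FILES_DIR="${SPARK_MOUNTED_FILES_DIR:-/tmp/spark-files}"

# Init-container side: download dependencies into the agreed directory.
mkdir -p "$FILES_DIR"
echo "example jar contents" > "$FILES_DIR/app.jar"

# Driver side: pick the files up from the same directory into its own
# working directory, without needing to know where they came from.
cp -R "$FILES_DIR/." .
ls app.jar
```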
    if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
    if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
    if ! [ -z ${SPARK_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
    if ! [ -z ${SPARK_MOUNTED_FILES_DIR} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && \
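For context on the lines above: the `${VAR+x}` idiom expands to `x` only when `VAR` is set (even if set to the empty string), so `! [ -z ${VAR+x} ]` means "VAR is set". A small standalone demonstration, with made-up classpath values:

```shell
# Made-up values for illustration; the globs are kept literal, not expanded.
SPARK_CLASSPATH="/opt/spark/jars/*"
SPARK_MOUNTED_CLASSPATH="/var/spark-data/jars/*"

# SPARK_MOUNTED_CLASSPATH is set, so it gets prepended.
if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then
  SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"
fi

# SPARK_EXTRA_CLASSPATH is unset here, so this branch is skipped.
if ! [ -z ${SPARK_EXTRA_CLASSPATH+x} ]; then
  SPARK_CLASSPATH="$SPARK_EXTRA_CLASSPATH:$SPARK_CLASSPATH"
fi

echo "$SPARK_CLASSPATH"
```

Note that the last Dockerfile line uses `${SPARK_MOUNTED_FILES_DIR}` without `+x`, so it treats "unset" and "set but empty" the same, unlike the classpath checks above it.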
maybe can use ln instead of cp to reduce disk churn?
We don't want to symlink the directory itself, but we could create a link to each file in the directory?
Right, could do some unix-foo to make the files appear in the other directory. Something like:
    if ! [ -z ${SPARK_MOUNTED_FILES_DIR} ]; then find "$SPARK_MOUNTED_FILES_DIR/" -type f -exec ln -s {} . \; ; fi && \
This is something we can do after this PR though -- would you rather revisit later if profiling reveals the copy is slow? It should only affect large files. If we push, I can file an issue to remind us to come back to this
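For the record, the two approaches discussed here behave like this (directory and file names below are made up for illustration):

```shell
# Set up a stand-in for the mounted files volume.
mkdir -p mounted_files workdir_cp workdir_ln
echo "data" > mounted_files/file1.txt

# Copy approach (current PR): duplicates the bytes on disk.
( cd workdir_cp && cp -R "$OLDPWD/mounted_files/." . )

# Symlink approach (suggested above): one link per file, so large files
# are not rewritten to disk; $OLDPWD makes the link targets absolute.
( cd workdir_ln && find "$OLDPWD/mounted_files/" -type f -exec ln -s {} . \; )

cat workdir_cp/file1.txt   # a regular copied file
cat workdir_ln/file1.txt   # reads through the symlink
```

One caveat with the symlink variant: the links dangle if the mounted volume is removed while the driver is still running, which the copy avoids.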
Yeah - spark.files is meant to be for small things, usually.
> The problem is the submission client doesn't know what the working directory of the driver is. In our current model we've set the working directory in the Dockerfile, and the submission client does not know many details of it.

Ah, I see. Given that, I think this is the right approach.

Planning to merge when the integration test passes.
Added files should be in the working directories. (#294)
* Added files should be in the working directories.
* Revert unintentional changes
* Fix test
Closes #292.
The implementation here unfortunately requires more logic in the Dockerfile's command, which further increases the surface area of the API between the Docker image and the submission client. Nevertheless, this seems like the best of the sub-optimal alternatives. For example, we could have made the init-container copy the files into its own working directory, but this introduces an opaque contract where the init-container must have the same working directory as the driver; since working directories are defined by the Docker images, custom driver images may not be able to satisfy this guarantee.
The approach taken here at least ensures that however files end up in the given directory, the driver consistently copies them into its own working directory, regardless of the state of the component that mounted those files in the first place.