Skip to content
This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Conversation

@mccheah
Copy link

@mccheah mccheah commented Feb 11, 2017

This mirrors similarly to what YARN and Mesos expects.

This mirrors similarly to what YARN and Mesos expects.
…te-incremental' into change-add-files-format
@mccheah mccheah changed the title Change the API contract for uploading local jars. Change the API contract for uploading local files Feb 11, 2017
@mccheah
Copy link
Author

mccheah commented Feb 11, 2017

Use #104 instead.

EDIT: actually using this PR after all

@mccheah mccheah closed this Feb 11, 2017
@mccheah mccheah deleted the change-add-files-format branch February 11, 2017 02:32
@mccheah mccheah restored the change-add-files-format branch February 11, 2017 02:32
@mccheah mccheah deleted the change-add-files-format branch February 11, 2017 02:32
@ash211 ash211 restored the change-add-files-format branch February 16, 2017 01:27
@ash211 ash211 reopened this Feb 16, 2017
Copy link

@ash211 ash211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great @mccheah ! The main question I have is about whether we need to distinguish between the driver container and the executor container. Right now I think the code requires jars/files are baked into the same place in both driver and executor images, and doesn't support baking jars/files into just the driver and using the existing http service to distribute them to executors.

section for details. Note that files specified with the `local` scheme should be added to the container image of both
the driver and the executors. Files without a scheme or with the scheme `file://` are treated as being on the disk of
the submitting machine, and are uploaded to the driver running in Kubernetes before launching the application.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you get rid of the examples? I think it would still be useful to have examples for:

  • everything located on driver container:
    • app jar
    • keystore
    • additional files (log4j ?)
  • additional library jars located on executors' containers
    • to prevent re-distributing large jars over the driver's http service

link to advanced dependency management is great though!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The examples would merely be re-hashing what is already stated in the advanced dependency management system. The additional library jars on executor containers are just files specified with local:// and those files also have to be on the driver container... which is also covered in the advanced dependency management section. Similarly with added local files.

The SSL configuration is interesting however - I don't think the docs have been updated to reflect that.

uris.filter(uri => schemeFilter(Option(Utils.resolveURI(uri).getScheme).getOrElse("file")))
}

def getOnlyContainerLocalOrRemoteFiles(uris: Iterable[String]): Iterable[String] = {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getNonSubmitterLocalFiles ? It more accurately describes the applied filter

val submitterLocalJars = KubernetesFileUtils.getOnlySubmitterLocalFiles(sparkJars)
(submitterLocalFiles ++ submitterLocalJars).foreach { file =>
if (!new File(Utils.resolveURI(file).getPath).isFile) {
throw new SparkException(s"File $file is not a file or is a directory.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add "does not exist" to this list of things that could be wrong. Does this also mean that you can't submit a file by symlink?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh yeah "not a file" should be "does not exist".

case "file" =>
val appFile = new File(mainResourceUri.getPath)
if (!appFile.isFile) {
throw new IllegalStateException("Provided local file path does not exist" +
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check is now redundant with the check in line 87? remove this one since the other one is earlier and failing fast is good

(isProvidedKeyStoreLocal, Option.apply(keyStoreURI.getPath))
(KubernetesFileUtils.isUriLocalFile(keyStore),
Option.apply(Utils.resolveURI(keyStore).getPath))
}).getOrElse((true, Option.empty[String]))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be false? if no key store is given not sure it makes sense to have isLocalKeyStore=true

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ignore the property anyways if no key store is given.

val resourceFile = new File(tempDir, resourceName)
val resourceFilePath = resourceFile.getAbsolutePath
if (resourceFile.createNewFile()) {
val resourceContentsBytes = Base64.decodeBase64(resourceContentsBase64)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this holds another copy of the file in memory -- can we unBase64 and stream straight to disk without holding the bytes array?

// ought to have in the spark.jars property. The two may be different because for non-local
// dependencies, we have to fetch the resource (if it is not "local") but still want to use
// the full URI in spark.jars.
private def resolvedAppResource(appResource: AppResource, tempDir: File): (String, String) = {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this description is big enough for javadoc and especially to clarify what the (String, String) return value means. First is local absolute path and second is the value to use in spark.jars on driver launch?

s" to $resourceFilePath")
}
case ContainerAppResource(resource) => resource
case ContainerAppResource(resource) => (Utils.resolveURI(resource).getPath, resource)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we distinguishing between resources located on just the driver container vs on both driver and executor containers? In the former you need to convert to a file:/// path so the Spark driver will transfer it to executors via http and in the latter you don't do any conversion so executors read from their in-container paths.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files with the scheme local are expected to be pre-mounted everywhere. I believe the other deploy modes do not allow for a mode of operation where files can be shipped only to the driver but are expected to also be pre-mounted on the executors.

Copy link
Author

@mccheah mccheah Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words - there is no case where a container resource is given a file that is pre-mounted on one container but not mounted on the other containers. Such resources can only be specified by spark.driver.extraClassPath and spark.executor.extraClassPath.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this should be fine to not support that driver-only case. If you can bake something ahead of time into the driver you ought to be able to bake it into the executor image too. Let's not differentiate between just driver or just executor resources.

"--jars", HELPER_JAR,
"--files", TEST_EXISTENCE_FILE.getAbsolutePath,
"--class", FILE_EXISTENCE_MAIN_CLASS,
"--conf", "spark.ui.enabled=false",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a test for:

  • --jars with a local:/// path
  • --files with a local:/// path

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about it the more I'm not sure what having --files with a local:// path even means. If the file is local and the user wants access to it... if they knew the path to give in the --files parameter itself, would they not also be able to just use that path directly from the containers? So I don't think testing --files with local paths is altogether too relevant, really.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also not clear if we should be copying those files to the working directory.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point that --files with local:/// doesn't do anything more than copy the files to the working directory. Maybe we should explicitly block local:// URIs from --files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't block it as the other cluster managers probably support it, but there's probably no need for a test there.


### Setting Up SSL For Submitting the Driver

When submitting to Kubernetes, a pod is started for the driver, and the pod starts an HTTP server. This HTTP server
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to update the container: references to local: below in these docs (lines 77 and 79)

@mccheah
Copy link
Author

mccheah commented Feb 16, 2017

Right now I think the code requires jars/files are baked into the same place in both driver and executor images, and doesn't support baking jars/files into just the driver and using the existing http service to distribute them to executors.

This is correct, and mirrors what is expected on the other runtime modes.

Copy link

@ash211 ash211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a final sanity check on my cluster then will merge.

@ash211
Copy link

ash211 commented Feb 16, 2017

Looks all good, merging!

@ash211 ash211 merged commit 51df502 into k8s-support-alternate-incremental Feb 16, 2017
@ash211 ash211 deleted the change-add-files-format branch February 16, 2017 23:28
ash211 pushed a commit that referenced this pull request Mar 8, 2017
* Change the API contract for uploading local jars.

This mirrors similarly to what YARN and Mesos expects.

* Address comments

* Fix test
foxish pushed a commit that referenced this pull request Jul 24, 2017
* Change the API contract for uploading local jars.

This mirrors similarly to what YARN and Mesos expects.

* Address comments

* Fix test
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 25, 2019
…s#107)

* Change the API contract for uploading local jars.

This mirrors similarly to what YARN and Mesos expects.

* Address comments

* Fix test
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 25, 2019
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
…s#107)

* Change the API contract for uploading local jars.

This mirrors similarly to what YARN and Mesos expects.

* Address comments

* Fix test
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants