
Conversation


@ash211 ash211 commented Feb 10, 2017

Fixes #92

This brings spark.jars and spark.files in line with the behavior of YARN and
other cluster managers.

Specifically, the following schemes are now supported:

- local:// is a container-local file assumed to be present on both driver and
  executor containers
- container:// is a synonym for local://
- file:// is a submitter-local file that's uploaded to the driver
- a no-scheme path is treated as if it had the file:// scheme

Filenames of spark.files are required to be unique, since they are all placed
in the current working directory of the driver and executors. spark.jars does
not have this restriction: jars are given a unique suffix, placed in a folder
separate from the current working directory, and added to the driver
classpath.
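
As a rough illustration of that scheme handling, a minimal sketch (the helper name `isSubmitterLocal` is hypothetical, not code from this PR):

```scala
import org.apache.spark.util.Utils

// Hypothetical helper: decides whether a path must be uploaded from the
// submitter (file:// or no scheme) or is already present in the containers
// (local:// / container://).
def isSubmitterLocal(path: String): Boolean =
  Option(Utils.resolveURI(path).getScheme).getOrElse("file") == "file"
```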
```diff
-val originalFiles = sparkProperties.get("spark.files")
+val nonUploadedFiles = sparkProperties.get("spark.files")
   .map(_.split(","))
   .map(_.filter(Utils.resolveURI(_).getScheme match {
```
This is a common enough pattern that we could think about separating it out into a separate class.

Also, some other places in core Spark have used this paradigm:

```scala
Option(Utils.resolveURI(file).getScheme).getOrElse("file") match ...
```
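
For illustration only, such an extraction might look like this (the object name `SchemeUtils` is an assumption, not part of the PR):

```scala
import org.apache.spark.util.Utils

// Hypothetical helper collecting the repeated scheme-resolution idiom.
object SchemeUtils {
  // Treat a missing scheme as "file", matching SparkSubmit conventions.
  def schemeOf(path: String): String =
    Option(Utils.resolveURI(path).getScheme).getOrElse("file")
}
```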

@mccheah mccheah Feb 11, 2017

Using the above we can shorthand a lot of things and avoid the case-match entirely. For example:

```scala
Option(Utils.resolveURI(file).getScheme).getOrElse("file") != "file"
```
The snippet continues:

```scala
  }))
  .getOrElse(Array.empty[String])
```
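
Putting the suggestion together, the filter might collapse to something like this (a sketch, assuming `nonUploadedFiles` should keep only paths with non-`file` schemes):

```scala
// Hypothetical rewrite using the getOrElse("file") idiom suggested above.
val nonUploadedFiles = sparkProperties.get("spark.files")
  .map(_.split(",").filter { file =>
    // Keep paths whose scheme is not "file" (e.g. local://), i.e. files
    // that are not uploaded from the submitter.
    Option(Utils.resolveURI(file).getScheme).getOrElse("file") != "file"
  })
  .getOrElse(Array.empty[String])
```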
```diff
-val resolvedJars = writtenJars ++ originalJars ++ Array(appResourcePath)
+val resolvedJars = writtenJars ++ nonUploadedJars ++ Array(appResourcePath)
```
Here we're not adding jars with the `local` scheme to the driver's classpath. Note that while we want to add said jars to the classpath using just the raw path, they should still be put in spark.jars with the full URI so that executors pick them up from their local disks, as opposed to having the driver upload them.
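
A minimal sketch of that handling (variable names beyond `resolvedJars` are assumptions):

```scala
// Hypothetical handling of local:// jars per the comment above.
val (containerLocalJars, uploadedJars) = resolvedJars.partition { jar =>
  Option(Utils.resolveURI(jar).getScheme).contains("local")
}
// The driver classpath gets the raw paths...
val driverExtraClasspath = containerLocalJars.map(Utils.resolveURI(_).getPath)
// ...while spark.jars keeps the full URIs so executors resolve them locally.
val sparkJarsValue = (containerLocalJars ++ uploadedJars).mkString(",")
```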

@mccheah mccheah commented Feb 11, 2017

Update docs/running-on-kubernetes.md also!

```diff
 }

-private def compressFiles(maybeFilePaths: Option[String]): Option[TarGzippedData] = {
+private def compressUploadableFiles(maybeFilePaths: Option[String]): Option[TarGzippedData] = {
```

I'm not entirely sure if it's worth keeping TarGzippedData fields as Options anymore. Sure, we save a few bytes of extra data to upload if there aren't any jars, but it's an extra layer of indirection that has to be semantically understood.
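
To illustrate the alternative (field names here are assumptions, not the actual TarGzippedData definition):

```scala
// Hypothetical: Option-wrapped fields force every caller to unwrap...
case class TarGzippedData(dataBase64: Option[String])
// ...whereas a plain field with an empty-archive default avoids the
// extra indirection at the cost of a few bytes on the wire.
case class TarGzippedDataPlain(dataBase64: String = "")
```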

```scala
  serverSparkVersion = SPARK_VERSION
}

object AppResource {
```

Mark the object as private[spark]
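
That is:

```scala
private[spark] object AppResource {
  // ...
}
```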

@mccheah mccheah commented Feb 11, 2017

I don't think we should support both local and container, simply because the latter is specific to Kubernetes but has the exact same semantics as the former. We should just use local.

@ash211 ash211 (Author) commented Feb 16, 2017

After spending some time comparing this implementation with the one in #107, I've decided that one is better: it covers several things that this one did not, including docs, SparkSubmit option updates, and remapping of paths in those two options.

Closing this PR and continuing work on the other. Sorry for the double-review folks!

@ash211 ash211 closed this Feb 16, 2017
@ash211 ash211 deleted the remove-nonstandard-upload-jars branch February 16, 2017 01:29
ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request on Feb 25, 2019: "Bring first version of apache-spark-on-k8s into Palantir Spark" (…nto-palantir-spark)
