
Conversation

@mccheah (Collaborator) commented Dec 6, 2016

A minimal alternative to #7; this variant only includes support for static resource allocation.

Some changes were made to the fundamental approach of #7, in particular how the REST server is built. The REST server now uses the existing submission REST server infrastructure as its base instead of creating new Jetty-based code from scratch. Additionally, uploading local dependencies now uses a separate field, to distinguish jars local to the Docker image (specified via "spark.jars") from jars uploaded from the client machine. The appropriate APIs to expose these configuration knobs are open to discussion, especially since the user's main resource is currently uploaded indiscriminately, while one could foresee the user wanting to specify their main resource as a file on the Docker image's disk.

The client arguments have also changed to mostly use Spark properties. As with the YARN support, common configuration points should be settable via arguments to spark-submit but translated to properties in SparkConf.
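To make the dependency split concrete, here is a minimal sketch of the submission-side configuration. Only spark.kubernetes.master appears verbatim in the diff below; the upload property name is a hypothetical stand-in for whatever backs --upload-jars.

```scala
import org.apache.spark.SparkConf

// Hypothetical configuration illustrating the split described above.
val conf = new SparkConf()
  .set("spark.kubernetes.master", "https://kubernetes-master:443")
  // Jars already on the driver Docker image's disk stay in spark.jars...
  .set("spark.jars", "/opt/spark/extra/image-local-dep.jar")
  // ...while jars on the client machine go through the separate upload field
  // (surfaced as --upload-jars below); this property name is an assumption.
  .set("spark.kubernetes.driver.uploadJars", "/home/user/client-local-dep.jar")
```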


/**
 * Represents a Maven Coordinate
 *
@mccheah (Collaborator, Author):
Need to fix style here

def getFileContents(maybeFilePaths: Option[String]): Array[(String, String)] = {
  maybeFilePaths
    .map(_.split(",").map(filePath => {
      val driverExtraClasspathFile = new File(filePath)
@mccheah (Collaborator, Author):
Variable names should be generalized
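A possible cleanup addressing both comments might look like the sketch below; the base64 encoding of file contents and the empty-array default are assumptions about how the caller consumes the result.

```scala
import java.io.File
import java.nio.file.Files
import java.util.Base64

// Sketch only: generalized names ("file" rather than "driverExtraClasspathFile")
// and one chained call per line. Returns (fileName, base64-encoded contents).
def getFileContents(maybeFilePaths: Option[String]): Array[(String, String)] = {
  maybeFilePaths
    .map(_.split(",").map { filePath =>
      val file = new File(filePath)
      val encoded = Base64.getEncoder.encodeToString(Files.readAllBytes(file.toPath))
      (file.getName, encoded)
    })
    .getOrElse(Array.empty[(String, String)])
}
```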

.withImage(executorDockerImage)
.withImagePullPolicy("IfNotPresent")
.withNewResources()
.addToRequests("memory", executorMemoryQuantity)
Collaborator:
Also "cpus" request, aligned with spark.executor.cores

// Kubernetes-only options.
protected final String KUBERNETES_MASTER = "--kubernetes-master";
protected final String KUBERNETES_NAMESPACE = "--kubernetes-namespace";
protected final String KUBERNETES_UPLOAD_JARS = "--upload-jars";
@ash211 commented Dec 8, 2016:
Is this a k8s-specific version of --jars?

@mccheah (Collaborator, Author):

Correct. The API here is quite difficult to define. One could merge this with --jars, for instance, but what if the user wants the files in --jars to be interpreted as files on the driver's Docker image? I'm not sure what the best API is for separating user-local jars from jars on the Docker container's disk.

Collaborator:
I wonder if, in the case where the user expects jars on the Docker image, they can also be assumed responsible for making sure their final CMD invocation knows where those jars are. IIUC, that would imply they put the extra jars there themselves by defining their own images.

protected final String KUBERNETES_MASTER = "--kubernetes-master";
protected final String KUBERNETES_NAMESPACE = "--kubernetes-namespace";
protected final String KUBERNETES_UPLOAD_JARS = "--upload-jars";
protected final String KUBERNETES_UPLOAD_DRIVER_EXTRA_CLASSPATH = "--upload-driver-extra-classpath";
Reviewer:
Is this a k8s-specific version of --driver-extra-classpath?

@mccheah (Collaborator, Author):

Yes - similar to above. We might not include this; we could just say that driver extra classpath entries should be baked into the Docker image.


// Kubernetes-only options.
protected final String KUBERNETES_MASTER = "--kubernetes-master";
protected final String KUBERNETES_NAMESPACE = "--kubernetes-namespace";
Reviewer:
This seems analogous to spark.yarn.queue, which is also configurable via the --queue flag.

@mccheah (Collaborator, Author):

Yup, that was the inspiration here!

protected final String QUEUE = "--queue";

// Kubernetes-only options.
protected final String KUBERNETES_MASTER = "--kubernetes-master";
Reviewer:
I think I'd prefer a URL like k8s://host:port/ for referring to the k8s master. I'd be curious to know how other k8s clients refer to the k8s master.
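A minimal sketch of the suggested scheme, assuming k8s:// simply wraps the API server URL and defaults to https when the inner URL carries no scheme of its own:

```scala
// Illustrative only; not code from this PR.
def resolveK8sMaster(rawMasterURL: String): String = {
  require(rawMasterURL.startsWith("k8s://"), s"Invalid Kubernetes master URL: $rawMasterURL")
  val withoutScheme = rawMasterURL.stripPrefix("k8s://")
  if (withoutScheme.startsWith("http")) withoutScheme else s"https://$withoutScheme"
}

resolveK8sMaster("k8s://host:443")          // "https://host:443"
resolveK8sMaster("k8s://http://host:8080")  // "http://host:8080"
```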

.getOption("spark.kubernetes.master")
// Fail fast rather than returning the error message as the value.
.getOrElse(throw new IllegalArgumentException("Master must be provided in spark.kubernetes.master"))

private val launchTime = System.currentTimeMillis
.done())
}

override def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] = Future[Boolean] {
Owner:
Are we including this in phase 1? I thought we were going to leave dynamic allocation turned off for the MVP.

@mccheah (Collaborator, Author):
This is also used for static allocation.
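For context, a rough sketch of how the same hook serves static allocation: the coarse-grained backend calls it once at startup with spark.executor.instances, and the Kubernetes backend reconciles the pod count to match. runningExecutorPods and allocateNewExecutorPod are assumed names, not this PR's actual members.

```scala
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future}

// Assumed backend state, stubbed so the sketch is self-contained.
val runningExecutorPods = mutable.Map.empty[String, String]
def allocateNewExecutorPod(): Unit = { /* build and create the pod via fabric8 */ }

// Invoked once at startup with spark.executor.instances, so it drives the
// initial static pod creation as well as any later dynamic resizing.
def doRequestTotalExecutors(requestedTotal: Int)(implicit ec: ExecutionContext): Future[Boolean] =
  Future {
    val delta = requestedTotal - runningExecutorPods.size
    (1 to delta).foreach(_ => allocateNewExecutorPod())
    true
  }
```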

@ash211 left a comment:
Focused mostly on config here, and on whether the k8s-specific parameters should be included in spark-submit.

@mccheah (Collaborator, Author) commented Feb 22, 2017

Closing in favor of the work being done on our fork: https://github.com/apache-spark-on-k8s/spark

@mccheah closed this Feb 22, 2017