[SPARK-18278] Minimal support for submitting to Kubernetes. #9
Conversation
/**
 * Represents a Maven Coordinate
 *
Need to fix style here
def getFileContents(maybeFilePaths: Option[String]): Array[(String, String)] = {
  maybeFilePaths
    .map(_.split(",").map(filePath => {
      val driverExtraClasspathFile = new File(filePath)
Variable names should be generalized
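For illustration, a generalized version of that helper could look something like the Java sketch below. This is not code from the PR (the PR's helper is Scala); the name readFileContents and the String[] pair representation are assumptions made for the example. Nothing in it is specific to the driver extra classpath, which is the point of the reviewer's suggestion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FileContentsReader {
    // Generalized: reads any comma-separated list of local file paths into
    // (file name, file contents) pairs, with no classpath-specific naming.
    static List<String[]> readFileContents(String commaSeparatedPaths) throws IOException {
        List<String[]> contents = new ArrayList<>();
        if (commaSeparatedPaths == null || commaSeparatedPaths.isEmpty()) {
            return contents;
        }
        for (String rawPath : commaSeparatedPaths.split(",")) {
            Path file = Paths.get(rawPath.trim());
            contents.add(new String[] {
                file.getFileName().toString(),
                new String(Files.readAllBytes(file))
            });
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("example", ".txt");
        Files.write(tmp, "hello".getBytes());
        // Prints the contents of the single file in the list.
        System.out.println(readFileContents(tmp.toString()).get(0)[1]);
    }
}
```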
.withImage(executorDockerImage)
.withImagePullPolicy("IfNotPresent")
.withNewResources()
  .addToRequests("memory", executorMemoryQuantity)
Also add a "cpu" request, aligned with spark.executor.cores.
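Kubernetes expresses the cpu request as a quantity string: whole cores ("2") or millicores ("500m"). A sketch of deriving that string from spark.executor.cores, which the builder above could then pass with an analogous .addToRequests("cpu", ...) call; cpuQuantity is a hypothetical helper, not code from the PR:

```java
public class CpuRequest {
    // Hypothetical helper: turn a core count into a Kubernetes "cpu" quantity.
    // Whole core counts become plain integers ("2"); fractional core counts
    // become millicores ("500m").
    static String cpuQuantity(double executorCores) {
        if (executorCores == Math.floor(executorCores)) {
            return String.valueOf((long) executorCores);
        }
        return Math.round(executorCores * 1000) + "m";
    }

    public static void main(String[] args) {
        System.out.println(cpuQuantity(2));    // "2"
        System.out.println(cpuQuantity(0.5));  // "500m"
    }
}
```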
// Kubernetes-only options.
protected final String KUBERNETES_MASTER = "--kubernetes-master";
protected final String KUBERNETES_NAMESPACE = "--kubernetes-namespace";
protected final String KUBERNETES_UPLOAD_JARS = "--upload-jars";
is this a k8s-specific version of --jars ?
Correct. The API here is quite difficult to define. One could merge this in with --jars, for instance; however, what if the user wants the files in --jars to be interpreted as files on the driver docker image? I'm not sure what the best API is for separating user-local jars from jars on the docker container's disk.
I wonder if, in the case where the user expects jars on the docker image, they can also be assumed responsible for making sure their final CMD invocation knows where they are. IIUC, it would imply they put extra jars there, via defining their own images
protected final String KUBERNETES_MASTER = "--kubernetes-master";
protected final String KUBERNETES_NAMESPACE = "--kubernetes-namespace";
protected final String KUBERNETES_UPLOAD_JARS = "--upload-jars";
protected final String KUBERNETES_UPLOAD_DRIVER_EXTRA_CLASSPATH = "--upload-driver-extra-classpath";
is this a k8s-specific version of --driver-extra-classpath ?
Yes - similar to above. Might not include this, we could just say that driver extra classpath entries should be baked into the Docker image.
// Kubernetes-only options.
protected final String KUBERNETES_MASTER = "--kubernetes-master";
protected final String KUBERNETES_NAMESPACE = "--kubernetes-namespace";
this seems analogous to spark.yarn.queue which is also configurable via the --queue flag
Yup, that was the inspiration here!
protected final String QUEUE = "--queue";
// Kubernetes-only options.
protected final String KUBERNETES_MASTER = "--kubernetes-master";
I think I'd prefer a URL like k8s://host:port/ for referring to the k8s master. Would be curious to know how other k8s clients refer to the k8s master
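If the k8s:// scheme were adopted, the submission client could resolve such a URL roughly as follows. This is a sketch of the idea only: resolveApiServer, the https mapping, and the default port 443 are assumptions for the example, not code from the PR.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class K8sMasterUrl {
    // Sketch: accept a master URL of the form k8s://host:port and resolve it
    // to the https endpoint of the Kubernetes API server. If no port is
    // given, fall back to 443 (assumed default for this example).
    static String resolveApiServer(String master) throws URISyntaxException {
        URI uri = new URI(master);
        if (!"k8s".equals(uri.getScheme())) {
            throw new IllegalArgumentException(
                "Master must start with k8s://, got: " + master);
        }
        int port = uri.getPort() == -1 ? 443 : uri.getPort();
        return "https://" + uri.getHost() + ":" + port;
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(resolveApiServer("k8s://10.0.0.1:6443"));
        // prints https://10.0.0.1:6443
    }
}
```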
.getOption("spark.kubernetes.master")
.getOrElse(throw new IllegalArgumentException(
  "Master must be provided in spark.kubernetes.master"))
private val launchTime = System.currentTimeMillis
The style guide recommends the use of nanoTime (https://github.com/databricks/scala-style-guide#misc_currentTimeMillis_vs_nanoTime).
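The reasoning behind that recommendation: System.nanoTime is monotonic and therefore safe for measuring elapsed durations, while System.currentTimeMillis reads the wall clock and can jump if the clock is adjusted (NTP, manual changes). A brief sketch of the duration-measurement pattern:

```java
public class ElapsedTime {
    public static void main(String[] args) throws InterruptedException {
        // Measure a duration with the monotonic clock. nanoTime values are
        // only meaningful relative to each other, so they are suited to
        // elapsed-time measurement, not to wall-clock timestamps.
        long start = System.nanoTime();
        Thread.sleep(10);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("elapsed at least 10ms: " + (elapsedMillis >= 10));
    }
}
```

Note that when the value is used as a wall-clock label (for example, to name a launched application), currentTimeMillis is still the meaningful choice; nanoTime has no fixed origin.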
    .done())
}
override def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] = Future[Boolean] {
Are we including this in phase 1? I thought we were going to leave dynamic allocation turned off for the MVP.
This is also used for static allocation.
ash211 left a comment:
Focused mostly on configuration here, and on whether the k8s-specific parameters should be included in spark-submit.
Closing in favor of the work being done on our fork: https://github.com/apache-spark-on-k8s/spark |
Alternative to #7, but a minimal variant: it only includes support for static resource allocation.
Some changes were made to the fundamental approach of #7, in particular how the REST server is built. The REST server now uses the existing submission REST server infrastructure as its base instead of creating new Jetty-based code from scratch. Additionally, uploading local dependencies now uses a separate field, to distinguish jars local to the docker image (specified via "spark.jars") from jars uploaded from the client machine. The appropriate APIs for exposing these configuration knobs are open to discussion, especially since the user's main resource is currently uploaded indiscriminately, while one could foresee the user wanting to specify their main resource as a file local to the Docker image's disk.
Also, the client arguments have changed to mostly use Spark properties. As with the YARN support, common configuration points should be settable by arguments to spark-submit but translated to properties in SparkConf.