
Commit 95747bc

mccheah authored and foxish committed
A number of small tweaks to the MVP. (#23)
* A number of small tweaks to the MVP.
  - Master protocol defaults to https if not specified
  - Removed upload driver extra classpath functionality
  - Added ability to specify main app resource with container:// URI
  - Updated docs to reflect all of the above
  - Add examples to Docker images, mostly for integration testing but could be useful for easily getting started without shipping anything
* Add example to documentation.
1 parent f71abc1 commit 95747bc

File tree

11 files changed, +287 −142 lines

docs/running-on-kubernetes.md

Lines changed: 24 additions & 25 deletions
@@ -42,11 +42,12 @@ are set up as described above:
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
       examples/jars/spark_examples_2.11-2.2.0.jar

-<!-- TODO master should default to https if no scheme is specified -->
 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
 `spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
 master string with `k8s://` will cause the Spark application to launch on the Kubernetes cluster, with the API server
-being contacted at `api_server_url`. The HTTP protocol must also be specified.
+being contacted at `api_server_url`. If no HTTP protocol is specified in the URL, it defaults to `https`. For example,
+setting the master to `k8s://example.com:443` is equivalent to setting it to `k8s://https://example.com:443`, but to
+connect without SSL on a different port, the master would be set to `k8s://http://example.com:8443`.

 Note that applications can currently only be executed in cluster mode, where the driver and its executors are running on
 the cluster.
@@ -58,17 +59,18 @@ disk of the submitter's machine. These two types of dependencies are specified v
 `spark-submit`:

 * Local jars provided by specifying the `--jars` command line argument to `spark-submit`, or by setting `spark.jars` in
-  the application's configuration, will be treated as jars that are located on the *disk of the driver Docker
-  container*. This only applies to jar paths that do not specify a scheme or that have the scheme `file://`. Paths with
-  other schemes are fetched from their appropriate locations.
+  the application's configuration, will be treated as jars that are located on the *disk of the driver container*. This
+  only applies to jar paths that do not specify a scheme or that have the scheme `file://`. Paths with other schemes are
+  fetched from their appropriate locations.
 * Local jars provided by specifying the `--upload-jars` command line argument to `spark-submit`, or by setting
   `spark.kubernetes.driver.uploads.jars` in the application's configuration, will be treated as jars that are located on
   the *disk of the submitting machine*. These jars are uploaded to the driver docker container before executing the
   application.
-<!-- TODO support main resource bundled in the Docker image -->
 * A main application resource path that does not have a scheme or that has the scheme `file://` is assumed to be on the
   *disk of the submitting machine*. This resource is uploaded to the driver docker container before executing the
   application. A remote path can still be specified and the resource will be fetched from the appropriate location.
+* A main application resource path that has the scheme `container://` is assumed to be on the *disk of the driver
+  container*.

 In all of these cases, the jars are placed on the driver's classpath, and are also sent to the executors. Below are some
 examples of providing application dependencies.
@@ -78,8 +80,7 @@ To submit an application with both the main resource and two other jars living o
     bin/spark-submit \
       --deploy-mode cluster \
       --class com.example.applications.SampleApplication \
-      --master k8s://https://192.168.99.100 \
-      --kubernetes-namespace default \
+      --master k8s://192.168.99.100 \
       --upload-jars /home/exampleuser/exampleapplication/dep1.jar,/home/exampleuser/exampleapplication/dep2.jar \
       --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
@@ -91,8 +92,7 @@ Note that since passing the jars through the `--upload-jars` command line argume
     bin/spark-submit \
       --deploy-mode cluster \
       --class com.example.applications.SampleApplication \
-      --master k8s://https://192.168.99.100 \
-      --kubernetes-namespace default \
+      --master k8s://192.168.99.100 \
       --conf spark.kubernetes.driver.uploads.jars=/home/exampleuser/exampleapplication/dep1.jar,/home/exampleuser/exampleapplication/dep2.jar \
       --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
@@ -104,8 +104,7 @@ is located in the jar `/opt/spark-plugins/app-plugin.jar` on the docker image's
     bin/spark-submit \
       --deploy-mode cluster \
       --class com.example.applications.PluggableApplication \
-      --master k8s://https://192.168.99.100 \
-      --kubernetes-namespace default \
+      --master k8s://192.168.99.100 \
       --jars /opt/spark-plugins/app-plugin.jar \
       --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
@@ -117,13 +116,22 @@ Spark property, the above will behave identically to this command:
     bin/spark-submit \
       --deploy-mode cluster \
       --class com.example.applications.PluggableApplication \
-      --master k8s://https://192.168.99.100 \
-      --kubernetes-namespace default \
+      --master k8s://192.168.99.100 \
       --conf spark.jars=file:///opt/spark-plugins/app-plugin.jar \
       --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
       http://example.com:8080/applications/sparkpluggable/app.jar

+To specify a main application resource that is bundled in the Docker image and has no other dependencies:
+
+    bin/spark-submit \
+      --deploy-mode cluster \
+      --class com.example.applications.PluggableApplication \
+      --master k8s://192.168.99.100:8443 \
+      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
+      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
+      container:///home/applications/examples/example.jar
+
 ### Spark Properties

 Below are some other common properties that are specific to Kubernetes. Most of the other configurations are the same
@@ -133,10 +141,9 @@ from the other deployment modes. See the [configuration page](configuration.html
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
 <tr>
   <td><code>spark.kubernetes.namespace</code></td>
-  <!-- TODO set default to "default" -->
-  <td>(none)</td>
+  <td><code>default</code></td>
   <td>
-    The namespace that will be used for running the driver and executor pods. Must be specified. When using
+    The namespace that will be used for running the driver and executor pods. When using
     <code>spark-submit</code> in cluster mode, this can also be passed to <code>spark-submit</code> via the
     <code>--kubernetes-namespace</code> command line argument.
   </td>
@@ -196,14 +203,6 @@ from the other deployment modes. See the [configuration page](configuration.html
     mode. Refer to <a href="running-on-kubernetes.html#adding-other-jars">adding other jars</a> for more information.
   </td>
 </tr>
-<tr>
-  <!-- TODO remove this functionality -->
-  <td><code>spark.kubernetes.driver.uploads.driverExtraClasspath</code></td>
-  <td>(none)</td>
-  <td>
-    Comma-separated list of jars to be sent to the driver only when submitting the application in cluster mode.
-  </td>
-</tr>
 <tr>
   <td><code>spark.kubernetes.executor.memoryOverhead</code></td>
   <td>executorMemory * 0.10, with minimum of 384 </td>

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/Client.scala

Lines changed: 26 additions & 14 deletions
@@ -35,7 +35,7 @@ import scala.concurrent.duration.DurationInt
 import scala.util.Success

 import org.apache.spark.{SPARK_VERSION, SparkConf, SparkException}
-import org.apache.spark.deploy.rest.{AppResource, KubernetesCreateSubmissionRequest, RemoteAppResource, TarGzippedData, UploadedAppResource}
+import org.apache.spark.deploy.rest.{AppResource, ContainerAppResource, KubernetesCreateSubmissionRequest, RemoteAppResource, TarGzippedData, UploadedAppResource}
 import org.apache.spark.deploy.rest.kubernetes._
 import org.apache.spark.internal.Logging
 import org.apache.spark.util.Utils
@@ -47,13 +47,8 @@ private[spark] class Client(
     appArgs: Array[String]) extends Logging {
   import Client._

-  private val namespace = sparkConf.getOption("spark.kubernetes.namespace").getOrElse(
-    throw new IllegalArgumentException("Namespace must be provided in spark.kubernetes.namespace"))
-  private val rawMaster = sparkConf.get("spark.master")
-  if (!rawMaster.startsWith("k8s://")) {
-    throw new IllegalArgumentException("Master should be a URL with scheme k8s://")
-  }
-  private val master = rawMaster.replaceFirst("k8s://", "")
+  private val namespace = sparkConf.get("spark.kubernetes.namespace", "default")
+  private val master = resolveK8sMaster(sparkConf.get("spark.master"))

   private val launchTime = System.currentTimeMillis
   private val appName = sparkConf.getOption("spark.app.name")
@@ -64,8 +59,6 @@ private[spark] class Client(
   private val driverLauncherSelectorValue = s"driver-launcher-$launchTime"
   private val driverDockerImage = sparkConf.get(
     "spark.kubernetes.driver.docker.image", s"spark-driver:$SPARK_VERSION")
-  private val uploadedDriverExtraClasspath = sparkConf
-    .getOption("spark.kubernetes.driver.uploads.driverExtraClasspath")
   private val uploadedJars = sparkConf.getOption("spark.kubernetes.driver.uploads.jars")

   private val secretBase64String = {
@@ -112,12 +105,15 @@ private[spark] class Client(
       .withType("Opaque")
       .done()
     try {
-      val resolvedSelectors = (Map(DRIVER_LAUNCHER_SELECTOR_LABEL -> driverLauncherSelectorValue)
+      val resolvedSelectors = (Map(
+          DRIVER_LAUNCHER_SELECTOR_LABEL -> driverLauncherSelectorValue,
+          SPARK_APP_NAME_LABEL -> appName)
         ++ parsedCustomLabels).asJava
       val (servicePorts, containerPorts) = configurePorts()
       val service = kubernetesClient.services().createNew()
         .withNewMetadata()
           .withName(kubernetesAppId)
+          .withLabels(Map(SPARK_APP_NAME_LABEL -> appName).asJava)
           .endMetadata()
         .withNewSpec()
           .withSelector(resolvedSelectors)
@@ -355,18 +351,17 @@ private[spark] class Client(
           val fileBytes = Files.toByteArray(appFile)
           val fileBase64 = Base64.encodeBase64String(fileBytes)
           UploadedAppResource(resourceBase64Contents = fileBase64, name = appFile.getName)
+        case "container" => ContainerAppResource(appResourceUri.getPath)
         case other => RemoteAppResource(other)
       }

-      val uploadDriverExtraClasspathBase64Contents = compressJars(uploadedDriverExtraClasspath)
       val uploadJarsBase64Contents = compressJars(uploadedJars)
       KubernetesCreateSubmissionRequest(
         appResource = resolvedAppResource,
         mainClass = mainClass,
         appArgs = appArgs,
         secret = secretBase64String,
         sparkProperties = sparkConf.getAll.toMap,
-        uploadedDriverExtraClasspathBase64Contents = uploadDriverExtraClasspathBase64Contents,
         uploadedJarsBase64Contents = uploadJarsBase64Contents)
     }
@@ -414,7 +409,7 @@ private[spark] class Client(
     }
   }
 }

-private object Client {
+private[spark] object Client extends Logging {

   private val SUBMISSION_SERVER_SECRET_NAME = "spark-submission-server-secret"
   private val DRIVER_LAUNCHER_SELECTOR_LABEL = "driver-launcher-selector"
@@ -430,6 +425,7 @@ private object Client {
   private val SECURE_RANDOM = new SecureRandom()
   private val SPARK_SUBMISSION_SECRET_BASE_DIR = "/var/run/secrets/spark-submission"
   private val LAUNCH_TIMEOUT_SECONDS = 30
+  private val SPARK_APP_NAME_LABEL = "spark-app-name"

   def main(args: Array[String]): Unit = {
     require(args.length >= 2, s"Too few arguments. Usage: ${getClass.getName} <mainAppResource>" +
@@ -444,4 +440,20 @@ private object Client {
       sparkConf = sparkConf,
       appArgs = appArgs).run()
   }
+
+  def resolveK8sMaster(rawMasterString: String): String = {
+    if (!rawMasterString.startsWith("k8s://")) {
+      throw new IllegalArgumentException("Master URL should start with k8s:// in Kubernetes mode.")
+    }
+    val masterWithoutK8sPrefix = rawMasterString.replaceFirst("k8s://", "")
+    if (masterWithoutK8sPrefix.startsWith("http://")
+        || masterWithoutK8sPrefix.startsWith("https://")) {
+      masterWithoutK8sPrefix
+    } else {
+      val resolvedURL = s"https://$masterWithoutK8sPrefix"
+      logDebug(s"No scheme specified for kubernetes master URL, so defaulting to https. Resolved" +
+        s" URL is $resolvedURL")
+      resolvedURL
+    }
+  }
 }
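
Given the `resolveK8sMaster` added above, the resolutions below follow directly from the code. These assertions are illustrative only, not part of the commit; since `Client` is `private[spark]`, they would only compile inside the `org.apache.spark` package:

    // Scheme defaulting per resolveK8sMaster:
    assert(Client.resolveK8sMaster("k8s://example.com:443") == "https://example.com:443")
    assert(Client.resolveK8sMaster("k8s://http://example.com:8443") == "http://example.com:8443")
    assert(Client.resolveK8sMaster("k8s://https://example.com:443") == "https://example.com:443")
    // Anything not starting with "k8s://" throws IllegalArgumentException.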

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/rest/KubernetesRestProtocolMessages.scala

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,6 @@ case class KubernetesCreateSubmissionRequest(
     val appArgs: Array[String],
     val sparkProperties: Map[String, String],
     val secret: String,
-    val uploadedDriverExtraClasspathBase64Contents: Option[TarGzippedData],
     val uploadedJarsBase64Contents: Option[TarGzippedData]) extends SubmitRestProtocolRequest {
   message = "create"
   clientSparkVersion = SPARK_VERSION
@@ -46,13 +45,16 @@ case class TarGzippedData(
   property = "type")
 @JsonSubTypes(value = Array(
   new JsonSubTypes.Type(value = classOf[UploadedAppResource], name = "UploadedAppResource"),
+  new JsonSubTypes.Type(value = classOf[ContainerAppResource], name = "ContainerLocalAppResource"),
   new JsonSubTypes.Type(value = classOf[RemoteAppResource], name = "RemoteAppResource")))
 abstract class AppResource

 case class UploadedAppResource(
     resourceBase64Contents: String,
     name: String = "spark-app-resource") extends AppResource

+case class ContainerAppResource(resourcePath: String) extends AppResource
+
 case class RemoteAppResource(resource: String) extends AppResource

 class PingResponse extends SubmitRestProtocolResponse {
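
The `@JsonTypeInfo`/`@JsonSubTypes` pair above makes Jackson emit a `type` discriminator so the submission server can deserialize the right `AppResource` subtype. A self-contained sketch of the same pattern, assuming `jackson-module-scala` is on the classpath and that the annotation parameters off-screen match the usual `Id.NAME`/`As.PROPERTY` setup; the `Demo*` names are hypothetical stand-ins for the real classes:

    import com.fasterxml.jackson.annotation.{JsonSubTypes, JsonTypeInfo}
    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule

    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "type")
    @JsonSubTypes(value = Array(
      new JsonSubTypes.Type(value = classOf[DemoContainerResource], name = "ContainerLocalAppResource")))
    abstract class DemoResource

    case class DemoContainerResource(resourcePath: String) extends DemoResource

    object DemoJson extends App {
      val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
      // Expected output: {"type":"ContainerLocalAppResource","resourcePath":"/opt/app.jar"}
      println(mapper.writeValueAsString(DemoContainerResource("/opt/app.jar")))
      // Round-trip: the "type" field selects the concrete subtype on read.
      val parsed = mapper.readValue(
        """{"type":"ContainerLocalAppResource","resourcePath":"/opt/app.jar"}""",
        classOf[DemoResource])
      assert(parsed == DemoContainerResource("/opt/app.jar"))
    }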
