Extract constants and config into separate file. Launch => Submit. #65

mccheah · 2017-01-31T00:32:05Z

Closes #47

mccheah · 2017-01-31T00:35:40Z

.../scala/org/apache/spark/scheduler/cluster/kubernetes/KubernetesClusterSchedulerBackend.scala

+        .done())
+    } catch {
+      case throwable: Throwable =>
+        logError("Failed to allocate executor pod.", throwable)


In testing and running into problems I found that the exception here wasn't appearing in the logs; hence the change. There might be something in the calling code worth fixing instead of logging at this layer.

mccheah · 2017-01-31T00:38:14Z

...tes/core/src/main/scala/org/apache/spark/deploy/rest/kubernetes/KubernetesSparkRestApi.scala

  @Produces(Array(MediaType.APPLICATION_JSON))
  @Path("/create")
-  def create(request: KubernetesCreateSubmissionRequest): CreateSubmissionResponse
+  def submitApplication(request: KubernetesCreateSubmissionRequest): CreateSubmissionResponse


The path is still /create to be consistent with Standalone's REST submission server, although it's unclear if that's the correct precedent to follow.

iyanuobidele · 2017-01-31T01:04:43Z

...urce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/Client.scala

+  private val serviceAccount = sparkConf.get(KUBERNETES_SERVICE_ACCOUNT_NAME)
+  private val customLabels = sparkConf.get(KUBERNETES_DRIVER_LABELS)

  private implicit val retryableExecutionContext = ExecutionContext


I know this is not part of this commit, but in the spirit of refactoring, which I see a lot of, perhaps, you can make use of the spark wrappers here ?

Like so:
private implicit val retryableExecutionContext = ExecutionContext.fromExecutorService( ThreadUtils.newDaemonSingleThreadExecutor("kubernetes-client-retryable-futures-%d"))

ash211 · 2017-01-31T00:50:22Z

...urce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/Client.scala

-    "spark.kubernetes.driver.docker.image", s"spark-driver:$sparkVersion")
-  private val uploadedJars = sparkConf.getOption("spark.kubernetes.driver.uploads.jars")
+  private val kubernetesAppId = sparkConf
+    .get("spark.app.id", s"$appName-$launchTime").toLowerCase.replaceAll("\\.", "-")


this isn't an equivalent change -- I think we wanted to add the launch time even to a user-specified spark.app.id -- @foxish ?

There's some inconsistency between this app id and the application ID expected by KubernetesClusterSchedulerBackend before this change. We could propagate the kubernetesAppId to the Spark driver conf and use that in KubernetesClusterSchedulerBackend instead.

I think if we have something as similarly named as "app id" and "application ID" it should be the same thing everywhere. Rather than a kubernetes app id and a spark app id being really close in name but slightly different.

Does that mean we need to use the with-timestamp version everywhere?

I also think the semantics of the spark.app.id conf are such that it should be expected to be uniquely identifiable. It also doesn't appear to be set by users often; spark.app.id isn't documented on the Spark configuration page. It's perhaps reasonable to instead force what spark.app.id is and don't give the user a say in the matter.

ash211 · 2017-01-31T00:51:55Z

...urce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/Client.scala

        val driverKubernetesSelectors = (Map(
-            DRIVER_LAUNCHER_SELECTOR_LABEL -> driverLauncherSelectorValue,
+            SPARK_DRIVER_LABEL -> kubernetesAppId,
+            SPARK_APP_ID_LABEL -> kubernetesAppId,


why have both of these?

The first label indicates that this is a spark driver as opposed to a Spark executor. Thus when trying to match labels for the driver you want to pick up only the pods with the driver label (and not the executor label, for example)

On the other hand it's still useful to get labels to match everything (driver + executors) for a given app - which is what the second label is for.

Ah got it, just wasn't expecting to see much more than purely refactor changes here. This makes sense.

ash211 · 2017-01-31T01:00:09Z