Support SSL configuration for the driver application submission #49
Conversation
The user can provide a keyStore to load onto the driver pod and the driver pod will use that keyStore to set up SSL on its server.
We don't need to persist these after the pod has them mounted and is running already.
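For context, submission-side configuration would look roughly like the integration test further down in this thread. This is only a hedged sketch: apart from `spark.ssl.kubernetes.driverlaunch.trustStorePassword` (which appears verbatim in the diff), the property names, master URL, and paths here are illustrative assumptions following Spark's usual `spark.ssl.*` naming, not confirmed keys from this PR.

```scala
import org.apache.spark.deploy.SparkSubmit

// Hypothetical example jar path, mirroring the EXAMPLES_JAR constant used in the integration test below.
val EXAMPLES_JAR = "file:///opt/spark/examples/jars/spark-examples.jar"

val args = Array(
  "--master", "k8s://https://192.168.99.100:8443",
  "--deploy-mode", "cluster",
  "--class", "org.apache.spark.examples.SparkPi",
  // Only trustStorePassword appears verbatim in this PR's diff; the other
  // spark.ssl.kubernetes.driverlaunch.* keys are assumptions for illustration.
  "--conf", "spark.ssl.kubernetes.driverlaunch.enabled=true",
  "--conf", "spark.ssl.kubernetes.driverlaunch.keyStore=/opt/keystores/driver.jks",
  "--conf", "spark.ssl.kubernetes.driverlaunch.keyStorePassword=changeit",
  "--conf", "spark.ssl.kubernetes.driverlaunch.trustStore=/opt/keystores/trustStore.jks",
  "--conf", "spark.ssl.kubernetes.driverlaunch.trustStorePassword=changeit",
  EXAMPLES_JAR)
SparkSubmit.main(args)
```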
    }

    private def configureSsl(kubernetesClient: KubernetesClient)
      : (Array[EnvVar], Array[Volume], Array[VolumeMount], Array[Secret]) = {
nit: move colon to previous line
        (sslEnvs.toArray, Array(sslVolume), Array(sslVolumeMount), secrets.toArray)
      } else {
        (Array[EnvVar](), Array[Volume](), Array[VolumeMount](), Array[Secret]())
      }
maybe you can move this exceptional case to the start?
in general, it can read nicely to change from:
    if (normal1) {
      if (normal2) {
        if (normal3) {
          // do everything normal
        } else {
          // handle not normal3
        }
      } else {
        // handle not normal2
      }
    } else {
      // handle not normal1
    }

to this form instead:

    if (!normal1) {
      // handle not normal1
    }
    if (!normal2) {
      // handle not normal2
    }
    if (!normal3) {
      // handle not normal3
    }
    // do everything normal
Advantages of the second are that the indentation is much shallower, and also that the non-normal handling and the check are next to each other, instead of split to opposite sides of the method.
To do this here we would have to use the return keyword, but I don't know if using return is idiomatic in Scala. To avoid the return keyword, the last statement must be the value that is returned; but then we would have to use if...else so that the if case returns the value properly, and in that case we don't reduce the indentation as we would prefer.
Regarding if-else ordering - if we want to avoid the return statement here, then I prefer checking the positive of a conditional when branching with if... else.
Indeed, return seems to be frowned on in Scala: https://tpolecat.github.io/2014/05/09/return.html
No need to change the style if it doesn't flow smoothly.
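For what it's worth, the usual Scala compromise (since return is discouraged) is to test the negated condition first and keep the normal path as the final expression of the else branch. A toy illustration of that trade-off, not code from this PR:

```scala
// Abnormal case is handled first without `return`; the normal path stays as the
// last expression so it is still the value of the method.
def describe(n: Int): String = {
  if (n < 0) {
    "negative"            // early/abnormal case first
  } else {
    val doubled = n * 2   // "do everything normal" path
    s"doubled to $doubled"
  }
}

describe(-1)  // "negative"
describe(3)   // "doubled to 6"
```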
        (trustManagers(0).asInstanceOf[X509TrustManager], sslContext)
      }).getOrElse((null, SSLContext.getDefault))
    } else {
      (null, SSLContext.getDefault)
same thing here with moving exception handling to the top of the method and flattening some indentation out
    val nodeAddress = node.getStatus.getAddresses.asScala.head.getAddress
    val url = s"http://$nodeAddress:$servicePort"
    HttpClientUtil.createClient[KubernetesSparkRestApi](uri = url)
    val urlScheme = if (driverLaunchSslOptions.enabled) "https" else "http"
log a warning if this is http -- users reading logs should be notified that they're running an insecure configuration
    private def fileToUtf8String(filePath: String) = {
      val passwordFile = new File(filePath)
      if (!passwordFile.isFile) {
        throw new IllegalArgumentException("KeyStore password file does not exist or " +
this can be a keystore password file or a key password file (you use it for both), so the message needs to be more generic.
also include the path that was looked at in the error message
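Something along these lines could work. This is only a sketch; the two-argument signature matches the `fileToUtf8String(filePath, fileType)` form that appears later in the diff, and the exact wording is illustrative:

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Generic message that works for both keystore and key password files, and
// includes the path that was inspected.
private def fileToUtf8String(filePath: String, fileType: String): String = {
  val file = new File(filePath)
  if (!file.isFile) {
    throw new IllegalArgumentException(
      s"$fileType file provided at ${file.getAbsolutePath} does not exist or is not a file.")
  }
  new String(Files.readAllBytes(file.toPath), StandardCharsets.UTF_8)
}
```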
    }

    private def fileToUtf8String(filePath: String) = {
      val passwordFile = new File(filePath)
do we need this File to be wrapped in Utils.tryWithResource?
No - Files.readAllBytes opens and closes the streams appropriately.
| "--conf", s"spark.ssl.kubernetes.driverlaunch.trustStorePassword=changeit", | ||
| EXAMPLES_JAR) | ||
| SparkSubmit.main(args) | ||
| } |
for the truststore and keystore below, create a README file next to them explaining how they were generated. ideally, generate them programmatically as part of the integration tests
Unfortunately I couldn't get them to generate programmatically in Java/Scala code without JCE unlimited key strength. If we could guarantee that our build system would have the JCE policy files installed then that's probably OK. Alternatively, I could also have been using the security APIs incorrectly (I was trying to use BouncyCastle), so someone should feel free to give that a go, but I think checking in the static files is fine for now.
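For anyone who wants to retry the programmatic route later, a rough sketch of the BouncyCastle approach might look like the following. This is not the code that ended up in the PR; it assumes the bcpkix artifact is on the test classpath, and the alias, password, validity window, and output path are all illustrative.

```scala
import java.io.FileOutputStream
import java.math.BigInteger
import java.security.{KeyPairGenerator, KeyStore}
import java.util.Date

import org.bouncycastle.asn1.x500.X500Name
import org.bouncycastle.cert.jcajce.{JcaX509CertificateConverter, JcaX509v3CertificateBuilder}
import org.bouncycastle.operator.jcajce.JcaContentSignerBuilder

// Generate an RSA key pair for the self-signed certificate.
val keyPairGenerator = KeyPairGenerator.getInstance("RSA")
keyPairGenerator.initialize(2048)
val keyPair = keyPairGenerator.generateKeyPair()

// Build a self-signed X.509 certificate valid for one day around "now".
val subject = new X500Name("CN=localhost")
val certificateBuilder = new JcaX509v3CertificateBuilder(
  subject,
  new BigInteger("1"),
  new Date(System.currentTimeMillis() - 24L * 60 * 60 * 1000),
  new Date(System.currentTimeMillis() + 24L * 60 * 60 * 1000),
  subject,
  keyPair.getPublic)
val signer = new JcaContentSignerBuilder("SHA256WithRSA").build(keyPair.getPrivate)
val certificate = new JcaX509CertificateConverter()
  .getCertificate(certificateBuilder.build(signer))

// Write the key and certificate into a JKS keystore for the test to pick up.
val keyStorePassword = "changeit".toCharArray
val keyStore = KeyStore.getInstance("JKS")
keyStore.load(null, null)
keyStore.setKeyEntry(
  "driver", keyPair.getPrivate, keyStorePassword,
  Array[java.security.cert.Certificate](certificate))
val keyStoreStream = new FileOutputStream("target/keystore.jks")
try {
  keyStore.store(keyStoreStream, keyStorePassword)
} finally {
  keyStoreStream.close()
}
```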
| case "--port" :: value :: tail => | ||
| args = tail | ||
| resolvedArguments.copy(port = Some(value.toInt)) | ||
| case "--use-ssl" :: value :: tail => |
move this down one so all the ssl-related flags are together
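For illustration, grouping the SSL flags in the match could read like this simplified, self-contained version of the list-pattern parsing used above. The real code uses a mutable loop rather than recursion, and flags other than --port and --use-ssl are illustrative:

```scala
// Toy resolved-arguments holder; field names are illustrative.
case class ResolvedArgs(
    port: Option[Int] = None,
    useSsl: Boolean = false,
    keyStoreFile: Option[String] = None)

@annotation.tailrec
def parse(args: List[String], resolved: ResolvedArgs = ResolvedArgs()): ResolvedArgs = args match {
  case Nil => resolved
  case "--port" :: value :: tail =>
    parse(tail, resolved.copy(port = Some(value.toInt)))
  // SSL-related flags kept next to each other, per the review comment.
  case "--use-ssl" :: value :: tail =>
    parse(tail, resolved.copy(useSsl = value.toBoolean))
  case "--keystore-file" :: value :: tail =>
    parse(tail, resolved.copy(keyStoreFile = Some(value)))
  case unknown :: _ =>
    throw new IllegalArgumentException(s"Unrecognized argument: $unknown")
}

parse(List("--port", "7077", "--use-ssl", "true"))
```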
# This class will also require setting a secret via the SPARK_APP_SECRET environment variable
CMD exec bin/spark-class org.apache.spark.deploy.rest.kubernetes.KubernetesSparkRestServer --hostname $HOSTNAME --port $SPARK_DRIVER_LAUNCHER_SERVER_PORT --secret-file $SPARK_SUBMISSION_SECRET_LOCATION
CMD SSL_ARGS="" && \
This is tricky - I think using SPARK_DAEMON_JAVA_OPTS and passing system properties through -D is another option here. But I like that here it's clear from the Dockerfile how the driver rest server is to be configured.
One downside here is that creators of custom images will need to stay pretty up to date with the exact version of spark-submit k8s being used for submissions. I'd imagine that we add configuration options not infrequently.
I think it's safe to say though that docker images will be tightly tied to spark-submit versions anyway.
In general I prefer explicit over implicit, so seeing this written out seems better.
ash211
left a comment
Seems good -- @foxish thoughts on the SSL implementation here?
      barrier.await()
    }

    private def fileToUtf8String(filePath: String, fileType: String) = {
You can maybe replace this method with Guava's Files.toString
http://digitalsanctum.com/2012/11/30/how-to-read-file-contents-in-java-the-easy-way-with-guava/
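For reference, the suggested Guava call is a one-liner (assuming Guava is already on the classpath; the file path is a placeholder):

```scala
import java.io.File
import com.google.common.base.Charsets
import com.google.common.io.Files

// Guava equivalent of fileToUtf8String: read the whole file as a UTF-8 string.
val contents = Files.toString(new File("/path/to/password-file"), Charsets.UTF_8)
```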
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.{SecurityManager, SPARK_VERSION, SparkConf}
import org.apache.spark.{SecurityManager, SPARK_VERSION => sparkVersion, SparkConf, SparkException, SSLOptions}
why alias SPARK_VERSION to something different? The casing nicely implies that it's a constant
This keeps things consistent with other places that use this constant.
| "https" | ||
| } else { | ||
| logWarning("Submitting application details and local jars to the cluster" + | ||
| " over an insecure connection. Consider configuring SSL to secure" + |
I think we should mention the "secret" in the application details, and maybe say "Strongly consider" turning on SSL
| " this step.") | ||
| "http" | ||
| } | ||
| val (trustManager, sslContext): (X509TrustManager, SSLContext) = |
we could pull this whole expression out into a separate function if we wanted to - it would just take in driverLaunchSslOptions
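A hedged sketch of what that extracted helper could look like, pieced together from the fragments visible in this diff; the "TLS" protocol string, the stream handling, and the method name are assumptions rather than the PR's actual code:

```scala
import java.io.FileInputStream
import java.security.{KeyStore, SecureRandom}
import javax.net.ssl.{SSLContext, TrustManagerFactory, X509TrustManager}

import org.apache.spark.SSLOptions

// Only depends on driverLaunchSslOptions, as suggested above. Falls back to the
// default SSLContext (and no custom trust manager) when SSL is disabled or no
// trust store is configured.
private def buildSslContext(
    driverLaunchSslOptions: SSLOptions): (X509TrustManager, SSLContext) = {
  if (!driverLaunchSslOptions.enabled) {
    (null, SSLContext.getDefault)
  } else {
    driverLaunchSslOptions.trustStore.map { trustStoreFile =>
      // Load the configured trust store file.
      val trustStore = KeyStore.getInstance(
        driverLaunchSslOptions.trustStoreType.getOrElse(KeyStore.getDefaultType))
      val trustStoreStream = new FileInputStream(trustStoreFile)
      try {
        trustStore.load(
          trustStoreStream,
          driverLaunchSslOptions.trustStorePassword.map(_.toCharArray).orNull)
      } finally {
        trustStoreStream.close()
      }
      // Build an SSLContext backed by trust managers from that trust store.
      val trustManagerFactory =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm)
      trustManagerFactory.init(trustStore)
      val trustManagers = trustManagerFactory.getTrustManagers
      val sslContext = SSLContext.getInstance("TLS")
      sslContext.init(null, trustManagers, new SecureRandom())
      (trustManagers(0).asInstanceOf[X509TrustManager], sslContext)
    }.getOrElse((null, SSLContext.getDefault))
  }
}
```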
I have no major qualms -- we can merge this now (note that it's not into the main branch but rather into the in-progress branch).
Going to merge this immediately since it's getting hard to track the changes both here and on the in-progress branch.
…it jars (#30)

* Revamp ports and service setup for the driver.
  - Expose the driver-submission service on NodePort and contact that as opposed to going through the API server proxy
  - Restrict the ports that are exposed on the service to only the driver submission service when uploading content and then only the Spark UI after the job has started
* Move service creation down and more thorough error handling
* Fix missed merge conflict
* Add braces
* Fix bad merge
* Address comments and refactor run() more. Method nesting was getting confusing so pulled out the inner class and removed the extra method indirection from createDriverPod()
* Remove unused method
* Support SSL configuration for the driver application submission (#49)
* Support SSL when setting up the driver. The user can provide a keyStore to load onto the driver pod and the driver pod will use that keyStore to set up SSL on its server.
* Clean up SSL secrets after finishing submission. We don't need to persist these after the pod has them mounted and is running already.
* Fix compilation error
* Revert image change
* Address comments
* Programmatically generate certificates for integration tests.
* Address comments
* Resolve merge conflicts
* Fix bad merge
* Remove unnecessary braces
* Fix compiler error
…tions
### What changes were proposed in this pull request?
In order to avoid frequently changing the value of `spark.sql.adaptive.shuffle.maxNumPostShufflePartitions`, we usually set `spark.sql.adaptive.shuffle.maxNumPostShufflePartitions` much larger than `spark.sql.shuffle.partitions` after enabling adaptive execution, which causes some bucket map joins to lose efficacy and adds more `ShuffleExchange`s.
How to reproduce:
```scala
val bucketedTableName = "bucketed_table"
spark.range(10000).write.bucketBy(500, "id").sortBy("id").mode(org.apache.spark.sql.SaveMode.Overwrite).saveAsTable(bucketedTableName)
val bucketedTable = spark.table(bucketedTableName)
val df = spark.range(8)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
// Spark 2.4. spark.sql.adaptive.enabled=false
// We set spark.sql.shuffle.partitions <= 500 every time based on our data in this case.
spark.conf.set("spark.sql.shuffle.partitions", 500)
bucketedTable.join(df, "id").explain()
// Since 3.0. We enable adaptive execution and set spark.sql.adaptive.shuffle.maxNumPostShufflePartitions to a larger value to fit more cases.
spark.conf.set("spark.sql.adaptive.enabled", true)
spark.conf.set("spark.sql.adaptive.shuffle.maxNumPostShufflePartitions", 1000)
bucketedTable.join(df, "id").explain()
```
```
scala> bucketedTable.join(df, "id").explain()
== Physical Plan ==
*(4) Project [id#5L]
+- *(4) SortMergeJoin [id#5L], [id#7L], Inner
   :- *(1) Sort [id#5L ASC NULLS FIRST], false, 0
   :  +- *(1) Project [id#5L]
   :     +- *(1) Filter isnotnull(id#5L)
   :        +- *(1) ColumnarToRow
   :           +- FileScan parquet default.bucketed_table[id#5L] Batched: true, DataFilters: [isnotnull(id#5L)], Format: Parquet, Location: InMemoryFileIndex[file:/root/opensource/apache-spark/spark-3.0.0-SNAPSHOT-bin-3.2.0/spark-warehou..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>, SelectedBucketsCount: 500 out of 500
   +- *(3) Sort [id#7L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(id#7L, 500), true, [id=#49]
         +- *(2) Range (0, 8, step=1, splits=16)
```
vs
```
scala> bucketedTable.join(df, "id").explain()
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- Project [id#5L]
   +- SortMergeJoin [id#5L], [id#7L], Inner
      :- Sort [id#5L ASC NULLS FIRST], false, 0
      :  +- Exchange hashpartitioning(id#5L, 1000), true, [id=#93]
      :     +- Project [id#5L]
      :        +- Filter isnotnull(id#5L)
      :           +- FileScan parquet default.bucketed_table[id#5L] Batched: true, DataFilters: [isnotnull(id#5L)], Format: Parquet, Location: InMemoryFileIndex[file:/root/opensource/apache-spark/spark-3.0.0-SNAPSHOT-bin-3.2.0/spark-warehou..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>, SelectedBucketsCount: 500 out of 500
      +- Sort [id#7L ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(id#7L, 1000), true, [id=#92]
            +- Range (0, 8, step=1, splits=16)
```
This PR makes reading bucketed tables always obey `spark.sql.shuffle.partitions`, even when adaptive execution is enabled and `spark.sql.adaptive.shuffle.maxNumPostShufflePartitions` is set, to avoid adding more `ShuffleExchange`s.
### Why are the changes needed?
Do not degrade performance after enabling adaptive execution.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Unit test.
Closes apache#26409 from wangyum/SPARK-29655.
Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>