Allow the driver pod's credentials to be shipped from the submission client #186
Conversation
docs/running-on-kubernetes.md
Outdated
Rather than introduce more k8s terminology, why don't we call this spark.kubernetes.authentication.*?
Done
Please rebase onto
Force-pushed from d8abc21 to 8fca03e
ash211
left a comment
what do you think of passing the creds through to the driver pod in the POST rather than via volume mount? does it make a big difference one way or the other?
docs/running-on-kubernetes.md
Outdated
<td><code>spark.kubernetes.authentication.driver.caCertFile</code></td>
<td>(none)</td>
<td>
  CA cert file for connecting to Kubernetes over SSL from the driver pod when requesting executors. This file should
Kubernetes -> the Kubernetes API server
Start with "Path to the CA cert file" - and in other config options below
What URI scheme is expected, and what happens to paths with no scheme?
    sparkConf: SparkConf,
    kubernetesAppId: String) {

  def getDriverPodKubernetesCredentials(): DriverPodKubernetesCredentials = {
call this just get since it's pretty clear what you're getting from a DriverPodKubernetesCredentialsProvider and it's the only public method
  throw new SparkException(s"File provided for ${conf.key} at ${file.getAbsolutePath}" +
    s" does not exist or is not a file.")
}
(secretName, BaseEncoding.base64().encode(Files.toByteArray(file)))
does Files.toByteArray close the File object?
The File object is not opened or closed, but rather the stream that pulls the data has to be opened and closed, which Guava takes care of.
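The read-whole-file-then-encode pattern discussed above can be sketched with only JDK classes standing in for Guava's `Files.toByteArray` and `BaseEncoding` (the underlying stream is opened and closed internally by `readAllBytes`, so the caller never manages a handle; the file name below is illustrative):

```scala
import java.nio.file.{Files, Paths}
import java.util.Base64

object EncodeSecret {
  // Read the whole file into memory and base64-encode it; the stream used to
  // pull the bytes is opened and closed inside readAllBytes.
  def base64Contents(path: String): String = {
    val bytes = Files.readAllBytes(Paths.get(path))
    Base64.getEncoder.encodeToString(bytes)
  }

  def main(args: Array[String]): Unit = {
    val tmp = Files.createTempFile("secret", ".txt")
    Files.write(tmp, "token-value".getBytes("UTF-8"))
    println(base64Contents(tmp.toString)) // prints dG9rZW4tdmFsdWU=
  }
}
```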
  .withMountPath(DRIVER_CONTAINER_KUBERNETES_CREDENTIALS_SECRETS_BASE_DIR)
  .build()
// Cannot use both service account and mounted secrets
sparkConf.get(KUBERNETES_SERVICE_ACCOUNT_NAME).foreach { _ =>
move this validation up to right after these are created (validate early)
val credentialsSecretVolumeMount = new VolumeMountBuilder()
  .withName(credentialsSecretVolume.getName)
  .withReadOnly(true)
  .withMountPath(DRIVER_CONTAINER_KUBERNETES_CREDENTIALS_SECRETS_BASE_DIR)
I'm not totally sure I follow the mounting here -- why do we need to create a volume mount to store data to a file? Is the docker image read-only or something? Or we want to ensure the secrets are disposed of in a secure manner?
The docker image only provides a static state that the container starts with, and the volume mounts are attached to that container to add dynamic data from Volume plugins.
so this is a mechanism to inject files into the running container?
Not into an already-running container exactly; more precisely, the volume is attached when the container starts up.
We use this everywhere we've been providing secrets; the other way secrets can be consumed is via environment variables, but in all of our cases we've been loading them from mounted files.
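As background for the thread above: a secret reaches the container's filesystem via a volume plus a volume mount in the pod spec. A minimal sketch in raw Kubernetes YAML (all names here are illustrative, not the ones this PR uses):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver-example        # illustrative name
spec:
  containers:
    - name: driver
      image: spark-driver:latest    # illustrative image
      volumeMounts:
        - name: driver-credentials
          readOnly: true
          mountPath: /mnt/secrets/spark-kubernetes-credentials
  volumes:
    - name: driver-credentials
      secret:
        secretName: driver-credentials-secret   # Secret created by the submission client
```

The container's static image is unchanged; the secret's keys appear as files under the mount path when the container starts.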
  .withApiVersion("v1")
  .withMasterUrl(KUBERNETES_MASTER_INTERNAL_URL)
  .withNamespace(kubernetesNamespace)
should we do the just-one-auth validation here too?
require((MOUNTED_TOKEN.isFile || MOUNTED_CA_CERT.isFile || MOUNTED_CLIENT_KEY.isFile || MOUNTED_CLIENT_CERT.isFile) ^
(SERVICE_ACCOUNT_CA_CERT.isFile || SERVICE_ACCOUNT_TOKEN.isFile))
Think it's fine to leave it out here and rely on the client to check.
The service account token file is actually always present, so this check wouldn't be correct.
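The exclusive-or validation suggested above can be illustrated in isolation (this is not project code; the flags and message are hypothetical stand-ins for the two credential sources):

```scala
object AuthValidation {
  // Exactly one of the two credential sources may be provided; `^` is
  // boolean exclusive-or, so require fails when both or neither is set.
  def requireExactlyOne(mountedProvided: Boolean, serviceAccountProvided: Boolean): Unit =
    require(mountedProvided ^ serviceAccountProvided,
      "Exactly one of mounted credentials or the service account may be used.")

  def main(args: Array[String]): Unit = {
    requireExactlyOne(mountedProvided = true, serviceAccountProvided = false) // passes
    // requireExactlyOne(true, true) would throw IllegalArgumentException
    println("validation passed")
  }
}
```

As the reply notes, this check is unreliable inside the pod because the service account token file is always present there.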
private val sslSecretsName = s"$SUBMISSION_SSL_SECRETS_PREFIX-$kubernetesAppId"
private val sslSecretsDirectory = s"$DRIVER_CONTAINER_SECRETS_BASE_DIR/$kubernetesAppId-ssl"
private val sslSecretsDirectory = DRIVER_CONTAINER_SUBMISSION_SECRETS_BASE_DIR
  s"/$kubernetesAppId-ssl"
does Scala magically concatenate these two adjacent strings?
Nope, this is a typo
I think this is failing the integration tests
but but but how does it compile? does the hanging string just not get assigned to anything and disappear?
Believe that's the case yeah
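That is indeed the case: a standalone string literal is a valid expression statement, so it compiles (typically with a "pure expression does nothing" warning) and its value is simply discarded. A minimal illustration of the typo's effect:

```scala
object DanglingString {
  def main(args: Array[String]): Unit = {
    val dir = "/base/dir"
    "/app-ssl"    // dangling expression: compiles, but the value is discarded
    println(dir)  // prints "/base/dir", not "/base/dir/app-ssl"
  }
}
```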
  .pods
  .withLabel("spark-app-name", "spark-file-existence-test")
  .watch(watch)) { _ =>
  SparkSubmit.main(args)
why change from SparkSubmit.main(args) to new Client(...) ?
SparkSubmit.main is unwieldy because
(1) it's difficult to re-use arguments, since they have to be formatted as an array rather than as a Map, and more importantly
(2) it deals with system properties, which are global state across all tests in the JVM.
It's beneficial to test using SparkSubmit if we expect SparkSubmit to process its arguments in a way that our Client doesn't, but otherwise we should mostly call the Client directly.
@kimoonkim heads up another flake in

Hm, I wanted to avoid making the submission server more complex by adding more data to the POST, and in the future, if we switch to a different submission mechanism, we'd have to rewrite more logic there. But I suppose creating fewer Kubernetes resources is beneficial as well, particularly if these objects are short-lived.
rerun integration test please

rerun unit tests please

rerun integration test please (previous failed: http://spark-k8s-jenkins.pepperdata.org:8080/job/PR-spark-k8s-integration-test-/20/consoleFull#-167881559204b09c7b-0d94-4ce5-8a08-8f343248b3d8 )

rerun integration test please

Hm, should we also allow pre-mounted files, like we currently do for the keyStore? e.g.

It seems reasonable to bake a truststore for the apiserver into docker images, but I'm guessing most places would avoid putting creds into images for security reasons. I'd propose we don't support
Also some quality-of-life fixes, most notably reformatting all of the documentation strings in config.scala to no longer use triple quotes, since triple-quoted strings are difficult to format consistently.
    | this can also be passed to spark-submit via the
    | --kubernetes-namespace command line argument.
  """.stripMargin)
.doc("The namespace that will be used for running the driver and executor pods. When using" +
I reformatted all of these because it's difficult to get triple-quoting to have a consistent format.
This matches what YARN has.
rerun unit tests please
ash211
left a comment
so the process is (using oauth token but same for the client cert pair):
- user sets oauth token in config key spark.kubernetes.authentication.driver.oauthToken
- submission client reads that out of SparkConf and creates a KubernetesCredentials out of it, then redacts the token out of SparkConf
- posts the KubernetesCredentials to the submission server as part of job launch
- submission server then writes the token to disk in writeKubernetesCredentials and saves its path to SparkConf in the spark.kubernetes.authentication.driver.mounted.oauthTokenFile key
- driver then reads that path from its SparkConf and assembles a kubernetes client with it, in buildFromWithinPod
This seems pretty reasonable, and POST'ing those things directly to the driver pod vs the temporary volumes follows the pattern of other parts being posted.
No more major comments from me -- @foxish have you had a chance to give this a review?
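The redact-then-remount flow summarized above can be sketched with a plain Map standing in for SparkConf (the key names follow the conversation; the file path, helper names, and everything else here are illustrative, not the PR's implementation):

```scala
object CredentialFlowSketch {
  val TokenKey = "spark.kubernetes.authentication.driver.oauthToken"
  val MountedTokenFileKey = "spark.kubernetes.authentication.driver.mounted.oauthTokenFile"

  def main(args: Array[String]): Unit = {
    // 1. User sets the token in the config.
    var conf = Map(TokenKey -> "secret-token")
    // 2. Submission client reads it out and redacts it from the conf.
    val token = conf(TokenKey)
    require(token.nonEmpty)
    conf = conf - TokenKey
    // 3. Submission server writes the token to disk and records the path.
    val mountedPath = "/var/run/secrets/spark/oauth-token" // illustrative path
    conf = conf + (MountedTokenFileKey -> mountedPath)
    // 4. Driver reads the path back to build its Kubernetes client.
    println(conf(MountedTokenFileKey))
  }
}
```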
docs/running-on-kubernetes.md
Outdated
<td>(none)</td>
<td>
  Path to the client key file for authenticating against the Kubernetes API server from the driver pod when requesting
  executors. This file should be located on the submitting machine's disk, and will be uploaded to the driver pod.
should -> must ?
} finally {
  Utils.tryLogNonFatalError {
    kubernetesResourceCleaner.deleteAllRegisteredResourcesFromKubernetes(kubernetesClient)
    // kubernetesResourceCleaner.deleteAllRegisteredResourcesFromKubernetes(kubernetesClient)
was this line accidentally included?
Ah, yep
val configBuilder = oauthTokenFile
  .orElse(caCertFile)
  .orElse(clientKeyFile)
  .orElse(clientCertFile)
this seems like an unusual pattern for "if any of these are present". Does Scala have an any(Seq[Option[_]]) method like Python's any?
Don't see such an API, unfortunately.
Looks like the Scala way (from 2011 via http://stackoverflow.com/questions/6390797/scala-any-and-all-functions and 2013 via http://stackoverflow.com/questions/15932137/checking-that-all-items-in-a-collection-match-a-predicate-in-scala) is:
Seq(oauthTokenFile, caCertFile, clientKeyFile, clientCertFile).exists(_.isDefined)
Do you find that clearer than the .orElse chain?
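A runnable sketch of that idiom (Scala's standard library has no `any`; the equivalents are `exists` and `forall`; the Option values below are illustrative):

```scala
object AnyDefined {
  def main(args: Array[String]): Unit = {
    val oauthTokenFile: Option[String] = None
    val caCertFile: Option[String]     = Some("/certs/ca.pem") // illustrative
    val clientKeyFile: Option[String]  = None
    val clientCertFile: Option[String] = None

    // "Any of these Options is set" without chaining orElse:
    val anyProvided =
      Seq(oauthTokenFile, caCertFile, clientKeyFile, clientCertFile).exists(_.isDefined)
    println(anyProvided) // true, because caCertFile is set
  }
}
```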
if (MOUNTED_CA_CERT.isFile) {
  clientConfigBuilder = clientConfigBuilder.withCaCertFile(MOUNTED_CA_CERT.getAbsolutePath)
caCertFile.foreach { caFile =>
  mountedAuthConfigBuilder = mountedAuthConfigBuilder.withCaCertFile(caFile)
this has to be reassigned every time? would be nice if the fluent API allowed repeated calls on the same reference in these kinds of situations
Reading the Kubernetes client code indicates that we don't have to assign here - but I actually think it's clearer to follow this pattern currently, as it semantically means "This builder is equal to the old builder with these updates". The phrasing of the API (builder.withCaCertFile) made it sound like it doesn't mutate the underlying object - I would have expected setCaCertFile if it did.
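A toy illustration (not the Kubernetes client itself) of the style in question: an immutable builder whose `with...` methods return a copy, which is what makes the reassignment pattern both necessary and explicit about "this builder equals the old builder with these updates":

```scala
// Hypothetical immutable builder; with... methods return a modified copy.
final case class ConfigSketchBuilder(caCertFile: Option[String] = None,
                                     oauthTokenFile: Option[String] = None) {
  def withCaCertFile(path: String): ConfigSketchBuilder = copy(caCertFile = Some(path))
  def withOauthTokenFile(path: String): ConfigSketchBuilder = copy(oauthTokenFile = Some(path))
}

object BuilderStyle {
  def main(args: Array[String]): Unit = {
    var builder = ConfigSketchBuilder()
    // Reassign on each optional update, mirroring the pattern in the diff:
    Some("/certs/ca.pem").foreach { p => builder = builder.withCaCertFile(p) }
    println(builder.caCertFile.get) // prints /certs/ca.pem
  }
}
```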
docs/running-on-kubernetes.md
Outdated
</tr>
<tr>
<td><code>spark.kubernetes.submit.caCertFile</code></td>
<td><code>spark.kubernetes.authentication.submission.caCertFile</code></td>
this key is kinda long -- thoughts on shortening authentication to just auth ?
They use spark.authenticate in other places - but not sure if we should be following that precedent: https://spark.apache.org/docs/latest/configuration.html
ash211
left a comment
LGTM!
@foxish any last comments? I'm planning to merge this EOD if you don't have anything.
I didn't get time to look into it in detail. At a high level, it seems fine. lgtm

Sounds good -- going ahead and merging
Oh uh oh, I merged instead of squashed and merged... which gave us a dirty history going to:

Done
Provide a separate set of configurations for loading Kubernetes API server credentials for the driver pod to use when requesting executors.
Credentials specified in the submitter are POST'd to the submission server and used by the driver for interactions with the apiserver. This is a different set of credentials from those used by the submitter when starting the driver pod, allowing for fine-grained access controls between these two k8s clients if desired.
Closes #182