Generate the application ID label irrespective of app name. #331
Conversation
ash211
left a comment
We should add an integration test verifying that the restrictions on Spark app names have loosened. After this, do we expect application names of exactly 63 characters, and containing non-DNS-label characters like _, to succeed?
.doc("Prefix to use in front of the executor pod names.")
.internal()
.stringConf
.createWithDefault("spark")
Will this default ever be used? It seems the value will be set before it's ever accessed.
It's tricky: the value is always set on the submission client, but to read it directly we need to use sparkConf.get(...). If the parameter were an optional config entry, we'd have to call .get on the Option and require it to be present. I think it's ok if the contract is that the value is always provided but there's still a default here. At worst the executor names are incorrect, but since all of our logic is keyed off the ID label, it should not matter for correctness.
compromise with ...getOrElse("spark") ?
That would have more or less the same effect as this, so we might as well encode the default here; code in the config class is preferred over complexity in the scheduler backend class.
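To make the trade-off concrete, here is a minimal sketch of the two alternatives discussed above, using Spark's internal ConfigBuilder API. The entry key and doc string are from the diff above; the package, object, and method names are hypothetical:

```scala
package org.apache.spark.sketch // under org.apache.spark so private[spark] APIs resolve

import org.apache.spark.SparkConf
import org.apache.spark.internal.config.ConfigBuilder

object PodNamePrefixSketch {
  // Approach taken in this PR: bake the fallback into the entry itself,
  // so readers get the default through a plain sparkConf.get(...).
  val EXECUTOR_POD_NAME_PREFIX =
    ConfigBuilder("spark.kubernetes.executor.podNamePrefix")
      .doc("Prefix to use in front of the executor pod names.")
      .internal()
      .stringConf
      .createWithDefault("spark")

  // Alternative floated above: an optional entry (hypothetical key), which
  // pushes the fallback to every call site via getOrElse.
  val EXECUTOR_POD_NAME_PREFIX_OPT =
    ConfigBuilder("spark.kubernetes.executor.podNamePrefix.opt")
      .internal()
      .stringConf
      .createOptional

  def resolvePrefix(conf: SparkConf): String =
    conf.get(EXECUTOR_POD_NAME_PREFIX) // default applied if unset

  def resolvePrefixAlt(conf: SparkConf): String =
    conf.get(EXECUTOR_POD_NAME_PREFIX_OPT).getOrElse("spark")
}
```

With the default encoded in the entry, every reader gets the same fallback for free, which is the point made above about keeping the config class simple.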
.withName(kubernetesDriverPodName)
.addToLabels(allLabels.asJava)
.addToAnnotations(parsedCustomAnnotations.asJava)
.addToAnnotations(SPARK_APP_NAME_ANNOTATION, appName)
annotations don't have the same restrictions that labels do?
Doesn't seem that way, since we put full init-container specs in the annotations.
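For context (not from this PR): Kubernetes caps label values at 63 characters with a restricted alphabet, while annotation values are essentially free-form text (only the total size of the annotation map is limited), which is why full init-container specs can live there. A quick sketch of the label-value rule, using the pattern from the Kubernetes validation code:

```scala
// Kubernetes label value: empty, or <= 63 chars, alphanumeric at both ends,
// with '-', '_', '.' and alphanumerics allowed in between.
def isValidLabelValue(v: String): Boolean =
  v.length <= 63 && v.matches("(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")

// A typical human-readable Spark app name fails the check:
assert(!isValidLabelValue("My Spark App (test run)"))
// ...while a generated ID passes:
assert(isValidLabelValue("spark-0f1e2d3c4b5a"))
```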
…8s/spark into use-generated-id-label
@ash211 I added an integration test.
I just tried a test and underscores aren't allowed in pod names either, unfortunately. The Kubernetes documentation has the exact requirements (lowercase alphanumeric characters, dashes, and dots): https://kubernetes.io/docs/concepts/overview/working-with-objects/names/.
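For reference, the rule the linked page describes is the DNS-1123 subdomain check. A quick sketch, with the regex and length limit taken from the Kubernetes validation rules:

```scala
// DNS-1123 subdomain: lowercase alphanumerics, '-' and '.', where each
// dot-separated segment starts and ends alphanumeric; 253 chars max overall.
def isValidDns1123Subdomain(name: String): Boolean =
  name.length <= 253 &&
    name.matches("[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")

assert(!isValidDns1123Subdomain("my_app"))   // underscores rejected
assert(isValidDns1123Subdomain("my-app.v2")) // dashes and dots are fine
```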
if we rebase this on |
ash211
left a comment
LGTM -- this looks to significantly reduce the exposure where a user-provided Spark app name is reused for something in Kubernetes and is then rejected.
Planning to merge when builds are green
rerun integration test please
private[spark] val SPARK_ROLE_LABEL = "spark-role"
private[spark] val SPARK_POD_DRIVER_ROLE = "driver"
private[spark] val SPARK_POD_EXECUTOR_ROLE = "executor"
private[spark] val SPARK_APP_NAME_ANNOTATION = "spark-app-name"
Should "reserved" identifiers like "spark-app-name" in annotation keys be documented?
They could be but the error message might be self-documenting enough, and I'm not sure if it's worth the space. Do you anticipate this being an annotation that people will want to set?
It seems more like a low-probability corner case. If it's a lot of documentation, it might not be worth it; if it's low effort, I'd consider it worth doing.
Mostly I'd be worried if there's potential for silent failures.
We send the notification here: https://github.com/apache-spark-on-k8s/spark/pull/331/files#diff-8c861c0709460ebb9571f0d44791b6beR92
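The linked check presumably rejects a user-supplied custom annotation that collides with the reserved key rather than silently overwriting it. A hypothetical sketch of that kind of guard (names and message are assumptions, not copied from the diff):

```scala
// Hypothetical guard: fail fast when a user-provided custom annotation
// collides with the reserved spark-app-name key, instead of silently
// clobbering it later in the pod builder.
def validateCustomAnnotations(
    customAnnotations: Map[String, String],
    reservedKey: String = "spark-app-name"): Unit = {
  require(!customAnnotations.contains(reservedKey),
    s"Annotation with key $reservedKey is reserved for Spark's internal use" +
      " and may not be set by the user.")
}
```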
* Generate the application ID label irrespective of app name.
* Add an integration test.
* Fix scalastyle
Closes #330.
Use the application name as a prefix for Kubernetes resource names, but the restrictions Kubernetes enforces on labels make it untenable to put the application name in any label. Therefore, use a generated UUID as the "application ID", which we use both for registering with the external shuffle service and for locating the application's executor pods.
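A minimal sketch of the scheme the description outlines: generate a label-safe ID once at submission, attach it to the driver and executor pods as a label, and keep the human-readable app name in an annotation only. The helper name is hypothetical; the ID shape matches what the PR describes:

```scala
import java.util.UUID

// Hypothetical helper: the generated ID is lowercase hex plus a fixed
// prefix, so it is always a valid Kubernetes label value regardless of
// what the user picked for spark.app.name.
def generateKubernetesAppId(): String =
  s"spark-${UUID.randomUUID().toString.replaceAll("-", "")}"

// e.g. "spark-3f1c9a..." -- 38 characters, well under the 63-char label limit.
```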