
Conversation

@maomaodev commented Jan 13, 2025

What changes were proposed in this pull request?

In this PR, for Spark on K8s, the krb5.conf ConfigMap is mounted on the executor side as well.
Previously, the krb5.conf ConfigMap was only mounted on the driver side. But according to the parameter descriptions below, the krb5.conf file should be mounted on both the driver and the executors.

  val KUBERNETES_KERBEROS_KRB5_FILE =
    ConfigBuilder("spark.kubernetes.kerberos.krb5.path")
      .doc("Specify the local location of the krb5.conf file to be mounted on the driver " +
        "and executors for Kerberos. Note: The KDC defined needs to be " +
        "visible from inside the containers ")
      .version("3.0.0")
      .stringConf
      .createOptional

  val KUBERNETES_KERBEROS_KRB5_CONFIG_MAP =
    ConfigBuilder("spark.kubernetes.kerberos.krb5.configMapName")
      .doc("Specify the name of the ConfigMap, containing the krb5.conf file, to be mounted " +
        "on the driver and executors for Kerberos. Note: The KDC defined" +
        "needs to be visible from inside the containers ")
      .version("3.0.0")
      .stringConf
      .createOptional
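
For context, here is a minimal sketch of how a user would opt in to these options; the property names come from the definitions above, while the master URL and ConfigMap name are illustrative assumptions:

  // Sketch only: wiring the quoted options into a SparkConf for a K8s job.
  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setMaster("k8s://https://kubernetes.example.com:6443") // assumed API server URL
    .set("spark.kubernetes.kerberos.krb5.path", "/etc/krb5.conf")
  // Alternatively, reference a pre-created ConfigMap holding krb5.conf:
  //   .set("spark.kubernetes.kerberos.krb5.configMapName", "krb5-conf") // assumed name

With this change, whichever option is used, the resulting ConfigMap is mounted on the executor pods as well as the driver pod.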

Why are the changes needed?

After SPARK-43504, the Hadoop ConfigMap is mounted on the executor pod.
Now the executor pod fails to start, because the Hadoop conf files contain Kerberos authentication configuration but the executor does not mount krb5.conf correctly.
See the discussion in #41181.
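
To make the failure mode concrete, here is a minimal, hedged reproduction sketch of what happens inside the executor JVM when the mounted Hadoop conf enables Kerberos but no krb5.conf is readable in the container:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.security.UserGroupInformation

  val hadoopConf = new Configuration()
  hadoopConf.set("hadoop.security.authentication", "kerberos")

  // With no krb5.conf available to resolve the default realm, this throws:
  //   java.lang.IllegalArgumentException: Can't get Kerberos realm
  UserGroupInformation.setConfiguration(hadoopConf)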

Does this PR introduce any user-facing change?

Yes, users no longer need to take workarounds to make executors load krb5.conf, such as:

  • including krb5.conf in the executor image
  • placing krb5.conf in the executor working directory using --files (see the sketch below)
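
For reference, the second workaround expressed through SparkConf; spark.files is the configuration equivalent of spark-submit --files, and the path is an assumption:

  import org.apache.spark.SparkConf

  // Ships the local krb5.conf to each executor's working directory.
  val conf = new SparkConf().set("spark.files", "/etc/krb5.conf")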

How was this patch tested?

  • Unit tests.
  • Manual test:
    1. After compiling the code successfully, I rebuilt the Spark image (the image does not contain the /etc/krb5.conf file) and used the new client to submit.
    2. Prior to this PR, the executor pod failed to start, and the logs showed the error java.lang.IllegalArgumentException: Can't get Kerberos realm, because krb5.conf was not mounted to the executor pod correctly.
    3. After this PR, the executor pod starts successfully, the executor pod logs show Kerberos authentication, and the same krb5.conf ConfigMap is mounted as on the driver pod.

Was this patch authored or co-authored using generative AI tooling?

No.

@pan3793 (Member) left a comment

Thanks for fixing this, the change LGTM.
cc @yaooqinn @wangyum @turboFei

@turboFei (Member) left a comment

LGTM, thanks

@maomaodev (Author) commented

gentle ping @yaooqinn @wangyum

@maomaodev (Author) commented

Today I revisited this PR and, after testing, found that in client mode neither the Hadoop conf nor krb5.conf is mounted to the executor pod correctly. Is this the expected behavior?
Of course, the executor pod can still start normally; the executor pod log shows UserGroupInformation: Hadoop UGI authentication : SIMPLE. @pan3793 @turboFei

@turboFei (Member) commented

> Today I revisited this PR and, after testing, found that in client mode neither the Hadoop conf nor krb5.conf is mounted to the executor pod correctly. Is this the expected behavior? Of course, the executor pod can still start normally; the executor pod log shows UserGroupInformation: Hadoop UGI authentication : SIMPLE. @pan3793 @turboFei

Hi @maomaodev, in my use case the Spark jobs are submitted via the Kyuubi gateway in cluster mode, so I was not aware of this kind of issue.

@turboFei (Member) commented

> Is this the expected behavior?

Can it pass Kerberos authentication in client mode?

@maomaodev (Author) commented Jan 15, 2025

> Can it pass Kerberos authentication in client mode?

@turboFei Yes. In client mode, since the Hadoop conf is not mounted correctly, it is also acceptable not to mount krb5.conf, although this may not be the expected behavior.
In my opinion, if we want to be compatible with client mode, the executor pod should not share the same ConfigMap with the driver pod. Perhaps we can refer to the implementation in org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend#setUpExecutorConfigMap, as sketched below.
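
For illustration only, a hedged sketch (not this PR's code) of the per-executor ConfigMap idea, using the fabric8 client that Spark's K8s backend builds on; the ConfigMap name, namespace, and file path are assumptions:

  import java.nio.file.{Files, Paths}
  import io.fabric8.kubernetes.api.model.ConfigMapBuilder
  import io.fabric8.kubernetes.client.KubernetesClientBuilder

  val krb5ConfPath = "/etc/krb5.conf"        // assumed local path on the client
  val configMapName = "spark-exec-krb5-conf" // assumed name, not shared with the driver
  val namespace = "default"                  // assumed namespace

  val client = new KubernetesClientBuilder().build()
  val krb5Contents = new String(Files.readAllBytes(Paths.get(krb5ConfPath)))

  val krb5ConfigMap = new ConfigMapBuilder()
    .withNewMetadata().withName(configMapName).endMetadata()
    .addToData("krb5.conf", krb5Contents)
    .build()

  // Publish the ConfigMap so executor pods can mount it independently
  // of any driver-side ConfigMap.
  client.configMaps().inNamespace(namespace).resource(krb5ConfigMap).createOrReplace()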

@maomaodev maomaodev closed this Jan 15, 2025
@maomaodev maomaodev reopened this Jan 15, 2025
@github-actions bot commented
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 26, 2025
@github-actions github-actions bot closed this Apr 27, 2025