-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-43171][K8S] Support custom Unix username in Pod #40831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
|
cc @Yikun |
Yikun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pan3793 Thanks! We had some discussion discussed this in apache/spark-docker#11 .
A quick question: are you plan to also support it in apache/spark-docker (I guess dynamic switch user might a little bit difficult based on Docker Official Image rule), or to say it's a k8s only feature?
|
@Yikun I suppose it's a K8s-only feature. As mentioned in the PR description, the main purpose is to reduce the gap between Spark on YARN and K8s, to allow users seamlessly migrate Spark jobs from YARN to K8s. I don't have much knowledge about docker/container technology, and I agree w/ you it looks not easy to dynamically switch user based on the Docker Official Image rule |
|
Also cc @yaooqinn |
Yikun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pan3793 Thanks. LGTM, also cc @dongjoon-hyun @holdenk
|
Just for others reviewer infomation, I also wanna share some considerations about this (also include some idea in offline discussion with @pan3793 ):
So, I am +0.5 on this PR. : ) |
|
@pan3793 are we good to merge? |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think you can add an integration test case in order to prevent a future regression, @pan3793 ?
| val decomTestTag = Tag("decom") | ||
| val rTestTag = Tag("r") | ||
| val MinikubeTag = Tag("minikube") | ||
| val usernameTestTag = Tag("username") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Yikun A new tag is added, you can use it to exclude this test for apache/spark-docker GA
Sure, IT is added. |
|
@Yikun @dongjoon-hyun would you please take a look again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, I'm unsure about this now, the mainly concern is that it depends on the /etc/passwd, according latest DOI review, it might bring some wider permisson issue (docker-library/official-images#13089 (comment) see question 4/5). Use the usermod -l in entrypoint might the solution I can think out.
I also try to find help from the DOI to seek better solution, so maybe wait the reply maybe for some day?
|
So I would say this looks ok to me, but I hear the concerns around modifying /etc/passwd so I agree on waiting to to see what comes out of the DOI discussions. |
|
Here is the solution to resolve /etc/passwd permission issue according to DOI suggestion: apache/spark-docker#45. (use libnss to fake user) I also had a offline discussion with @pan3793, it's also work for this case (if specified the SPARK_USER, then use the libnss to switch fake user). |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This PR allows the users to custom Unix username in Pod by setting env var
SPARK_USER_NAME, which reduces the gap between Spark on YARN and K8s.Each line in
/etc/passwdis compose ofThis PR simply changes the first item from
$myuidto${SPARK_USER_NAME:-$myuid}to achieve the above ability.Why are the changes needed?
In Spark on YARN, when we launch a Spark application via
spark-submit --proxy-user jack ..., the YARN will launch containers(usually Linux processes) using Unix user "jack", and some components/libraries rely on the login user in default, one example is Alluxiohttps://github.com/Alluxio/alluxio/blob/da77d688bdbb0cf0c6477bed4d3187897fe2a2e1/core/common/src/main/java/alluxio/conf/PropertyKey.java#L6469-L6476
To reduce the difference between Spark on YARN and Spark on K8s, we hope Spark on K8s keeps the same ability to allow to dynamically change login user on submitting Spark application.
Does this PR introduce any user-facing change?
Yes, it allows the user to custom Pod Unix username by setting env var
SPARK_USER_NAMEin K8s, reducing the gap between Spark on YARN and K8s.How was this patch tested?
New IT is added.
Also manually testing in our internal K8s cluster.
Then login the Pod, verify the Unix username by
id -unistominstead of185