
Conversation


@Yikun Yikun commented Oct 15, 2022

What changes were proposed in this pull request?

This patch:

  • Add a spark uid/gid in the Dockerfile (useradd and groupadd), used by the entrypoint. This approach is also used by other DOI (Docker Official Images) such as zookeeper, solr, and flink.
  • Switch to the spark user in entrypoint.sh rather than in the Dockerfile, making sure the Spark process is executed as a non-root user.
  • Remove the USER setting from the Dockerfile, making sure the base image can be extended (e.g. a derived Dockerfile can execute apt update).
  • Chown scripts to spark:spark instead of root:root, to avoid permission issues such as in standalone mode.
  • Add the gosu dependency, a sudo replacement recommended by Docker and the Docker Official Images project, and also used by other DOI images.

This change also follows the rules for Docker Official Images; see also the consistency and Dockerfile best-practices guidance about USER.
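Put together, the items above amount to a pattern like the following. This is a minimal, hypothetical sketch rather than the actual Dockerfile from this PR: the base image, package commands, and the uid/gid value 185 are assumptions for illustration (185 matches the numeric user the image previously ran as).

```dockerfile
FROM eclipse-temurin:11-jre-focal

# Fixed uid/gid so file ownership and the passwd entry are stable across rebuilds
ARG spark_uid=185
RUN groupadd --system --gid=${spark_uid} spark && \
    useradd --system --uid=${spark_uid} --gid=spark spark

# gosu: a lightweight sudo replacement; the root entrypoint uses it to
# step down to the spark user before exec'ing the real command
RUN apt-get update && \
    apt-get install -y gosu && \
    rm -rf /var/lib/apt/lists/*

# Give spark ownership of the entrypoint (and, in the real image, of the
# Spark install tree) so sbin scripts can create /opt/spark/logs
COPY --chown=spark:spark entrypoint.sh /opt/entrypoint.sh

# Deliberately no USER instruction: derived images stay root at build time
# (so `RUN apt update` works), and entrypoint.sh drops privileges at run time
ENTRYPOINT ["/opt/entrypoint.sh"]
```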

Why are the changes needed?

Below are the issues I have found so far:

  1. Irregular login username
    The image's login username is non-standard; running docker run and landing in a shell as user 185 is a little odd:
$ docker run -ti apache/spark bash
185@d88a24357413:/opt/spark/work-dir$
  2. Permission issues with Spark sbin scripts
    There are also permission issues when running some Spark scripts, for example in standalone mode:
$ docker run -ti apache/spark /opt/spark/sbin/start-master.sh

mkdir: cannot create directory ‘/opt/spark/logs’: Permission denied
chown: cannot access '/opt/spark/logs': No such file or directory
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
/opt/spark/sbin/spark-daemon.sh: line 135: /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out: No such file or directory
failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --host 1c345a00e312 --port 7077 --webui-port 8080
tail: cannot open '/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out' for reading: No such file or directory
full log in /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
  3. Using Spark as a base image is not well supported
    This is due to the static USER set in the Dockerfile:
$ cat Dockerfile
FROM apache/spark
RUN apt update

$  docker build -t spark-test:1015 .
// ... 
------
 > [2/2] RUN apt update:
#5 0.405 E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
#5 0.405 E: Unable to lock directory /var/lib/apt/lists/
------
executor failed running [/bin/sh -c apt update]: exit code: 100

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

  • CI passed: all K8s tests

  • Regression test:

# Username is set to spark rather than 185
docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
spark@27bbfca0a581:/opt/spark/work-dir$
# start-master.sh no permission issue
$ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash

spark@8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
# Image as parent case 
$ cat Dockerfile
FROM spark:scala2.12-java11-python3-r-ubuntu
RUN apt update
$ docker build -t spark-test:1015 .
[+] Building 7.8s (6/6) FINISHED
 => [1/2] FROM docker.io/library/spark:scala2.12-java11-python3-r-ubuntu                                                                                                              0.0s
 => [2/2] RUN apt update                                                                                                                                                              7.7s
  • Other test:
# Test on pyspark
$ cd spark-docker/3.3.0/scala2.12-java11-python3-r-ubuntu
$ docker build -t spark:scala2.12-java11-python3-r-ubuntu .
$ docker run -p 4040:4040 -ti spark:scala2.12-java11-python3-r-ubuntu /opt/spark/bin/pyspark
# A simple test for `start-master.sh` (standalone mode)
$ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
spark@8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out


Yikun commented Oct 15, 2022

cc @HyukjinKwon @zhengruifeng @dongjoon-hyun @holdenk @attilapiros

Also cc @dcoliversun maybe you could start to help SPARK-40570 SPARK-40569 based on this patch.

Also cc @pan3793 @LuciferYang we had offline discuss about the username


pan3793 commented Oct 15, 2022

I'm a little worried about this change. Previously, we could achieve a dynamic login user with the following patch, but this PR makes the login user statically "spark".


 # If there is no passwd entry for the container UID, attempt to create one 
 if [ -z "$uidentry" ] ; then 
     if [ -w /etc/passwd ] ; then 
-        echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd 
+        echo "${SPARK_USER_NAME:-$myuid}:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd 
     else 
         echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID" 
     fi 
 fi 
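The patched passwd-entry logic above can be written as a runnable sketch. The uid/gid parameters and the injectable PASSWD_FILE path are additions for illustration and testing outside a container; the real entrypoint reads `id -u`/`id -g` and appends to /etc/passwd directly.

```shell
#!/bin/sh
# Allow overriding the passwd file and home dir so this can run anywhere;
# defaults match the container environment.
PASSWD_FILE="${PASSWD_FILE:-/etc/passwd}"
SPARK_HOME="${SPARK_HOME:-/opt/spark}"

add_passwd_entry_if_missing() {
    myuid="$1"
    mygid="$2"
    # Only add an entry when the uid has no passwd entry yet
    if ! getent passwd "$myuid" >/dev/null 2>&1; then
        if [ -w "$PASSWD_FILE" ]; then
            # SPARK_USER_NAME, when set, becomes the login name; else the bare uid
            echo "${SPARK_USER_NAME:-$myuid}:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> "$PASSWD_FILE"
        else
            echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
        fi
    fi
}
```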


Yikun commented Oct 15, 2022

@pan3793 The DOI guidelines recommend using a fixed account/gid/uid, as I mentioned in the PR description. Don't worry, the ability to dynamically create users is still available when the user specifies a uid via USER on top of the official image:

FROM spark
# Specify the User that the actual main process will run as
USER ${spark_uid}

This is the best way I can think of at the moment, taking both the DOI requirements and your scenario into account. Let's have an offline discussion tomorrow; time to sleep, lol. Happy weekend night!


pan3793 commented Oct 15, 2022

Thanks @Yikun for the explanation; the idea of keeping the default login user as 'spark' while allowing extensions to use a dynamic login user makes sense to me.

And let me explain a little about why we need the dynamic login user ability. In Spark on Yarn mode, when we launch a Spark application via spark-submit --proxy-user jack ..., Yarn launches containers (usually Linux processes) as the Linux user "jack", and some components/libraries rely on the login user by default; one example is Alluxio:
https://github.com/Alluxio/alluxio/blob/da77d688bdbb0cf0c6477bed4d3187897fe2a2e1/core/common/src/main/java/alluxio/conf/PropertyKey.java#L6469-L6476

  public static final PropertyKey SECURITY_LOGIN_USERNAME =
      stringBuilder(Name.SECURITY_LOGIN_USERNAME)
          .setDescription("When alluxio.security.authentication.type is set to SIMPLE or "
              + "CUSTOM, user application uses this property to indicate the user requesting "
              + "Alluxio service. If it is not set explicitly, the OS login user will be used.")
          .setConsistencyCheckLevel(ConsistencyCheckLevel.ENFORCE)
          .setScope(Scope.CLIENT)
          .build();

To reduce the difference between Spark on Yarn and Spark on K8s, we hope Spark on K8s keeps the same ability to allow to dynamically change login user on submitting Spark application.


Yikun commented Oct 15, 2022

@pan3793

the idea of keeping the default login user as 'spark' while allowing extensions to use a dynamic login user makes sense to me.

Thanks for the feedback! Good to know this change also meets your requirements.

And let me explain a little about why we need the dynamic login user ability.

I personally think it's a good approach; it also benefits users who want to migrate from Yarn to K8s. But as we discussed, let's do it in a separate PR.

@Yikun Yikun marked this pull request as ready for review October 16, 2022 03:19

switch_spark_if_root() {
if [ $(id -u) -ne 0 ]; then
Member


IMO it would be simpler as:

if [ $(id -u) -eq 0 ]; then
  echo gosu spark
fi
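The simplified helper suggested above can be sketched as follows. The optional uid argument is an addition for testing outside a container (the real script just calls `id -u`), and the usage line assumes the entrypoint prefixes its command with the function's output.

```shell
#!/bin/sh
# Emit "gosu spark" only when running as root, so the caller can prefix
# the real command with it; non-root users run commands directly.
switch_spark_if_root() {
    uid="${1:-$(id -u)}"   # uid argument is for testing; defaults to the real uid
    if [ "$uid" -eq 0 ]; then
        echo gosu spark
    fi
}

# Hypothetical usage in an entrypoint:
#   exec $(switch_spark_if_root) "$@"
```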


+1

Member Author


Yeah, good suggestion!

@dcoliversun
Contributor

Thanks for pinging me @Yikun, I'm working on it.


@zhengruifeng zhengruifeng left a comment


LGTM


Yikun commented Oct 17, 2022

@HyukjinKwon @zhengruifeng @martin-g @dcoliversun @pan3793 Thanks all! If there are no more comments, I will merge this today.

@LuciferYang
Contributor

Fine to me; it is consistent with my usage habits.

@Yikun Yikun closed this in a75ecb1 Oct 17, 2022

Yikun commented Oct 17, 2022

@HyukjinKwon @zhengruifeng @martin-g @dcoliversun @pan3793 @LuciferYang Merged. Thanks all.
