This repository provides a wrapper which improves the handling of Docker containers run as systemd services.
If a Docker container is started as a systemd service using the "usual" docker run ... instruction, e.g.
ExecStart=docker run ..., systemd interacts with the Docker client process instead of the container
process, which can impair systemd's ability to monitor process health:
- the client can detach or crash while the container is doing fine, yet systemd would trigger failure handling
- worse, the container crashes and should be taken care of, but the client stalled: systemd is blind and won't do anything
- when a container is stopped with docker stop ..., attached client processes exit with error code 143, not 0/success, which triggers systemd's failure handling unless it's explicitly configured to ignore this using SuccessExitStatus=143, but that's a workaround.
The problem is well explained in this issue description.
The key thing this wrapper does is move the container process from the cgroups set up by Docker
to the service unit's cgroup, which gives systemd supervision of the actual Docker container process.
It's written in Go and makes it possible to leverage all the cgroup functionality of systemd and systemd-notify.
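To illustrate the data involved: under cgroup v1, each line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:cgroup-path, and a process is moved into a cgroup by writing its PID into that cgroup's cgroup.procs file. The sketch below is not the project's code; it assumes the v1 hierarchies are mounted under /sys/fs/cgroup (with name=systemd mounted as systemd), and parseCgroupFile and procsFile are hypothetical helpers:

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"strings"
)

// cgroupEntry is one line of /proc/<pid>/cgroup (cgroup v1 layout).
type cgroupEntry struct {
	Controllers string // e.g. "cpu,cpuacct" or "name=systemd"
	Path        string // e.g. "/system.slice/nginx.service"
}

// parseCgroupFile (hypothetical) parses "hierarchy-ID:controller-list:cgroup-path" lines.
func parseCgroupFile(content string) ([]cgroupEntry, error) {
	var entries []cgroupEntry
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		// SplitN keeps any ':' inside the cgroup path intact.
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed cgroup line %q", line)
		}
		entries = append(entries, cgroupEntry{Controllers: parts[1], Path: parts[2]})
	}
	return entries, sc.Err()
}

// procsFile (hypothetical) returns the cgroup.procs file a PID would be written
// to in order to move it into e's cgroup, assuming v1 mounts under root.
func procsFile(root string, e cgroupEntry) string {
	dir := strings.TrimPrefix(e.Controllers, "name=") // "name=systemd" is mounted as "systemd"
	return filepath.Join(root, dir, e.Path, "cgroup.procs")
}

func main() {
	// Assumed content of /proc/self/cgroup for the wrapper running inside nginx.service.
	self := "4:cpu,cpuacct:/system.slice/nginx.service\n1:name=systemd:/system.slice/nginx.service\n"
	entries, err := parseCgroupFile(self)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(procsFile("/sys/fs/cgroup", e))
	}
}
```

Moving the container's process would then amount to writing its PID, as decimal text, into each of those files (which requires root privileges).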
- the code was written by @ibuildthecloud and his co-contributors in this repository. The motivation is explained in this Docker issue #6791 and this mailing list thread.
- @agend07 and co-contributors fixed outdated dependencies and did a first clean-up
- I removed all outdated and broken elements and created a new Docker container for compilation, which can be found here
Supposing that a Go environment is available, the build instruction is go get github.com/DonTseTse/systemd-docker. The
executable, called systemd-docker, can then be found in the Go binary directory (usually $GOPATH/bin).
It can also be built using a stand-alone Docker image, see here
Both systemctl (to manage systemd services) and the docker CLI can be used, and everything should stay in sync.
In the systemd unit files, the instruction to launch the Docker container takes the form
ExecStart=/path/to/systemd-docker [<systemd-docker_options>] run <docker-run_parameters>
where
- /path/to/systemd-docker is the absolute path of the systemd-docker executable
- <systemd-docker_options> are the flags to configure systemd-docker
- <docker-run_parameters> are forwarded to docker run. A few restrictions apply, see section Docker run restrictions
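A way the two argument groups could be told apart: everything before the literal run belongs to systemd-docker, everything after it is forwarded to docker run. This is a minimal sketch with a hypothetical helper splitArgs, not the project's parser; it assumes no systemd-docker option takes the value run.

```go
package main

import (
	"errors"
	"fmt"
)

// splitArgs (hypothetical) separates systemd-docker's own options (before
// "run") from the parameters forwarded to docker run (after "run").
func splitArgs(args []string) (opts, runArgs []string, err error) {
	for i, a := range args {
		if a == "run" {
			return args[:i], args[i+1:], nil
		}
	}
	return nil, nil, errors.New(`missing "run" sub-command`)
}

func main() {
	// os.Args[1:] of: /usr/bin/systemd-docker --logs=false run --rm --name nginx.service nginx
	opts, runArgs, err := splitArgs([]string{"--logs=false", "run", "--rm", "--name", "nginx.service", "nginx"})
	if err != nil {
		panic(err)
	}
	fmt.Println(opts)    // [--logs=false]
	fmt.Println(runArgs) // [--rm --name nginx.service nginx]
}
```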
The example below shows a typical systemd unit file using systemd-docker (assumed to be in /usr/bin), running an
Nginx container:
[Unit]
Description=Nginx
After=docker.service
Requires=docker.service
[Service]
#--- if systemd-notify is used
Type=notify
NotifyAccess=all
#------------------------
ExecStart=/usr/bin/systemd-docker run --rm --name %n nginx
Restart=always
RestartSec=10s
TimeoutStartSec=120
TimeoutStopSec=15
[Install]
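WantedBy=multi-user.target
The use of %n is a systemd feature explained here. Supposing that the unit file example
given above is stored under the likely path /etc/systemd/system/nginx.service, %n expands to the full unit name and the container is named nginx.service (the specifier %N, which drops the type suffix, would yield nginx).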
For details about Type=notify, NotifyAccess=all and systemd-notify, see
systemd notifications.
For general documentation of all systemd unit file configuration
options, see this documentation.
Container names are mandatory to make sure that each systemd service always relates to/acts upon the same container(s).
While it may seem that they could be omitted as long as the --rm flag is used to make Docker remove any stopped
container, that's misleading: the deletion triggered by this flag is actually part of the Docker client logic, and
if the client detaches from the running container for whatever reason, the information is lost (even if another client is
re-attached later) and the container will not be deleted upon termination. systemd-docker adds an additional check:
it looks for the named container when systemd-docker ... run ... is called and, if a stopped container exists, removes it.
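The check described above can be sketched as follows. This is an illustration, not systemd-docker's implementation: it shells out to the Docker CLI (docker container inspect --format '{{.State.Running}}' and docker rm), treats a failing inspect as "no such container", and takes the two Docker operations as parameters so the decision logic can be exercised without Docker. removeIfStopped, dockerInspect and dockerRemove are hypothetical names.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// inspectFunc reports whether a container with the given name exists and is running.
type inspectFunc func(name string) (exists, running bool)

// dockerInspect (hypothetical) queries the Docker CLI; any inspect failure is
// treated as "no such container" in this sketch.
func dockerInspect(name string) (exists, running bool) {
	out, err := exec.Command("docker", "container", "inspect", "--format", "{{.State.Running}}", name).Output()
	if err != nil {
		return false, false
	}
	return true, strings.TrimSpace(string(out)) == "true"
}

// dockerRemove (hypothetical) deletes a container via the Docker CLI.
func dockerRemove(name string) error {
	return exec.Command("docker", "rm", name).Run()
}

// removeIfStopped deletes the named container only if it exists and is not running.
func removeIfStopped(name string, inspect inspectFunc, remove func(string) error) (bool, error) {
	exists, running := inspect(name)
	if !exists || running {
		return false, nil
	}
	if err := remove(name); err != nil {
		return false, fmt.Errorf("removing stopped container %s: %w", name, err)
	}
	return true, nil
}

func main() {
	removed, err := removeIfStopped("nginx.service", dockerInspect, dockerRemove)
	fmt.Println(removed, err)
}
```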
While it processes unit files, systemd populates a range of specifiers, among which %n stands for the full name of the unit,
derived from its filename (e.g. nginx.service). This makes it possible to write a self-configuring ExecStart instruction using the parameters
ExecStart=/path/to/systemd-docker ... run ... --name %n --rm ...
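A simplified Go model of %n and its sibling %N, assuming the unit name equals the unit file's base name (true for the nginx example, but systemd also handles aliases and escaping, which this ignores); unitSpecifiers is a hypothetical helper:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// unitSpecifiers (hypothetical) mimics two systemd specifiers for a unit
// loaded from path: %n is the full unit name, %N the name without the type suffix.
func unitSpecifiers(path string) (n, N string) {
	n = filepath.Base(path)
	N = strings.TrimSuffix(n, filepath.Ext(n))
	return n, N
}

func main() {
	n, N := unitSpecifiers("/etc/systemd/system/nginx.service")
	fmt.Println(n) // nginx.service
	fmt.Println(N) // nginx
}
```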
systemd handles environment variables with the instructions Environment=... and EnvironmentFile=.... To inject
variables into other instructions, the pattern is ${variable_name}. With the docker run flag -e, they can be passed
from systemd to the Docker container.
Example: ExecStart=/path/to/systemd-docker ... run -e ABC=${ABC} -e XYZ=${XYZ} ...
systemd-docker can pass on all defined environment variables using the --env flag, explained
here.
systemd-notify can be used to schedule and sequence the launch of different services. The systemd
documentation explains the configuration options
available in unit files:
- Type=notify: "... it is expected that the daemon sends a notification message via sd_notify(3) or an equivalent call when it has finished starting up. systemd will proceed with starting follow-up units after this notification message has been sent."
- NotifyAccess=all: "Controls access to the service status notification socket, as accessible via the sd_notify(3) call. ... If all, all services updates from all members of the service's control group are accepted."
By default systemd-docker will send READY=1 to the systemd notification socket but it can also be configured to delegate
this to the container as explained here.
Please be aware that systemd-notify comes with its own quirks; more info can be found in this
mailing list thread. In short, systemd-notify is not reliable because the child often
dies before systemd has had time to determine which cgroup it is a member of.
By default, all application cgroups are moved to systemd. It's also possible to control individually which cgroups are
transferred, using one --cgroups flag per cgroup to transfer. --cgroups name=systemd is the strict minimum to have
systemd supervise the container.
This implies that the docker run flags --cpuset and/or -m are incompatible.
Example: ExecStart=/path/to/systemd-docker ... --cgroups name=systemd --cgroups=cpu ... run ...
The above command will use the name=systemd and cpu cgroups of systemd but then use Docker's cgroups for all the
others, like the freezer cgroup.
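The selection can be pictured as a filter over the hierarchies listed in /proc/<pid>/cgroup. The matching rule in this sketch is an assumption, not the documented behaviour: with no --cgroups values every hierarchy moves, otherwise a hierarchy moves when any controller in its comma-separated list was selected. shouldMove is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldMove (hypothetical) reports whether the hierarchy with the given
// comma-separated controller list is transferred to systemd. With no
// selection every hierarchy moves; otherwise any selected controller matches.
func shouldMove(controllers string, selected []string) bool {
	if len(selected) == 0 {
		return true
	}
	for _, c := range strings.Split(controllers, ",") {
		for _, s := range selected {
			if c == s {
				return true
			}
		}
	}
	return false
}

func main() {
	selected := []string{"name=systemd", "cpu"} // --cgroups name=systemd --cgroups=cpu
	for _, h := range []string{"name=systemd", "cpu,cpuacct", "freezer"} {
		fmt.Printf("%-13s moved=%v\n", h, shouldMove(h, selected))
	}
}
```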
By default the container's stdout/stderr is written to the system journal. This may be disabled with --logs=false.
Example: ExecStart=/path/to/systemd-docker ... --logs=false ... run ...
The systemd environment variables are automatically passed through to the Docker container if the --env flag is set:
systemd-docker reads all the current environment variables and adds the corresponding -e ... flags to the
docker run command.
EnvironmentFile=/etc/environment
ExecStart=/path/to/systemd-docker ... --env ... run ...
In the example above, all environment variables defined in /etc/environment will be passed to the docker run command.
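The translation from environment to flags can be sketched as below. envFlags is a hypothetical helper; in systemd-docker's case the input would presumably be os.Environ(), i.e. everything systemd placed in the service environment.

```go
package main

import (
	"fmt"
	"strings"
)

// envFlags (hypothetical) turns "KEY=value" pairs into docker run "-e KEY=value"
// arguments. Entries without '=' are skipped. Each pair stays a single argv
// element, so values containing spaces need no quoting.
func envFlags(environ []string) []string {
	var flags []string
	for _, kv := range environ {
		if !strings.Contains(kv, "=") {
			continue
		}
		flags = append(flags, "-e", kv)
	}
	return flags
}

func main() {
	args := append([]string{"docker", "run"}, envFlags([]string{"ABC=1", "XYZ=hello world"})...)
	args = append(args, "nginx")
	fmt.Printf("%q\n", args)
}
```

A design caveat of this sketch: forwarding everything also forwards variables that systemd sets itself, such as PATH, and -e PATH=... replaces the image's own PATH inside the container, so some filtering may be advisable.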
To create a PID file for the container, use the flag --pid-file=</path/to/pid_file>.
Example: ExecStart=/path/to/systemd-docker ... --pid-file=/var/run/%n.pid ... run ...
The systemd-docker flag --notify makes systemd-docker delegate the systemd-notify READY=1 call to the container
itself. To allow the container to achieve this, systemd-docker bind mounts the systemd notification socket into the
container and sets the NOTIFY_SOCKET environment variable.
Example: ExecStart=/path/to/systemd-docker ... --notify ... run ...
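In docker run terms, the bind mount and the variable correspond to a -v and an -e argument. A sketch under the assumption that the socket is mounted at the same path inside the container; notifyRunArgs is hypothetical. Abstract sockets (names starting with @) have no filesystem path and cannot be bind mounted, so the sketch rejects them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// notifyRunArgs (hypothetical) returns extra docker run arguments that bind
// mount the notification socket into the container at the same path and
// export NOTIFY_SOCKET there.
func notifyRunArgs(socket string) ([]string, error) {
	if socket == "" {
		return nil, errors.New("NOTIFY_SOCKET is not set")
	}
	if strings.HasPrefix(socket, "@") {
		return nil, fmt.Errorf("abstract socket %q cannot be bind mounted", socket)
	}
	return []string{"-v", socket + ":" + socket, "-e", "NOTIFY_SOCKET=" + socket}, nil
}

func main() {
	// /run/systemd/notify is the typical NOTIFY_SOCKET value for system services.
	args, err := notifyRunArgs("/run/systemd/notify")
	if err != nil {
		panic(err)
	}
	fmt.Println(args)
}
```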
To disable systemd-docker's "remove stopped container" procedure, use the flag --rm=false.
Example: ExecStart=/path/to/systemd-docker ... --rm=false ... run ...
These flags can't be used because they are incompatible with the cgroup migration(s) inherent to systemd-docker.
The -d flag provided to docker run has no effect under systemd-docker. To make the Docker client detach after the container is running, use
the systemd-docker options --logs=false --rm=false. If either --logs or --rm is true, the Docker client instance used by systemd-docker is kept
alive until the systemd service is stopped or the container exits.
CentOS 7 is inconsistent in the way it handles some cgroups. It has 3:cpuacct,cpu:/user.slice in /proc/[pid]/cgroup, but the corresponding path
/sys/fs/cgroup/cpu,cpuacct/ doesn't exist. This causes systemd-docker to fail when it tries to move the PIDs there. To solve this, the name=systemd
cgroup must be explicitly selected:
/path/to/systemd-docker ... --cgroups name=systemd ... run ...
See ibuildthecloud#15 for details.
See repository history and credits for acknowledgments. The work on this repository was done in 2018 by DonTseTse.
Licensed under the Apache License, Version 2.0