This project provides an easy way to set up a complete Slurm cluster environment on your personal computer using containers. It's perfect for testing, learning, and development purposes.
Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters.
- Complete Cluster Environment: Sets up a multi-container Slurm cluster with controllers, a database, a client node, and compute nodes.
- JupyterHub Integration: Includes a JupyterHub instance on the client node for an interactive environment.
- Slurm REST API: The Slurm REST API (`slurmrestd`) is enabled for programmatic access to the cluster.
- Choice of OS: Supports different base OSes for the cluster nodes (e.g., Rocky Linux 8/9, Debian 12).
- Flexible Authentication: Choose between `auth/munge` (default) and `auth/slurm` for cluster authentication.
- Customizable: Easily configured through a `.env` file.
- Federation & Multi-Cluster: Federated and multi-cluster setups are supported and can be enabled by simply uncommenting the relevant section in `compose.yml` (see the sketch after this list).
- Scalable: Compute nodes can be scaled up or down on the fly.
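If you enable the federation / multi-cluster section in `compose.yml`, a hedged way to confirm that the clusters registered with the accounting database once the lab is up (the container name assumes the default `slurm-lab` compose project prefix used later in this README):

```bash
# Clusters and federations known to slurmdbd, queried from the client node
podman exec -it slurm-lab-client-1 sacctmgr show cluster
podman exec -it slurm-lab-client-1 sacctmgr show federation
```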
The cluster consists of the following services, defined in the `compose.yml` file:

- `controller`: Runs the Slurm control daemon (`slurmctld`). A second controller, `controller2`, is also available for high-availability testing.
- `slurmdbd`: Runs the Slurm Database Daemon (`slurmdbd`) for accounting.
- `mariadb`: A MariaDB database server for Slurm accounting.
- `client`: A submission node that also hosts JupyterHub and `slurmrestd`. This is your main entry point for interacting with the cluster.
- `compute`: N (default 4) compute nodes running the `slurmd` daemon.
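Once the cluster is up, a minimal sketch for checking that both controllers respond, run through the client container (the container name assumes the default `slurm-lab` compose project prefix):

```bash
# Query the primary and backup slurmctld from the submission node
podman exec -it slurm-lab-client-1 scontrol ping
```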
You need a container engine that supports the Compose specification. The recommended setup is Podman with Docker Compose.
- Recommended:
  - Podman
  - Docker Compose (can be used with Podman)
  - Setting up Podman to work with Docker Compose (see the sketch after this list)
  - Optional: Podman Desktop for a graphical interface.
- Alternatives:
  - Docker Desktop (includes Docker and Docker Compose).
  - Podman Compose (less recommended due to container dependency issues).
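For the recommended Podman + Docker Compose combination, a hedged sketch of the usual setup on a systemd-based Linux host (rootless Podman assumed; paths may differ on your distribution):

```bash
# Expose Podman's Docker-compatible API socket for the current user
systemctl --user enable --now podman.socket

# Point Docker Compose (and the docker CLI, if installed) at that socket
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock
```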
This is the fastest way to get your Slurm lab running using images from Docker Hub.
- Clone the project:

  ```bash
  git clone https://gitlab.com/CSniper/slurm-lab.git
  cd slurm-lab
  ```

- Start the cluster:

  ```bash
  podman compose up -d
  ```

  (Use `docker-compose` if you are using Docker.)

- Select an image tag (optional): By default, the cluster uses the `latest` tag (Rocky Linux 9). You can use a different image by specifying the `TAG` in the `.env` file. See the list of available tags. For example, to use the Debian-based image, add this line to your `.env` file:

  ```
  TAG=latest-deb
  ```
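After the containers start, a quick sanity check (a sketch assuming the default `slurm-lab` compose project name, so the client container is `slurm-lab-client-1`):

```bash
# All services should be listed as running
podman compose ps

# Slurm should report the compute nodes once they have registered
podman exec -it slurm-lab-client-1 sinfo
```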
If you want to modify the project or build the container images locally, follow these steps.
- Prepare the project (clone with submodules):

  If you are cloning the project for the first time:

  ```bash
  git clone --recurse-submodules https://gitlab.com/CSniper/slurm-lab.git
  cd slurm-lab
  ```

  If you have already cloned the project without submodules:

  ```bash
  cd slurm-lab
  git submodule update --init --recursive
  ```

- Create the keys required for the build:

  ```bash
  mkdir -pv common/secrets
  podman run --rm -it \
    -v ./json-web-key-generator:/json-web-key-generator \
    -v ./common/secrets:/opt \
    -v ./common/scripts/jwt-key-generation.sh:/jwt-key-generation.sh \
    docker.io/library/maven:3.8.7-openjdk-18-slim /jwt-key-generation.sh
  ```

- Build and start the cluster:

  Use the `compose.dev.yml` file, which is configured to build the images from the local source code.

  ```bash
  podman compose -f compose.dev.yml up -d --build
  ```
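During development you usually only need to rebuild the image you changed. A hedged sketch, assuming the compose service names listed earlier (`controller`, `client`, `compute`, ...):

```bash
# Rebuild just the client image from local sources, then restart that service
podman compose -f compose.dev.yml build client
podman compose -f compose.dev.yml up -d client
```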
Once the cluster is running, you can access the JupyterHub environment at http://localhost:8080/.
You can log in with one of the following usernames (no password needed): `jeremie`, `aelita`, `yumi`, `ulrich`, `odd`.

(These are characters from the show Code Lyoko.)
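As a quick reachability check before opening a browser (a sketch; the port mapping is the default one from `compose.yml`):

```bash
# JupyterHub should answer on the published port, typically with a redirect to /hub/
curl -sI http://localhost:8080/ | head -n 1
```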
You can submit jobs from the terminal within JupyterHub or by using `podman exec`.

Example using `srun`:

```bash
podman exec -it slurm-lab-client-1 srun --nodes=1 --ntasks=1 hostname
```

Example using `sbatch`:

Create a batch script `my_job.sh`:

```bash
#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --output=my_job_%j.out
#SBATCH --error=my_job_%j.err
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

srun hostname
```

Submit the job from the client container:

```bash
podman cp my_job.sh slurm-lab-client-1:/tmp/my_job.sh
podman exec -it slurm-lab-client-1 sbatch /tmp/my_job.sh
```
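To follow the job after submission, a short sketch using standard Slurm commands from the client container (the `sacct` output relies on the accounting database provided by `slurmdbd` and `mariadb`):

```bash
# Pending and running jobs
podman exec -it slurm-lab-client-1 squeue

# Accounting records once the job has finished
podman exec -it slurm-lab-client-1 sacct --format=JobID,JobName,State,NodeList
```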
You can easily change the number of active compute nodes. For example, to scale up to 6 nodes:
```bash
podman compose up -d --scale compute=6 --no-recreate
```
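Scaling down works the same way; a hedged sketch (note that Slurm may keep listing the removed nodes, typically as down, until the controller notices they are gone):

```bash
# Reduce the pool to 2 compute nodes
podman compose up -d --scale compute=2 --no-recreate

# Check what Slurm currently sees
podman exec -it slurm-lab-client-1 sinfo -N
```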
The Slurm REST API is available through the client container. The service is exposed on the host at `localhost:8080/slurm/v0.0.43` (the exact version may differ).
Please refer to the official documentation for authenticating your requests and for API usage. The documentation for the version of Slurm installed in the container is available at http://localhost:8080/doc/.
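As a smoke test, a hedged sketch of one way to call the API. It assumes JWT authentication is available in the container and that `jeremie` is one of the lab users; adjust the version segment to match your deployment:

```bash
# Issue a JWT for user jeremie (scontrol token needs sufficient privileges inside the container)
TOKEN=$(podman exec slurm-lab-client-1 scontrol token username=jeremie | cut -d= -f2)

# Ping the REST API through the port published by the client container
curl -s \
  -H "X-SLURM-USER-NAME: jeremie" \
  -H "X-SLURM-USER-TOKEN: ${TOKEN}" \
  http://localhost:8080/slurm/v0.0.43/ping
```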
You can customize the cluster by setting variables in the `.env` file.

- `TAG`: The Docker image tag to use (e.g., `latest`, `latest-deb`). See the available tags on Docker Hub.
- `MYSQL_USER`, `MYSQL_PASSWORD`, `MYSQL_DATABASE`, `MYSQL_RANDOM_ROOT_PASSWORD`: Required credentials for the MariaDB database.
- `AUTHTYPE`: The Slurm authentication plugin. Can be `auth/munge` (default) or `auth/slurm`. Setting it to `auth/slurm` removes the need for the `munge` daemon.
- `JUPYTER_SPAWNER`: By default, JupyterLab sessions are spawned inside the `client` container. Set this to `moss` to use the JupyterHub MOdular Slurm Spawner (moss), which runs each JupyterLab session as a Slurm job on a compute node.
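Putting it together, a hedged example `.env`; the values below are illustrative placeholders, not project defaults:

```
TAG=latest-deb
MYSQL_USER=slurm
MYSQL_PASSWORD=change-me
MYSQL_DATABASE=slurm_acct_db
MYSQL_RANDOM_ROOT_PASSWORD=yes
AUTHTYPE=auth/slurm
JUPYTER_SPAWNER=moss
```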
- The `module` command is not available in Jupyter Notebooks running on the Debian 12-based image.
- The Debian 11 image is not currently built or released, as the Slurm Debian packages cannot be built on ARM for this version.
- Feature testing for Lua scripts (burst buffer, job submission plugins, routing).
- Explore Slurm's Podman integration.
Contributions are welcome! Please feel free to open an issue or submit a merge request on GitLab.
This project is licensed under the BSD 3-Clause License.