Slurm Lab

This project provides an easy way to set up a complete Slurm cluster environment on your personal computer using containers. It's perfect for testing, learning, and development purposes.

Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters.

Features

  • Complete Cluster Environment: Sets up a multi-container Slurm cluster with controllers, a database, a client node, and compute nodes.
  • JupyterHub Integration: Includes a JupyterHub instance on the client node for an interactive environment.
  • Slurm REST API: The Slurm REST API (slurmrestd) is enabled for programmatic access to the cluster.
  • Choice of OS: Supports different base OS for the cluster nodes (e.g., Rocky Linux 8/9, Debian 12).
  • Flexible Authentication: Choose between auth/munge (default) and auth/slurm for cluster authentication.
  • Customizable: Easily configured through a .env file.
  • Federation & Multi-Cluster: Federated and multi-cluster environments are supported and can be enabled by simply uncommenting the relevant section in compose.yml.
  • Scalable: Compute nodes can be scaled up or down on the fly.

Cluster Components

The cluster consists of the following services, defined in the compose.yml file:

  1. controller: Runs the Slurm control daemon (slurmctld). A second controller, controller2, is also available for high-availability testing.
  2. slurmdbd: Runs the Slurm Database Daemon (slurmdbd) for accounting.
  3. mariadb: A MariaDB database server for Slurm accounting.
  4. client: A submission node that also hosts JupyterHub and slurmrestd. This is your main entry point for interacting with the cluster.
  5. compute: N (default 4) compute nodes running the slurmd daemon.

Getting Started

Prerequisites

You need a container engine that supports the Compose specification. The recommended setup is Podman with Docker Compose.
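To confirm the toolchain before continuing, a quick check (a minimal sketch; substitute docker for podman if that is your engine):

# Verify a Compose-capable engine is installed
podman --version
podman compose version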

Quick Start (Using Pre-built Images)

This is the fastest way to get your Slurm lab running using images from Docker Hub.

  1. Clone the project:

    git clone https://gitlab.com/CSniper/slurm-lab.git
    cd slurm-lab
  2. Start the cluster:

    podman compose up -d

    (Use docker-compose if you are using Docker).

  3. Select an image tag (Optional): By default, the cluster uses the latest tag (Rocky Linux 9). You can use a different image by setting TAG in the .env file and re-running podman compose up -d. See the list of available tags on Docker Hub.

    For example, to use the Debian-based image, add this line to your .env file:

    TAG=latest-deb
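
Once the containers are up, you can check that everything registered correctly (a minimal sketch, assuming the default container name slurm-lab-client-1 used throughout this README):

# List the services started by Compose
podman compose ps

# Confirm the compute nodes are up and have joined the cluster
podman exec -it slurm-lab-client-1 sinfo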
    

Local Development (Building from Source)

If you want to modify the project or build the container images locally, follow these steps.

  1. Prepare the project (clone with submodules): If you are cloning the project for the first time:

    git clone --recurse-submodules https://gitlab.com/CSniper/slurm-lab.git
    cd slurm-lab

    If you have already cloned the project without submodules:

    cd slurm-lab
    git submodule update --init --recursive

  2. Create keys required for the build:

    mkdir -pv common/secrets
    podman run --rm -it \
        -v ./json-web-key-generator:/json-web-key-generator \
        -v ./common/secrets:/opt \
        -v ./common/scripts/jwt-key-generation.sh:/jwt-key-generation.sh \
        docker.io/library/maven:3.8.7-openjdk-18-slim /jwt-key-generation.sh

  3. Build and start the cluster: Use the compose.dev.yml file, which is configured to build the images from the local source code.

    podman compose -f compose.dev.yml up -d --build

Usage

Accessing JupyterHub

Once the cluster is running, you can access the JupyterHub environment at http://localhost:8080/.

You can log in with one of the following usernames (no password needed): jeremie, aelita, yumi, ulrich, odd. (These are characters from the show Code Lyoko).

Submitting a Slurm Job

You can submit jobs from the terminal within JupyterHub or by using podman exec.

Example using srun:

podman exec -it slurm-lab-client-1 srun --nodes=1 --ntasks=1 hostname

Example using sbatch: Create a batch script my_job.sh:

#!/bin/bash
#SBATCH --job-name=my_test_job
#SBATCH --output=my_job_%j.out
#SBATCH --error=my_job_%j.err
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

srun hostname

Submit the job from the client container:

podman cp my_job.sh slurm-lab-client-1:/tmp/my_job.sh
podman exec -it slurm-lab-client-1 sbatch /tmp/my_job.sh
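
After submitting, you can follow the job with the standard Slurm commands from the same container (a sketch; replace <jobid> with the ID printed by sbatch):

# Watch the job in the queue
podman exec -it slurm-lab-client-1 squeue

# Review it after completion (accounting is provided by slurmdbd)
podman exec -it slurm-lab-client-1 sacct -j <jobid>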

Scaling Compute Nodes

You can easily change the number of active compute nodes. For example, to scale up to 6 nodes:

podman compose up -d --scale compute=6 --no-recreate
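
Scaling down works the same way with a smaller count; the controller should pick up the change on the fly. A quick way to verify (reusing the client container from the examples above):

# Scale back down to 2 compute nodes
podman compose up -d --scale compute=2 --no-recreate

# List the nodes as Slurm sees them
podman exec -it slurm-lab-client-1 sinfo -N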

Accessing the Slurm REST API

The Slurm REST API is available through the client container. The service is exposed on the host at localhost:8080/slurm/v0.0.43 (the exact version may differ).

Please refer to the official Slurm REST API documentation for request authentication and general API usage.
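
As a starting point, here is a minimal request sketch. It assumes JWT authentication is set up (the build step above generates JWT keys) and that scontrol token can mint a token inside the client container; adjust the username and API version to match your deployment:

# Mint a JWT for one of the lab users (scontrol prints SLURM_JWT=<token>)
TOKEN=$(podman exec slurm-lab-client-1 scontrol token username=jeremie | cut -d= -f2)

# Ping the REST API through the exposed port
curl -H "X-SLURM-USER-NAME: jeremie" \
     -H "X-SLURM-USER-TOKEN: ${TOKEN}" \
     http://localhost:8080/slurm/v0.0.43/ping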

Slurm Documentation

The official documentation for the version of Slurm installed in the container is available at http://localhost:8080/doc/.

Configuration

You can customize the cluster by setting variables in the .env file.

  • TAG: The Docker image tag to use (e.g., latest, latest-deb). See available tags on Docker Hub.
  • MYSQL_USER, MYSQL_PASSWORD, MYSQL_DATABASE, MYSQL_RANDOM_ROOT_PASSWORD: Required credentials for the MariaDB database.
  • AUTHTYPE: The Slurm authentication plugin. Can be auth/munge (default) or auth/slurm. Setting it to auth/slurm removes the need for the munge daemon.
  • JUPYTER_SPAWNER: By default, JupyterLab sessions are spawned inside the client container. Set this to moss to use the JupyterHub MOdular Slurm Spawner (moss), which runs each JupyterLab session as a Slurm job on a compute node.
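
For example, a complete .env might look like this (illustrative values only; choose your own credentials):

TAG=latest
MYSQL_USER=slurm
MYSQL_PASSWORD=change-me
MYSQL_DATABASE=slurm_acct_db
MYSQL_RANDOM_ROOT_PASSWORD=yes
AUTHTYPE=auth/munge
JUPYTER_SPAWNER=moss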

Known Issues

  • The module command is not available in Jupyter Notebooks running on the Debian 12-based image.
  • The Debian 11 image is not currently built or released, as the Slurm Debian packages cannot be built on ARM for this version.

Roadmap

  • Feature testing for Lua scripts (burst buffer, job submission plugins, routing).
  • Explore Slurm's Podman integration.

Contributing

Contributions are welcome! Please feel free to open an issue or submit a merge request on GitLab.

License

This project is licensed under the BSD 3-Clause License.
