Commit 7cc0b4a

Merge pull request #188 from casparvl/host_injections
Host injections, lmod hooks and moving gpu-related host-injection instructions
2 parents 25d5d8f + 085c890 commit 7cc0b4a

10 files changed

+268
-52
lines changed

docs/adding_software/debugging_failed_builds.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ If you want to install NVIDIA GPU software, make sure to also add the `--nvidia
6161
While the above works perfectly well, you might not be able to complete your debugging session in one go. With the above approach, several steps will just be repeated every time you start a debugging session:
6262

6363
- Downloading the container
64-
- Installing `CUDA` in your [host injections](../gpu.md#host_injections) directory (only if you use the `EESSI-install-software.sh` script, see below)
64+
- Installing `CUDA` in your [host injections](../site_specific_config/host_injections.md) directory (only if you use the `EESSI-install-software.sh` script, see below)
6565
- Installing all dependencies (before you get to the package that actually fails to build)
6666

6767
To avoid this, we create two directories. One holds the container & `host_injections`, which are (typically) common between multiple PRs, so you don't have to redownload the container or reinstall the `host_injections` if you start working on another PR. The other will hold the PR-specific data: a tarball storing the software you'll build in your interactive debugging session. The paths we pick here are just examples; you can pick any persistent, writable location for this:

docs/adding_software/opening_pr.md

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ git push koala example_branch
9797
If all goes well, one or more bots :robot: should almost instantly create a comment in your pull request
9898
with an overview of how it is configured - you will need this information when providing build instructions.
9999

100-
### Rebuilding software
100+
### Rebuilding software {: #rebuilding_software }
101101
We typically do not rebuild software, since (strictly speaking) this breaks reproducibility for anyone using the software. However, there are certain situations in which it is difficult or impossible to avoid.
102102

103103
To do a rebuild, you add the software you want to rebuild to a dedicated easystack file in the `rebuilds` directory. Use the following naming convention: `YYYYMMDD-eb-<EB_VERSION>-<APPLICATION_NAME>-<APPLICATION_VERSION>-<SHORT_DESCRIPTION>.yml`, where `YYYYMMDD` is the opening date of your PR. E.g. `2024.05.06-eb-4.9.1-CUDA-12.1.1-ship-full-runtime.yml` was added in a PR on the 6th of May 2024 and used to rebuild CUDA-12.1.1 using EasyBuild 4.9.1 to resolve an issue with some runtime libraries missing from the initial CUDA 12.1.1 installation.

docs/gpu.md renamed to docs/site_specific_config/gpu.md

Lines changed: 6 additions & 32 deletions
@@ -3,7 +3,7 @@
33
More information on the actions that must be performed to ensure that GPU software included in EESSI
44
can use the GPU in your system is available below.
55

6-
[Please open a support issue](support.md) if you need help or have questions regarding GPU support.
6+
[Please open a support issue](../support.md) if you need help or have questions regarding GPU support.
77

88
!!! tip "Make sure the `${EESSI_VERSION}` version placeholder is defined!"
99
In this page, we use `${EESSI_VERSION}` as a placeholder for the version of the EESSI repository,
@@ -39,33 +39,7 @@ An additional requirement is necessary if you want to be able to compile CUDA-en
3939

4040
Below, we describe how to make sure that the EESSI software stack can find your NVIDIA GPU drivers and (optionally) full installations of the CUDA SDK.
4141

42-
### `host_injections` variant symlink {: #host_injections }
43-
44-
In the EESSI repository, a special directory has been prepared where system administrators can install files that can be picked up by
45-
software installations included in EESSI. This gives the ability to administrators to influence the behaviour (and capabilities) of the EESSI software stack.
46-
47-
This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*:
48-
a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)).
49-
50-
!!! info "Default target for `host_injections` variant symlink"
51-
52-
Unless otherwise configured in the CernVM-FS client configuration for the EESSI repository, the `host_injections` symlink points to `/opt/eessi` on the client system:
53-
```
54-
$ ls -l /cvmfs/software.eessi.io/host_injections
55-
lrwxrwxrwx 1 cvmfs cvmfs 10 Oct 3 13:51 /cvmfs/software.eessi.io/host_injections -> /opt/eessi
56-
```
57-
58-
As an example, let's imagine that we want to use a architecture-specific location on a shared filesystem as the target for the symlink. This has the advantage that one can make changes under `host_injections` that affect all nodes which share that CernVM-FS configuration. Configuring this in your CernVM-FS configuration would mean adding the following line in the client configuration file:
59-
60-
```{ .ini .copy }
61-
EESSI_HOST_INJECTIONS=/shared_fs/path
62-
```
63-
64-
!!! note "Don't forget to reload the CernVM-FS configuration"
65-
After making a change to a CernVM-FS configuration file, you also need to reload the configuration:
66-
```{ .bash .copy }
67-
sudo cvmfs_config reload
68-
```
42+
### Configuring CUDA driver location {: #driver_location }
6943

7044
All CUDA-enabled software in EESSI expects the CUDA drivers to be available in a specific subdirectory of this `host_injections` directory.
7145
In addition, installations of the CUDA SDK included in EESSI are stripped down to the files that we are allowed to redistribute;
@@ -80,7 +54,7 @@ If the corresponding full installation of the CUDA SDK is available there, the C
8054

8155
### Using NVIDIA GPUs via a native EESSI installation {: #nvidia_eessi_native }
8256

83-
Here, we describe the steps to enable GPU support when you have a [native EESSI installation](getting_access/native_installation.md) on your system.
57+
Here, we describe the steps to enable GPU support when you have a [native EESSI installation](../getting_access/native_installation.md) on your system.
8458

8559
!!! warning "Required permissions"
8660
To enable GPU support for EESSI on your system, you will typically need to have system administration rights, since you need write permissions on the target directory of the `host_injections` symlink.
@@ -108,14 +82,14 @@ To install a full CUDA SDK under `host_injections`, use the `install_cuda_host_i
10882
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
10983
```
11084

111-
For example, to install CUDA 12.1.1 in the directory that the [`host_injections` variant symlink](#host_injections) points to,
85+
For example, to install CUDA 12.1.1 in the directory that the [`host_injections` variant symlink](host_injections.md) points to,
11286
using `/tmp/$USER/EESSI` as directory to store temporary files:
11387
```
11488
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --cuda-version 12.1.1 --temp-dir /tmp/$USER/EESSI --accept-cuda-eula
11589
```
11690
You should choose the CUDA version you wish to install according to what CUDA versions are included in EESSI;
11791
see the output of `module avail CUDA/` after [setting up your environment for using
118-
EESSI](using_eessi/setting_up_environment.md).
92+
EESSI](../using_eessi/setting_up_environment.md).
11993

12094
You can run `/cvmfs/software.eessi.io/scripts/install_cuda_host_injections.sh --help` to check all of the options.
12195

@@ -139,7 +113,7 @@ We focus here on the [Apptainer](https://apptainer.org/)/[Singularity](https://s
139113
and have only tested the [`--nv` option](https://apptainer.org/docs/user/latest/gpu.html#nvidia-gpus-cuda-standard)
140114
to enable access to GPUs from within the container.
141115

142-
If you are using the [EESSI container](getting_access/eessi_container.md) to access the EESSI software,
116+
If you are using the [EESSI container](../getting_access/eessi_container.md) to access the EESSI software,
143117
the procedure for enabling GPU support is slightly different and will be documented here eventually.
144118

145119
#### Exposing NVIDIA GPU drivers
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
# How to configure EESSI
2+
3+
## Why configuration is necessary
4+
5+
Just [installing EESSI](../getting_access/native_installation.md) is enough to get started with the EESSI software stack on a CPU-based system. However, additional configuration is necessary in many other cases, such as
6+
- enabling GPU support on GPU-based systems
7+
- site-specific configuration / tuning of the MPI libraries provided by EESSI
8+
- overriding EESSI's MPI library with an ABI compatible host MPI
9+
10+
## The `host_injections` variant symlink
11+
12+
To allow such site-specific configuration, the EESSI repository includes a special directory where system administrators can install files that can be picked up by the software installations included in EESSI. This special directory is located in `/cvmfs/software.eessi.io/host_injections`, and it is a *CernVM-FS Variant Symlink*:
13+
a symbolic link for which the target can be controlled by the CernVM-FS client configuration (for more info, see ['Variant Symlinks' in the official CernVM-FS documentation](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks)).
14+
15+
!!! info "Default target for `host_injections` variant symlink"
16+
17+
Unless otherwise configured in the CernVM-FS client configuration for the EESSI repository, the `host_injections` symlink points to `/opt/eessi` on the client system:
18+
```
19+
$ ls -l /cvmfs/software.eessi.io/host_injections
20+
lrwxrwxrwx 1 cvmfs cvmfs 10 Oct 3 13:51 /cvmfs/software.eessi.io/host_injections -> /opt/eessi
21+
```
22+
23+
The target for this symlink can be controlled by setting the `EESSI_HOST_INJECTIONS` variable in your local CVMFS configuration for EESSI, for example:
24+
```{bash}
25+
sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/' > /etc/cvmfs/domain.d/eessi.io.local"
26+
27+
```
28+
29+
!!! note "Don't forget to reload the CernVM-FS configuration"
30+
After making a change to a CernVM-FS configuration file, you also need to reload the configuration:
31+
```{ .bash .copy }
32+
sudo cvmfs_config reload
33+
```
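After reloading, it can be useful to confirm that the variant symlink resolves to the directory you intended. The following is a minimal sketch, not something shipped with EESSI: the function name `check_symlink_target` is hypothetical, and it relies only on the standard `readlink` utility.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EESSI): compare a symlink's target
# with the directory we expect it to point to.
check_symlink_target() {
    local link="$1"      # e.g. /cvmfs/software.eessi.io/host_injections
    local expected="$2"  # e.g. /shared_fs/path/to/host/injections/
    local actual
    # readlink prints the target stored in the symlink and fails for non-symlinks
    actual="$(readlink "$link")" || { echo "not a symlink: $link" >&2; return 2; }
    # Ignore a single trailing slash on either side when comparing
    if [ "${actual%/}" = "${expected%/}" ]; then
        echo "OK: $link -> $actual"
    else
        echo "MISMATCH: $link -> $actual (expected $expected)" >&2
        return 1
    fi
}
```

For instance, `check_symlink_target /cvmfs/software.eessi.io/host_injections /shared_fs/path/to/host/injections/` should report `OK` once the new configuration is active.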
34+
35+
On a heterogeneous system, you may want to use different targets for the variant symlink for different node types. For example, you might have two types of GPU nodes (`gpu1` and `gpu2`) for which the GPU drivers are _not_ in the same location, or not of the same version. Since those are both things we configure under `host_injections`, you'll need separate `host_injections` directories for each node type. That can easily be achieved by putting e.g.
36+
37+
```{bash}
38+
sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu1/' > /etc/cvmfs/domain.d/eessi.io.local"
39+
40+
```
41+
42+
in the CVMFS config on the `gpu1` nodes, and
43+
44+
```{bash}
45+
sudo bash -c "echo 'EESSI_HOST_INJECTIONS=/shared_fs/path/to/host/injections/gpu2/' > /etc/cvmfs/domain.d/eessi.io.local"
46+
47+
```
48+
in the CVMFS config on the `gpu2` nodes.
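Note that the `echo ... >` commands above overwrite the whole `eessi.io.local` file. If that file already holds other settings, a sketch like the following could replace only the `EESSI_HOST_INJECTIONS` line and leave the rest untouched. The helper name `set_host_injections` is hypothetical, and the configuration file is passed as an argument so that the function can be tried on a scratch file first.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EESSI or CernVM-FS): set or replace the
# EESSI_HOST_INJECTIONS line in a config file, keeping all other lines.
set_host_injections() {
    local target="$1"       # new symlink target, e.g. /shared_fs/.../gpu1/
    local config_file="$2"  # e.g. /etc/cvmfs/domain.d/eessi.io.local
    local tmp
    touch "$config_file"
    tmp="$(mktemp)"
    # Copy every line except a previous assignment (grep exits 1 if none remain)
    grep -v '^EESSI_HOST_INJECTIONS=' "$config_file" > "$tmp" || true
    printf 'EESSI_HOST_INJECTIONS=%s\n' "$target" >> "$tmp"
    # Write back through the existing file so its ownership and mode are kept
    cat "$tmp" > "$config_file"
    rm -f "$tmp"
}
```

On a `gpu1` node this would be run as root, for example `set_host_injections /shared_fs/path/to/host/injections/gpu1/ /etc/cvmfs/domain.d/eessi.io.local`, followed by `cvmfs_config reload`.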
Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
1+
# Configuring site-specific Lmod hooks
2+
You may want to customize what happens when certain modules are loaded; for example, you may want to set additional environment variables. This is possible with [Lmod hooks](https://lmod.readthedocs.io/en/latest/170_hooks.html). A typical example is tuning the OpenMPI module for your system by setting additional environment variables whenever an OpenMPI module is loaded.
3+
4+
5+
## Location of the hooks
6+
The EESSI software stack provides its own set of hooks in `$LMOD_PACKAGE_PATH/SitePackage.lua`. This `SitePackage.lua` also searches for site-specific hooks in two additional locations:
7+
8+
- `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua`
9+
- `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/$EESSI_OS_TYPE/$EESSI_SOFTWARE_SUBDIR/.lmod/SitePackage.lua`
10+
11+
The first allows for hooks that need to be executed for that system, irrespective of the CPU architecture. The second allows for hooks specific to a certain architecture.
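To see which of these two files are present on a given node, a small shell check can help. This is only a sketch: the function name `list_site_hook_files` is hypothetical, and it assumes the EESSI environment has been initialised so that `EESSI_CVMFS_REPO`, `EESSI_VERSION`, `EESSI_OS_TYPE` and `EESSI_SOFTWARE_SUBDIR` are set.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of EESSI): print the two site-specific
# SitePackage.lua locations and whether each one currently exists.
list_site_hook_files() {
    local base="${EESSI_CVMFS_REPO}/host_injections/${EESSI_VERSION}"
    local candidates=(
        "${base}/.lmod/SitePackage.lua"
        "${base}/software/${EESSI_OS_TYPE}/${EESSI_SOFTWARE_SUBDIR}/.lmod/SitePackage.lua"
    )
    local f
    for f in "${candidates[@]}"; do
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
        fi
    done
}
```

The first line of output refers to the architecture-independent file, the second to the architecture-specific one.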
12+
13+
## Architecture-independent hooks
14+
Hooks are written in Lua and can use any of the standard Lmod functionality as described in the [Lmod documentation](https://lmod.readthedocs.io/en/latest/170_hooks.html). While there are many types of hooks, you most likely want to specify a load or unload hook. Note that the EESSI hooks provide a nice example of what you can do with hooks. Here, as an example, we will define a `load` hook that sets the environment variable `MY_ENV_VAR` to `1` whenever an `OpenMPI` module is loaded.
15+
16+
First, you typically want to load the necessary Lua packages:
17+
```lua
18+
-- $EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua
19+
20+
-- The Strict package checks for the use of undeclared variables:
21+
require("strict")
22+
23+
-- Load the Lmod Hook package
24+
local hook=require("Hook")
25+
```
26+
27+
Next, we define a function that we want to use as a hook. Unfortunately, registering multiple hooks of the same type (e.g. multiple `load` hooks) is only supported in Lmod 8.7.35+, and EESSI version 2023.06 uses Lmod 8.7.30. We therefore define our function without the `local` keyword, so that we can still add to it later in an architecture-specific hook (if we wanted to):
28+
29+
```lua
30+
-- Define a function for the hook
31+
-- Note that we define this without 'local' keyword
32+
-- That way we can still add to this function in an architecture-specific hook
33+
function set_my_env_var_openmpi(t)
34+
local simpleName = string.match(t.modFullName, "(.-)/")
35+
if simpleName == 'OpenMPI' then
36+
setenv('MY_ENV_VAR', '1')
37+
end
38+
end
39+
```
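The Lua pattern `(.-)/` captures everything before the first `/` of the full module name. For readers more at home in the shell, the same extraction can be sketched with parameter expansion. The function name `module_simple_name` is made up for illustration, and the module name is just an example. One difference to keep in mind: for a name without a `/`, Lua's `string.match` returns `nil`, whereas this shell version returns the whole name.

```bash
#!/usr/bin/env bash
# Illustrative only: mimic string.match(t.modFullName, "(.-)/") in the shell.
module_simple_name() {
    local full_name="$1"
    # Strip the longest suffix that starts at the first '/'
    echo "${full_name%%/*}"
}

module_simple_name "OpenMPI/4.1.5-GCC-12.3.0"   # prints: OpenMPI
```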
40+
41+
For the same reason that multiple hooks cannot be registered, we need to combine the function for our site-specific (architecture-independent) hook with the function that specifies the EESSI `load` hook. Note that all EESSI hooks are called `eessi_<hook_type>_hook` by convention.
42+
43+
```lua
44+
-- Registering multiple hook functions, e.g. multiple load hooks is only supported in Lmod 8.7.35+
45+
-- EESSI version 2023.06 uses Lmod 8.7.30. Thus, we first have to combine all functions into a single one,
46+
-- before registering it as a hook
47+
local function combined_load_hook(t)
48+
-- Call the EESSI load hook (if it exists)
49+
-- Note that if you wanted to overwrite the EESSI hooks (not recommended!), you would omit this
50+
if eessi_load_hook ~= nil then
51+
eessi_load_hook(t)
52+
end
53+
-- Call the site-specific load hook
54+
set_my_env_var_openmpi(t)
55+
end
56+
```
57+
58+
Then, we can finally register this function as an Lmod hook:
59+
60+
```lua
61+
hook.register("load", combined_load_hook)
62+
```
63+
64+
Thus, our complete `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/.lmod/SitePackage.lua` now looks like this (omitting the comments):
65+
66+
```lua
67+
require("strict")
68+
local hook=require("Hook")
69+
70+
function set_my_env_var_openmpi(t)
71+
local simpleName = string.match(t.modFullName, "(.-)/")
72+
if simpleName == 'OpenMPI' then
73+
setenv('MY_ENV_VAR', '1')
74+
end
75+
end
76+
77+
local function combined_load_hook(t)
78+
if eessi_load_hook ~= nil then
79+
eessi_load_hook(t)
80+
end
81+
set_my_env_var_openmpi(t)
82+
end
83+
84+
hook.register("load", combined_load_hook)
85+
```
86+
87+
Note that for future EESSI versions, if they use Lmod 8.7.35+, this would be simplified to:
88+
89+
```lua
90+
require("strict")
91+
local hook=require("Hook")
92+
93+
local function set_my_env_var_openmpi(t)
94+
local simpleName = string.match(t.modFullName, "(.-)/")
95+
if simpleName == 'OpenMPI' then
96+
setenv('MY_ENV_VAR', '1')
97+
end
98+
end
99+
100+
hook.register("load", set_my_env_var_openmpi, "append")
101+
```
102+
103+
## Architecture-dependent hooks
104+
Now, assume that in addition we want to set an environment variable `MY_SECOND_ENV_VAR` to `5`, but only for nodes that have the `zen3` architecture. First, again, you typically want to load the necessary Lua packages:
105+
106+
```lua
107+
-- $EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua
108+
109+
-- The Strict package checks for the use of undeclared variables:
110+
require("strict")
111+
112+
-- Load the Lmod Hook package
113+
local hook=require("Hook")
114+
```
115+
116+
Next, we define the function for the hook itself:
117+
118+
```lua
119+
-- Define a function for the hook
120+
-- This time, we can define it as a local function, as there are no hooks more specific than this
121+
local function set_my_second_env_var_openmpi(t)
122+
local simpleName = string.match(t.modFullName, "(.-)/")
123+
if simpleName == 'OpenMPI' then
124+
setenv('MY_SECOND_ENV_VAR', '5')
125+
end
126+
end
127+
```
128+
129+
Then, we combine the functions into one:
130+
131+
```lua
132+
local function combined_load_hook(t)
133+
-- Call the EESSI load hook first
134+
if eessi_load_hook ~= nil then
135+
eessi_load_hook(t)
136+
end
137+
-- Then call the architecture-independent load hook
138+
if set_my_env_var_openmpi ~= nil then
139+
set_my_env_var_openmpi(t)
140+
end
141+
-- And finally the architecture-dependent load hook we just defined
142+
set_my_second_env_var_openmpi(t)
143+
end
144+
```
145+
146+
before finally registering it as an Lmod hook:
147+
148+
```lua
149+
hook.register("load", combined_load_hook)
150+
```
151+
152+
Thus, our full `$EESSI_CVMFS_REPO/host_injections/$EESSI_VERSION/software/linux/x86_64/amd/zen3/.lmod/SitePackage.lua` now looks like this (omitting the comments):
153+
154+
```lua
155+
require("strict")
156+
local hook=require("Hook")
157+
158+
local function set_my_second_env_var_openmpi(t)
159+
local simpleName = string.match(t.modFullName, "(.-)/")
160+
if simpleName == 'OpenMPI' then
161+
setenv('MY_SECOND_ENV_VAR', '5')
162+
end
163+
end
164+
165+
local function combined_load_hook(t)
166+
if eessi_load_hook ~= nil then
167+
eessi_load_hook(t)
168+
end
169+
if set_my_env_var_openmpi ~= nil then
170+
set_my_env_var_openmpi(t)
171+
end
172+
set_my_second_env_var_openmpi(t)
173+
end
174+
175+
hook.register("load", combined_load_hook)
176+
```
177+
178+
Again, note that for future EESSI versions, if they use Lmod 8.7.35+, this would simplify to:
179+
180+
```lua
181+
require("strict")
182+
local hook=require("Hook")
183+
184+
local function set_my_second_env_var_openmpi(t)
185+
local simpleName = string.match(t.modFullName, "(.-)/")
186+
if simpleName == 'OpenMPI' then
187+
setenv('MY_SECOND_ENV_VAR', '5')
188+
end
189+
end
190+
191+
hook.register("load", set_my_second_env_var_openmpi, "append")
192+
```
