@ocaisa
Member

ocaisa commented Sep 9, 2025

Fixes #226

Requires that these directories become variant symlinks.

Since this is pretty relevant to security, I am inclined to point these variant symlinks to /dev/null by default, but that does not actually address the problem being discussed in #226 (having to harass the admins to link the CUDA drivers). If we can have logic in our CVMFS configuration then maybe we can address that.

@ocaisa
Member Author

ocaisa commented Sep 9, 2025

@bedroge Is it possible to have a setting in default.conf that gets picked up by the main configuration and is used to decide where the variant symlinks point? If you don't have the variable you get /dev/null, but if you do you get default locations under host_injections.

@bedroge
Collaborator

bedroge commented Sep 9, 2025

> @bedroge Is it possible to have a setting in default.conf that gets picked up by the main configuration and is used to decide where the variant symlinks point? If you don't have the variable you get /dev/null, but if you do you get default locations under host_injections.

Not sure if I fully understand what you're trying to achieve, but wouldn't a variant symlink do exactly that?

ln -s '$(EESSI_202306_OVERRIDE:-/dev/null)' /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib/override

This would by default make it point to /dev/null, and it can only be overridden in the CVMFS configuration of the client (it's not an env var, but a CVMFS configuration variable, so it has to be in one of the files under /etc/cvmfs).
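A minimal sketch of why the single quotes in that command matter, assuming the `$(VAR:-default)` form from the CVMFS documentation. A scratch directory stands in for the repository transaction, so nothing here touches /cvmfs; `demo_dir` and `stored_target` are illustrative names only.

```shell
#!/bin/sh
# Sketch only: a scratch directory stands in for the repository transaction.
demo_dir=$(mktemp -d)

# Single quotes stop the shell from treating $(...) as command substitution,
# so the literal string is stored as the link target.
ln -s '$(EESSI_202306_OVERRIDE:-/dev/null)' "$demo_dir/override"

# On an ordinary filesystem the target is just that literal string;
# only a CVMFS client substitutes the variable when the link is read.
stored_target=$(readlink "$demo_dir/override")
printf '%s\n' "$stored_target"

# Remove the scratch directory again.
rm -rf "$demo_dir"
```

On a client, the override would then be a line such as `EESSI_202306_OVERRIDE=/some/site/path` in one of the files under /etc/cvmfs, where `/some/site/path` is a placeholder.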

@ocaisa
Member Author

ocaisa commented Sep 9, 2025

I'm imagining the following scenario:

  • EESSI is installed, variant symlinks are set to /dev/null
    • This means there is no security hole under host_injections (but remember that these are locations of last resort: they are searched last, so they cannot be used to override OS libraries)
  • Site runs the driver install script; the install script checks the CVMFS configuration and advises the site that it has two choices:
    • Directly modify the value of the variant symlinks (this will need to be done individually for all EESSI versions, as new symlinks will be added with new EESSI versions)
    • Set a variable in their CVMFS configuration that changes the default target from /dev/null to a default location under host_injections. The same default location will be used for future EESSI versions, so sites do not need to do anything extra for new EESSI versions.
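The check the install script would perform in this scenario could look roughly like the sketch below. It is not the actual driver install script. The variable name `EESSI_LIB_OVERRIDE_DEFAULT` is a placeholder for the site-wide switch, and the function simply scans a file of `KEY=VALUE` lines (for example a copy of a file under /etc/cvmfs).

```shell
#!/bin/sh
# Placeholder name for the site-wide switch (an assumption for this sketch).
SITE_DEFAULT_VAR="EESSI_LIB_OVERRIDE_DEFAULT"

# Print advice depending on whether the given config file sets the variable.
# $1: path to a file containing KEY=VALUE lines, optionally prefixed by "export".
advise_site() {
    config_file=$1
    if grep -Eq "^(export[[:space:]]+)?${SITE_DEFAULT_VAR}=." "$config_file"; then
        echo "override default is set; new EESSI versions will pick it up"
    else
        echo "override targets resolve to /dev/null; set ${SITE_DEFAULT_VAR} or a per-version variable"
    fi
}
```

A script would call `advise_site /etc/cvmfs/default.local` (or whichever file the site uses) after installing the drivers and print the result to the admin.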

@bedroge
Collaborator

bedroge commented Sep 9, 2025

I think you basically want to do something similar to what is done for CVMFS_USE_CDN? You can definitely do some checking in the EESSI CVMFS config files. For instance, you could instruct sites to put something into their default.local or /etc/cvmfs/config.d/software.eessi.io.local or /etc/cvmfs/domain.d/eessi.io.local, and check for certain variables in the global configuration. It might be best to create a software.eessi.io.conf in https://github.com/cvmfs-contrib/config-repo/tree/master/etc/cvmfs/config.d, and then add the checks there.
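A rough sketch of what such a check might look like, assuming the config file is evaluated as shell (as the CVMFS_USE_CDN handling suggests). `EESSI_ENABLE_OVERRIDE_DEFAULT`, `EESSI_OVERRIDE_TARGET` and the host_injections path below are all hypothetical names chosen for illustration. Whether a variable set this way is picked up by variant symlinks would still need to be verified on a real client.

```shell
#!/bin/sh
# Sketch of logic that could live in a repository-specific config file.
# EESSI_ENABLE_OVERRIDE_DEFAULT, EESSI_OVERRIDE_TARGET and the path are
# hypothetical; none of them is defined by the thread or by CVMFS.
select_override_target() {
    if [ "${EESSI_ENABLE_OVERRIDE_DEFAULT:-no}" = "yes" ]; then
        # Site opted in: use one shared location for every EESSI version.
        EESSI_OVERRIDE_TARGET=/cvmfs/software.eessi.io/host_injections/override
    else
        # No opt-in: keep the safe default.
        EESSI_OVERRIDE_TARGET=/dev/null
    fi
}
```

The function wrapper is only there to make the sketch easy to exercise; in a real config file the `if` block would presumably sit at top level.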

@bedroge
Collaborator

bedroge commented Sep 9, 2025

Though I don't know if you can do the nested kind of thing that you seem to be suggesting. You basically want to have a variable that allows them to set those variant symlinks differently for each version, but also one that will change the default for all of them in case the version-specific ones are not defined?

edit: well, maybe you could have the version-specific variant symlinks point to a default dir somewhere, which itself is another variant symlink that points to /dev/null by default? Then you can override the version-specific symlink but also the (shared) default one.

So then /cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib/override would link to $(EESSI_202306_LIB_OVERRIDE:-/cvmfs/software.eessi.io/host_injections/default/override), and /cvmfs/software.eessi.io/host_injections/default/override would link to $(EESSI_LIB_OVERRIDE_DEFAULT:-/dev/null).
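The resulting lookup order can be emulated with ordinary shell parameter expansion, whose `${VAR:-default}` form resembles the `$(VAR:-default)` syntax used in the link targets. This is only a sketch of the intended fallback chain, not of how the CVMFS client resolves links internally; `resolve_override` and `shared_default` are illustrative names.

```shell
#!/bin/sh
# Emulate the two-level fallback: version-specific variable first,
# then the shared default variable, then /dev/null.
resolve_override() {
    # Second-level link: host_injections/default/override
    shared_default=${EESSI_LIB_OVERRIDE_DEFAULT:-/dev/null}
    # First-level link: versions/2023.06/compat/linux/x86_64/lib/override
    printf '%s\n' "${EESSI_202306_LIB_OVERRIDE:-$shared_default}"
}
```

With neither variable set the function prints /dev/null, with only `EESSI_LIB_OVERRIDE_DEFAULT` set it prints that value, and a version-specific `EESSI_202306_LIB_OVERRIDE` takes precedence over both.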

@ocaisa
Member Author

ocaisa commented Sep 9, 2025

What you describe (variant symlink pointing to a variant symlink) looks like it should work and can satisfy the scenarios I can think of.

EDIT: Apart from $(EESSI_202306_LIB_OVERRIDE:-/cvmfs/software.eessi.io/host_injections/default/override), I think you probably need all variant symlinks to be within the repo... but we can figure out that part later

@bedroge
Collaborator

bedroge commented Sep 16, 2025

I was trying to set up an automated procedure for updating the existing compat layer, but I'm running into several issues there. Now I'm wondering if it isn't easier to just do a full rebuild. Let's see if that works and what it produces...

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic

@eessi-bot-aws

eessi-bot-aws bot commented Sep 16, 2025

error: .github/workflows/scorecards.yml: patch does not apply

Unable to download or merge changes between the source branch and the destination branch.
Tip: This can usually be resolved by syncing your branch and resolving any merge conflicts.

@bedroge
Collaborator

bedroge commented Sep 16, 2025

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic

@eessi-bot-aws

eessi-bot-aws bot commented Sep 16, 2025

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2025.09/pr_227/90330

| date | job status | comment |
| --- | --- | --- |
| Sep 16 12:27:34 UTC 2025 | submitted | job id 90330 awaits release by job manager |
| Sep 16 12:27:40 UTC 2025 | released | job awaits launch by Slurm scheduler |
| Sep 16 12:28:42 UTC 2025 | running | job 90330 is running |
| Sep 16 16:11:09 UTC 2025 | finished | 😢 FAILURE |
| Sep 16 16:11:09 UTC 2025 | test result | 😢 FAILURE |

Details (finished):
✅ job output file slurm-90330.out
❌ some task failed
✅ found tarball
Artefacts:
eessi-2025.06-compat-linux-x86_64-1758038865.tar.gz
size: 1839 MiB (1928963561 bytes)
entries: 196046

Details (test result):
Reason: EESSI test suite produced failures.
ReFrame Summary: [ FAILED ] Ran 24/24 test case(s) from 24 check(s) (2 failure(s), 0 skipped, 0 aborted)
✅ job output file slurm-90330.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case

Development

Successfully merging this pull request may close these issues.

Revise symlink strategy for drivers
