Contributor

@Darkless012 Darkless012 commented Feb 21, 2025

Included uncommitted remarks in #754 (review)

Script Overview:

Function definitions (lines 70-585):

  • get_host_ldconfig - Locates the host system's ldconfig, avoiding CVMFS paths.
  • get_nvlib_list - Downloads the list of required NVIDIA libraries, or provides a default list.
  • check_global_read - Verifies that the current umask allows global read access.
  • check_nvidia_smi_info - Checks for the nvidia-smi command and extracts GPU information.
  • show_ld_preload - Suggests LD_PRELOAD environment configurations for the CUDA libraries.
  • find_cuda_libraries_on_host - Queries the host's ldconfig, gathers library paths, and filters them for matches.
  • symlink_mode - Symlinks the matched libraries into the correct folders.
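
To illustrate the idea behind check_global_read, here is a minimal sketch. It assumes the check only needs to look at the "others" digit of the octal umask. The function name umask_allows_global_read and its optional argument are hypothetical and are not taken from the script itself.

```shell
#!/bin/sh
# Hypothetical helper, NOT the script's actual check_global_read.
# Succeeds (exit status 0) when the given octal umask leaves the "others"
# read bit available, i.e. the last octal digit does not contain the 4 bit.
umask_allows_global_read() {
    # Use the first argument if given, otherwise the shell's current umask.
    mask=${1:-$(umask)}
    case "$mask" in
        *[4-7]) return 1 ;;  # last digit 4-7: read is masked for others
        *)      return 0 ;;  # last digit 0-3: read is still allowed
    esac
}
```

A caller could then warn early, for example with `umask_allows_global_read || echo "umask $(umask) blocks global read" >&2`.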

Script logic (lines 585-650):

  • Check that EESSI is installed
  • Check that nvidia-smi is present
  • Parse CLI args
  • Run nvidia-smi to obtain CUDA info
  • Locate CUDA libraries on the host (ldconfig), then either:
    • show the suggested LD_PRELOAD configuration, or
    • symlink the CUDA libraries to the places where EESSI expects them.
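
The "locate, then show or symlink" step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's implementation. The helper name match_host_libs is hypothetical. The input is assumed to be in the `ldconfig -p` line format (`name (flags) => path`), and library paths are assumed to contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from link_nvidia_host_libraries.sh.
# Reads `ldconfig -p`-style lines on stdin and prints the host path of every
# library whose name is given as an argument, skipping paths under /cvmfs.
match_host_libs() {
    awk -v wanted="$*" '
        BEGIN {
            # Turn the space-separated name list into a lookup table.
            n = split(wanted, names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        # Expected line shape: "libfoo.so.1 (libc6,x86-64) => /path/libfoo.so.1"
        ($1 in want) && $(NF-1) == "=>" && $NF !~ /^\/cvmfs\// { print $NF }
    '
}

# Possible uses (commented out; target_dir is a hypothetical variable):
#   ldconfig -p | match_host_libs libcuda.so.1 | paste -sd: -
#   ldconfig -p | match_host_libs libcuda.so.1 |
#       while read -r p; do ln -sf "$p" "$target_dir/"; done
```

The first commented use produces a colon-separated list suitable for an LD_PRELOAD suggestion; the second corresponds to the symlink branch.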

eessi-bot bot commented Feb 21, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@riscv-eessi-io-bot

Instance eessi-bot-riscv is configured to build for:

  • architectures: riscv64/generic
  • repositories: riscv.eessi.io-20240402

eessi-bot bot commented Feb 21, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

gpu-bot-ugent bot commented Feb 21, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

…jections directory path from EESSI_HOST_INJECTIONS
…s for symlinks, added fail fast on normal run.
…k-nvidia-libs.sh to include all fake libraries
@Darkless012 Darkless012 requested a review from ocaisa March 14, 2025 07:13
Member

@ocaisa ocaisa left a comment

Couple of minor tweaks

@Darkless012 Darkless012 requested a review from ocaisa March 18, 2025 11:12
Add newlines at end of files
Member

@ocaisa ocaisa left a comment

LGTM, thanks for effort @Darkless012

Member

ocaisa commented Mar 18, 2025

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2

eessi-bot bot commented Mar 18, 2025

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen2
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen2 resulted in:

eessi-bot bot commented Mar 18, 2025

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen2 from ocaisa

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen2
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen2 resulted in:

    • no jobs were submitted

@eessi-bot-trz42

Updates by the bot instance trz42-GH200-jr
  • account ocaisa has NO permission to send commands to the bot

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr
  • account ocaisa has NO permission to send commands to the bot

eessi-bot bot commented Mar 18, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.03/pr_922/50947

date | job status | comment
Mar 18 11:35:56 UTC 2025 | submitted | job id 50947 awaits release by job manager
Mar 18 11:36:17 UTC 2025 | released | job awaits launch by Slurm scheduler
Mar 18 11:42:22 UTC 2025 | running | job 50947 is running
Mar 18 11:49:31 UTC 2025 | finished | 😁 SUCCESS
  Details
    ✅ job output file slurm-50947.out
    ✅ no message matching FATAL:
    ✅ no message matching ERROR:
    ✅ no message matching FAILED:
    ✅ no message matching required modules missing:
    ✅ found message(s) matching No missing installations
    ✅ found message matching .tar.gz created!
  Artefacts
    eessi-2023.06-software-linux-x86_64-amd-zen2-1742298171.tar.gz
    size: 0 MiB (7596 bytes)
    entries: 1
    modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
      no module files in tarball
    software under 2023.06/software/linux/x86_64/amd/zen2/software
      no software packages in tarball
    other under 2023.06/software/linux/x86_64/amd/zen2
      2023.06/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
Mar 18 11:49:31 UTC 2025 | test result | 😁 SUCCESS
  ReFrame Summary
    [ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86_64_amd_zen2+default
      P: perf: 335.48 timesteps/s (r:0, l:None, u:None)
    [ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86_64_amd_zen2+default
      P: perf: 445.288 timesteps/s (r:0, l:None, u:None)
    [ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86_64_amd_zen2+default
      P: latency: 1.8 us (r:0, l:None, u:None)
    [ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86_64_amd_zen2+default
      P: latency: 2.77 us (r:0, l:None, u:None)
    [ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86_64_amd_zen2+default
      P: latency: 3.8 us (r:0, l:None, u:None)
    [ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86_64_amd_zen2+default
      P: latency: 4.23 us (r:0, l:None, u:None)
    [ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86_64_amd_zen2+default
      P: latency: 0.59 us (r:0, l:None, u:None)
    [ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86_64_amd_zen2+default
      P: latency: 2.63 us (r:0, l:None, u:None)
    [ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86_64_amd_zen2+default
      P: bandwidth: 7258.26 MB/s (r:0, l:None, u:None)
    [ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86_64_amd_zen2+default
      P: bandwidth: 7206.68 MB/s (r:0, l:None, u:None)
    [ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
  Details
    ✅ job output file slurm-50947.out
    ✅ no message matching ERROR:
    ✅ no message matching \[\s*FAILED\s*\].*Ran .* test case
Mar 18 11:52:24 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1742298171.tar.gz to S3 bucket succeeded

@ocaisa ocaisa added the bot:deploy label (Ask bot to deploy missing software installations to EESSI) Mar 18, 2025
@eessi-bot-trz42

Label bot:deploy has been set by user ocaisa, but this person does not have permission to trigger deployments

@eessi-bot-toprichard

Label bot:deploy has been set by user ocaisa, but this person does not have permission to trigger deployments

Member

ocaisa commented Mar 18, 2025

Staging PR has been merged

@ocaisa ocaisa merged commit d4b716a into EESSI:2023.06-software.eessi.io Mar 18, 2025
60 checks passed

eessi-bot bot commented Mar 18, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.03/pr_922/50947'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.03.18

eessi-bot bot commented Mar 18, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.03.18
