Conversation

@eessi-build-deploy-bot-deucalion

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

@eessi-bot

eessi-bot bot commented Sep 18, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat

@eessi-bot

eessi-bot bot commented Sep 18, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-compat, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software

@laraPPr laraPPr added 2023.06-software.eessi.io 2023.06 version of software.eessi.io accel:nvidia labels Sep 18, 2024
@laraPPr
Collaborator Author

laraPPr commented Sep 18, 2024

Waiting on #710

@boegel
Contributor

boegel commented Sep 18, 2024

@laraPPr NCCL will also be built as a dependency; we should make sure that its LICENSE.txt file is included in that installation.

@laraPPr
Collaborator Author

laraPPr commented Sep 18, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

@eessi-bot

eessi-bot bot commented Sep 18, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • parsing the bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80, received from sender laraPPr, failed

@eessi-bot

eessi-bot bot commented Sep 18, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • parsing the bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80, received from sender laraPPr, failed

@eessi-build-deploy-bot-deucalion
Updates by the bot instance boegel-bot-deucalion (click for details)
  • account laraPPr has NO permission to send commands to the bot

@boegel
Contributor

boegel commented Sep 18, 2024

@laraPPr The production bots need to be updated to be aware of the new accel filter, so this won't work until that's taken care of.

@laraPPr
Collaborator Author

laraPPr commented Sep 19, 2024

@boegel we are already shipping NCCL

@laraPPr
Collaborator Author

laraPPr commented Sep 19, 2024

Opened a pull request in easybuild-easyblocks to include the NCCL license in the installation.

@boegel
Contributor

boegel commented Sep 25, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

@eessi-build-deploy-bot-deucalion

eessi-build-deploy-bot-deucalion bot commented Sep 25, 2024

Updates by the bot instance boegel-bot-deucalion (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
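
For readers unfamiliar with the bot command syntax, the expansion the bot echoes back (short keys such as `repo`, `arch`, `accel` becoming `repository`, `architecture`, `accelerator`) can be sketched roughly as follows. This is only an illustration: the function name `expand_bot_command` and the `FILTER_KEYS` table are hypothetical and are not taken from the EESSI bot's code.

```python
# Illustrative sketch only; this is NOT the EESSI bot's actual parser.
# Short keys and their expanded forms, as seen in the bot replies in this thread.
FILTER_KEYS = {"repo": "repository", "arch": "architecture", "accel": "accelerator"}


def expand_bot_command(line, filter_keys=FILTER_KEYS):
    """Turn 'bot: build repo:X arch:Y ...' into 'build repository:X architecture:Y ...'."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command")
    words = line[len(prefix):].split()
    if not words:
        raise ValueError("missing action")
    action, filters = words[0], words[1:]
    expanded = []
    for item in filters:
        key, sep, value = item.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed filter: {item!r}")
        if key in filter_keys:
            key = filter_keys[key]  # expand a short key to its long form
        elif key not in filter_keys.values():
            raise ValueError(f"unknown filter: {key!r}")
        expanded.append(f"{key}:{value}")
    return " ".join([action] + expanded)


command = "bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80"
print(expand_bot_command(command))
```

In this sketch, a key table without `accel` rejects the command with a `ValueError`. That would be consistent with the "parsing the bot command ... failed" replies from Sep 18, although the thread does not show how the real bot handles unknown filters.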

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
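
Why two of the three instances report "no jobs were submitted" follows from the configurations they posted at the top of this thread: only eessi-bot-mc-aws lists `x86_64/amd/zen2`. A minimal sketch of that matching is shown below. The `BotInstance` class and `matching_instances` helper are hypothetical names, and the sketch assumes the accelerator filter does not affect which instance picks up the job, since the posted configurations do not list accelerators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotInstance:
    """Build targets an instance announced in this thread (hypothetical container)."""
    name: str
    architectures: frozenset
    repositories: frozenset

    def would_build(self, repository, architecture):
        # A job is only submitted when both filters match the instance's configuration.
        return repository in self.repositories and architecture in self.architectures


INSTANCES = [
    BotInstance(
        "boegel-bot-deucalion",
        frozenset({"aarch64/a64fx"}),
        frozenset({"eessi.io-2023.06-software"}),
    ),
    BotInstance(
        "eessi-bot-mc-aws",
        frozenset({
            "x86_64/generic", "x86_64/intel/haswell", "x86_64/intel/skylake_avx512",
            "x86_64/amd/zen2", "x86_64/amd/zen3", "aarch64/generic",
            "aarch64/neoverse_n1", "aarch64/neoverse_v1",
        }),
        frozenset({
            "eessi-hpc.org-2023.06-compat", "eessi-hpc.org-2023.06-software",
            "eessi.io-2023.06-software", "eessi.io-2023.06-compat",
        }),
    ),
    BotInstance(
        "eessi-bot-mc-azure",
        frozenset({"x86_64/amd/zen4"}),
        frozenset({
            "eessi-hpc.org-2023.06-compat", "eessi.io-2023.06-compat",
            "eessi-hpc.org-2023.06-software", "eessi.io-2023.06-software",
        }),
    ),
]


def matching_instances(repository, architecture):
    """Names of the instances whose configuration covers the requested build."""
    return [i.name for i in INSTANCES if i.would_build(repository, architecture)]


print(matching_instances("eessi.io-2023.06-software", "x86_64/amd/zen2"))
```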

@eessi-bot

eessi-bot bot commented Sep 25, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_711/19875

date                      job status  comment
Sep 25 21:32:20 UTC 2024  submitted   job id 19875 awaits release by job manager
Sep 25 21:32:33 UTC 2024  released    job awaits launch by Slurm scheduler
Sep 25 21:39:04 UTC 2024  running     job 19875 is running
Sep 25 22:37:54 UTC 2024  finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-19875.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1727302843.tar.gz
size: 259 MiB (271874626 bytes)
entries: 4461
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
no other files in tarball
Sep 25 22:37:54 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-19875.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 26 13:32:29 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1727302843.tar.gz to S3 bucket succeeded
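
The build "Details" list above amounts to a handful of presence and absence checks on the job output file. A rough sketch of such a check is given below. It assumes plain substring matching and uses hypothetical names (`check_build_log`, `build_succeeded`); the bot's real check script may well use different matching rules.

```python
# Strings taken from the "Details" list of the build result above.
MUST_BE_ABSENT = ["ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_build_log(text):
    """Return one labelled boolean per check; True means the check passed."""
    results = {}
    for pattern in MUST_BE_ABSENT:
        results[f"no message matching {pattern}"] = pattern not in text
    for pattern in MUST_BE_PRESENT:
        results[f"found message matching {pattern}"] = pattern in text
    return results


def build_succeeded(text):
    """A build counts as successful only if every individual check passed."""
    return all(check_build_log(text).values())


# Made-up log excerpt for illustration; not copied from slurm-19875.out.
sample_log = "No missing installations\n/tmp/example.tar.gz created!\n"
for label, passed in check_build_log(sample_log).items():
    print("PASS" if passed else "FAIL", label)
```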

@boegel
Contributor

boegel commented Sep 26, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

@eessi-bot

eessi-bot bot commented Sep 26, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

@eessi-build-deploy-bot-deucalion

eessi-build-deploy-bot-deucalion bot commented Sep 26, 2024

Updates by the bot instance boegel-bot-deucalion (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot

eessi-bot bot commented Sep 26, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot

eessi-bot bot commented Sep 26, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_711/20003

date                      job status  comment
Sep 26 08:18:14 UTC 2024  submitted   job id 20003 awaits release by job manager
Sep 26 08:18:55 UTC 2024  released    job awaits launch by Slurm scheduler
Sep 26 08:20:31 UTC 2024  running     job 20003 is running
Sep 26 09:08:44 UTC 2024  finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-20003.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1727340843.tar.gz
size: 259 MiB (271862815 bytes)
entries: 4461
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Sep 26 09:08:44 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-20003.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 26 13:32:51 UTC 2024 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen3-1727340843.tar.gz to S3 bucket succeeded
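
Both tarballs place the GPU build under an `accel/<vendor>/<compute capability>` subdirectory of the CPU target's prefix. The layout seen in the artefact listings can be reproduced with a small sketch like the one below. The helper names `accel_prefix` and `module_file` are hypothetical, and the layout is inferred only from the paths printed in this thread.

```python
from pathlib import PurePosixPath


def accel_prefix(version, cpu_target, accel_target, os_type="linux"):
    """Prefix for accelerator-specific installations, as seen in the tarball listings."""
    return PurePosixPath(version, "software", os_type, cpu_target, "accel", accel_target)


def module_file(prefix, name, module_version):
    """Path of the Lua module file for a given software name and version."""
    return prefix / "modules" / "all" / name / f"{module_version}.lua"


lammps_version = "2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1"
for cpu in ("x86_64/amd/zen2", "x86_64/amd/zen3"):
    prefix = accel_prefix("2023.06", cpu, "nvidia/cc80")
    print(module_file(prefix, "LAMMPS", lammps_version))
```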

@boegel boegel added the bot:deploy Ask bot to deploy missing software installations to EESSI label Sep 26, 2024
Contributor

@boegel boegel left a comment

lgtm

@boegel
Contributor

boegel commented Sep 26, 2024

I tested this build on our AMD Milan A100 GPU cluster, results (interactive Slurm job with 1x A100, 12 CPU cores):

With CPU-only LAMMPS (LAMMPS/2Aug2023_update2-foss-2023a-kokkos module), using rhodo test case from EESSI test suite:

  • single-core (lmp -in in.rhodo): Performance: 0.841 ns/day
  • 12-core (unset OMP_PROC_BIND; mpirun -np 12 lmp -in in.rhodo): Performance: 8.968 ns/day
  • 12-core w/ Kokkos (lmp -in in.rhodo -kokkos on t 12 -suffix kk -package kokkos newton on neigh half): Performance: 6.153 ns/day

GPU (LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1):

export EESSI_OVERRIDE_GPU_CHECK=1
export LD_PRELOAD=/usr/lib64/libcuda.so
lmp -in in.rhodo -kokkos on t 12 g 1 -suffix kk -package kokkos newton on neigh half

=> Performance: 34.778 ns/day
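
To put the numbers above side by side, the speedup of the GPU run over each CPU-only run is simple division of the reported ns/day figures. The short sketch below does only that arithmetic. The values come from single interactive runs quoted in this comment, so the ratios are indicative rather than a careful benchmark.

```python
# ns/day figures quoted above for the rhodo test case (single runs, not averaged).
CPU_RUNS = {
    "single-core": 0.841,
    "12-core MPI": 8.968,
    "12-core Kokkos (threads)": 6.153,
}
GPU_NS_PER_DAY = 34.778  # 1x A100 with Kokkos, 12 threads


def speedup(gpu, cpu):
    """How many times faster the GPU run is than the given CPU run."""
    return gpu / cpu


for label, perf in CPU_RUNS.items():
    print(f"GPU vs {label}: {speedup(GPU_NS_PER_DAY, perf):.1f}x")
```

That is roughly a 3.9x gain over the fastest CPU-only configuration (12 MPI ranks) and about 41x over a single core.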

🥳 :shipit:

@boegel
Contributor

boegel commented Sep 26, 2024

ingest PRs merged, deploy under way, so merging this...

@boegel boegel merged commit 00c7e6b into EESSI:2023.06-software.eessi.io Sep 26, 2024
35 checks passed
@eessi-bot

eessi-bot bot commented Sep 26, 2024

PR merged! Moved ['/project/def-users/SHARED/jobs/2024.09/pr_711/19875', '/project/def-users/SHARED/jobs/2024.09/pr_711/20003'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.09.26

@eessi-build-deploy-bot-deucalion

PR merged! Moved [] to $HOME/trash_bin/EESSI/software-layer/2024.09.26

@eessi-bot

eessi-bot bot commented Sep 26, 2024

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.09.26

@laraPPr laraPPr deleted the GPU_builds branch June 4, 2025 10:02
Labels

2023.06-software.eessi.io 2023.06 version of software.eessi.io accel:nvidia bot:deploy Ask bot to deploy missing software installations to EESSI

2 participants