@laraPPr
Collaborator

@laraPPr laraPPr commented Sep 25, 2024

@laraPPr laraPPr added 2023.06-software.eessi.io 2023.06 version of software.eessi.io accel:nvidia labels Sep 25, 2024
@eessi-bot

eessi-bot bot commented Sep 25, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

@boegel boegel changed the title {2023.06}[foss/2023a] NCCL 2.18.3 w/ CUDA 12.1.1 {2023.06}[foss/2023a] NCCL 2.18.3 w/ CUDA 12.1.1 (rebuild) Sep 25, 2024
@boegel
Contributor

boegel commented Sep 25, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

@eessi-build-deploy-bot-deucalion

eessi-build-deploy-bot-deucalion bot commented Sep 25, 2024

Updates by the bot instance boegel-bot-deucalion (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
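
The bot echoes each command back in an "expanded format" where `repo`, `arch` and `accel` become `repository`, `architecture` and `accelerator`. A minimal sketch of that expansion, assuming only the three abbreviations seen in this thread; `SHORT_TO_LONG` and `expand_bot_command` are hypothetical names, not the bot's own code:

```python
# Abbreviations observed in this thread; the real bot may accept others.
SHORT_TO_LONG = {
    "repo": "repository",
    "arch": "architecture",
    "accel": "accelerator",
}


def expand_bot_command(comment: str) -> str:
    """Expand 'bot: build repo:X arch:Y ...' into the long form the bot echoes."""
    prefix = "bot:"
    if not comment.startswith(prefix):
        raise ValueError("not a bot command")
    # First word after the prefix is the action, the rest are key:value filters.
    action, *args = comment[len(prefix):].split()
    expanded = [action]
    for arg in args:
        # Split on the first ':' only, so values may contain further colons.
        key, sep, value = arg.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {arg!r}")
        expanded.append(f"{SHORT_TO_LONG.get(key, key)}:{value}")
    return " ".join(expanded)


print(expand_bot_command(
    "bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80"
))
# prints: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
```

Unknown keys pass through unchanged here; how the actual bot treats them is not shown in this thread.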

@eessi-bot

eessi-bot bot commented Sep 25, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_741/19869

date | job status | comment
Sep 25 19:52:54 UTC 2024 | submitted | job id 19869 awaits release by job manager
Sep 25 19:53:02 UTC 2024 | released | job awaits launch by Slurm scheduler
Sep 25 20:02:04 UTC 2024 | running | job 19869 is running
Sep 25 20:27:32 UTC 2024 | finished | 😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-19869.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1727295168.tar.gz
size: 64 MiB (67151359 bytes)
entries: 31
modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Sep 25 20:27:32 UTC 2024 | test result | 😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-19869.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 25 20:54:34 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1727295168.tar.gz to S3 bucket succeeded
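
The artefact name packs several fields into one string. The sketch below splits it apart, assuming the layout `eessi-<version>-software-<os>-<cpu target>-<epoch seconds>.tar.gz`; that layout is inferred only from the names in this thread, and `TARBALL_RE` and `parse_tarball_name` are hypothetical helpers:

```python
import re
from datetime import datetime, timezone

# Layout inferred from the tarball names in this thread (an assumption).
TARBALL_RE = re.compile(
    r"^eessi-(?P<version>\d{4}\.\d{2})-software-(?P<os>[a-z]+)-"
    r"(?P<cpu>.+)-(?P<epoch>\d+)\.tar\.gz$"
)


def parse_tarball_name(name: str) -> dict:
    """Split a tarball name into version, os, cpu target and creation time."""
    match = TARBALL_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected tarball name: {name!r}")
    fields = match.groupdict()
    # Assumption: the trailing number is a Unix timestamp in seconds.
    fields["created"] = datetime.fromtimestamp(int(fields["epoch"]), tz=timezone.utc)
    return fields


info = parse_tarball_name(
    "eessi-2023.06-software-linux-x86_64-amd-zen2-1727295168.tar.gz"
)
print(info["cpu"], info["created"].isoformat())
# prints: x86_64-amd-zen2 2024-09-25T20:12:48+00:00
```

The decoded time, 20:12:48 UTC, falls between the "running" (20:02:04) and "finished" (20:27:32) entries in the table, which is consistent with reading the number as a creation timestamp.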

@casparvl
Collaborator

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

@eessi-build-deploy-bot-deucalion
Updates by the bot instance boegel-bot-deucalion (click for details)
  • account casparvl has NO permission to send commands to the bot

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot

eessi-bot bot commented Sep 25, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_741/19870

date | job status | comment
Sep 25 20:21:51 UTC 2024 | submitted | job id 19870 awaits release by job manager
Sep 25 20:22:26 UTC 2024 | released | job awaits launch by Slurm scheduler
Sep 25 20:28:34 UTC 2024 | running | job 19870 is running
Sep 25 20:49:56 UTC 2024 | finished | 😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-19870.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-1727296683.tar.gz
size: 64 MiB (67151870 bytes)
entries: 31
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Sep 25 20:49:56 UTC 2024 | test result | 😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-19870.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 25 20:54:54 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen3-1727296683.tar.gz to S3 bucket succeeded
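
The "Details" list for the finished build amounts to three strings that must not occur in the job output and two that must. A rough sketch of that decision using plain substring search; the bot's real check may use different matching rules, and `check_job_output`, `job_succeeded` and the sample log text are hypothetical:

```python
# Strings taken from the "Details" list of the finished build in this thread.
MUST_BE_ABSENT = ["ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_job_output(text: str) -> dict:
    """Return one pass/fail flag per check, keyed like the bot's report lines."""
    results = {}
    for pattern in MUST_BE_ABSENT:
        results[f"no message matching {pattern}"] = pattern not in text
    for pattern in MUST_BE_PRESENT:
        results[f"found message matching {pattern}"] = pattern in text
    return results


def job_succeeded(text: str) -> bool:
    """A job counts as successful only if every individual check passes."""
    return all(check_job_output(text).values())


# Hypothetical log excerpt, not copied from slurm-19870.out.
sample_log = "No missing installations\n/tmp/example.tar.gz created!\n"
for label, ok in check_job_output(sample_log).items():
    print("PASS" if ok else "FAIL", label)
```

Requiring both the absence of error markers and the presence of success markers guards against a job that exits early without printing anything at all.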

@boegel
Contributor

boegel commented Sep 25, 2024

Build for zen2 looks OK to me:

[bot@login1 19869]$ readelf -d 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1/lib/libnccl.so | grep RPATH | tr ':' '\n' | grep /CUDA/
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/CUDA/12.1.1/lib64
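
The same check can be scripted when several libraries need inspecting. This sketch parses text in the form `readelf -d` prints for an RPATH entry and keeps the `/CUDA/` paths, mirroring the `grep RPATH | tr ':' '\n' | grep /CUDA/` pipeline above. The `sample` line is hypothetical apart from the CUDA path quoted in this comment, and `rpath_entries` is an illustrative name:

```python
import re


def rpath_entries(readelf_output: str) -> list:
    """Collect the individual directories from RPATH lines of `readelf -d` output."""
    entries = []
    for line in readelf_output.splitlines():
        # readelf prints the value as: (RPATH)  Library rpath: [dir1:dir2:...]
        match = re.search(r"\(RPATH\).*\[(.*)\]", line)
        if match:
            entries.extend(p for p in match.group(1).split(":") if p)
    return entries


# Hypothetical readelf line; only the CUDA directory comes from this thread.
cuda_dir = (
    "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2"
    "/accel/nvidia/cc80/software/CUDA/12.1.1/lib64"
)
sample = (
    " 0x000000000000000f (RPATH)              Library rpath: "
    f"[/hypothetical/lib:{cuda_dir}:/hypothetical/lib64]"
)

cuda_paths = [p for p in rpath_entries(sample) if "/CUDA/" in p]
# Flag any CUDA entry that does not sit under the accel/nvidia/cc80 tree.
outside_accel = [p for p in cuda_paths if "/accel/nvidia/cc80/" not in p]
print(cuda_paths)
print("entries outside accel tree:", outside_accel)
```

In practice the input would come from running `readelf -d` on `libnccl.so`, for example through `subprocess.run`, rather than from a literal string.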

@boegel boegel added the bot:deploy Ask bot to deploy missing software installations to EESSI label Sep 25, 2024
Contributor

@boegel boegel left a comment

lgtm

@boegel
Contributor

boegel commented Sep 25, 2024

staging PRs merged

@boegel
Contributor

boegel commented Sep 25, 2024

ingestion done

@boegel boegel merged commit 8ed96cc into EESSI:2023.06-software.eessi.io Sep 25, 2024
35 checks passed
@eessi-build-deploy-bot-deucalion

PR merged! Moved [] to $HOME/trash_bin/EESSI/software-layer/2024.09.25

@eessi-bot

eessi-bot bot commented Sep 25, 2024

PR merged! Moved ['/project/def-users/SHARED/jobs/2024.09/pr_741/19869', '/project/def-users/SHARED/jobs/2024.09/pr_741/19870'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.09.25

@eessi-bot

eessi-bot bot commented Sep 25, 2024

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.09.25

@laraPPr laraPPr deleted the GPU_rebuilds branch June 4, 2025 10:02
