@TopRichard
Collaborator

@TopRichard TopRichard commented Apr 23, 2025

This PR should not be merged until the NVIDIA/Grace stack is ready.

@eessi-bot

eessi-bot bot commented Apr 23, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/intel/cascadelake, x86_64/intel/icelake, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot

eessi-bot bot commented Apr 23, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 23, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-deucalion

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat

@TopRichard TopRichard marked this pull request as draft April 23, 2025 18:01
@TopRichard TopRichard marked this pull request as ready for review April 24, 2025 13:21
Richard Top added 2 commits April 24, 2025 14:26
…ftware-layer into eessi-2023.06-grace-Add-support-in-archdetect-for-detecting-NVIDIA/Grace
…e is also actually used by updating Google Actions workflow
Contributor

@boegel boegel left a comment

@TopRichard The extra test case for nvidia/grace is only used when the corresponding GitHub Actions workflow is also updated.

It's also worth adding a test case for google/axion.

Both tackled in TopRichard#17

Richard Top added 2 commits April 25, 2025 06:52
…are-layer into eessi-2023.06-grace-Add-support-in-archdetect-for-detecting-NVIDIA/Grace
Contributor

@boegel boegel left a comment

Looks good to go for me now.

@trz42 Can you double check, and perhaps manually test on both grace and axion?
Should be fine though, since it's verified through the extra test cases.

We'll also need to trigger a deploy before merging.

@trz42
Collaborator

trz42 commented Apr 25, 2025

> Looks good to go for me now.
>
> @trz42 Can you double check, and perhaps manually test on both grace and axion? Should be fine though, since it's verified through the extra test cases.
>
> We'll also need to trigger a deploy before merging.

Works on Google Axion

[@google-axion software-layer]$ cat /proc/cpuinfo | awk '/^processor/{p++} p==1' && init/eessi_archdetect.sh -a cpupath
processor	: 0
BogoMIPS	: 2000.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd4f
CPU revision	: 1

aarch64/google/axion:aarch64/nvidia/grace:aarch64/neoverse_v1:aarch64/neoverse_n1:aarch64/generic
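
The `awk` filter in the command above limits the `/proc/cpuinfo` dump to the first processor entry: `p` counts lines that start with `processor`, and lines are printed only while that count is 1. A minimal sketch of that behaviour, using a hypothetical helper name (`first_cpu_block`) and a shortened stand-in for `/proc/cpuinfo`:

```shell
# Hypothetical helper: print only the first "processor" stanza of cpuinfo-style input.
first_cpu_block() {
    # p counts lines starting with "processor"; a line is printed only while p is 1,
    # so output stops as soon as the second stanza begins.
    awk '/^processor/{p++} p==1'
}

# Shortened stand-in for /proc/cpuinfo on a two-core machine (illustrative only).
sample='processor : 0
CPU part : 0xd4f

processor : 1
CPU part : 0xd4f'

# Prints the "processor : 0" stanza and nothing from the second stanza.
printf '%s\n' "$sample" | first_cpu_block
```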

and also on NVIDIA Grace/Hopper

[jrc0900 software-layer]$ cat /proc/cpuinfo | awk '/^processor/{p++} p==1' && init/eessi_archdetect.sh -a cpupath
processor	: 0
BogoMIPS	: 2000.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd4f
CPU revision	: 0

aarch64/nvidia/grace:aarch64/neoverse_v1:aarch64/neoverse_n1:aarch64/generic
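
Both machines report the same `CPU part` (0xd4f), so it can help to compare the two `Features` lines directly. The sketch below does only that; whether `eessi_archdetect.sh` relies on this difference to tell the two targets apart is not stated in this thread. `extra_flags` is a hypothetical helper name.

```shell
# Features lines copied from the two /proc/cpuinfo dumps above.
axion_features='fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng'
grace_features='fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh'

# Hypothetical helper: print every flag in the first list that is absent from the second.
extra_flags() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;               # flag is also in the second list
            *) printf '%s\n' "$flag" ;;   # flag is unique to the first list
        esac
    done
}

extra_flags "$axion_features" "$grace_features"   # prints: rng
extra_flags "$grace_features" "$axion_features"   # prints nothing
```

In these two dumps, `rng` is the only flag that Axion reports and Grace does not; the `CPU revision` values (1 and 0) also differ.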

@TopRichard
Collaborator Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic

@eessi-bot

eessi-bot bot commented Apr 25, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic resulted in:
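
The bots echo the command back in an expanded form, with `repo:` and `arch:` written out as `repository:` and `architecture:`. A minimal sketch of that expansion, assuming only the two short keys visible in this thread; `expand_bot_command` is a hypothetical name and not the bot's actual code:

```shell
# Hypothetical helper: expand the short keys seen in this thread (repo:, arch:).
# All other words are passed through unchanged.
expand_bot_command() {
    out=''
    for word in $1; do
        case "$word" in
            repo:*) word="repository:${word#repo:}" ;;
            arch:*) word="architecture:${word#arch:}" ;;
        esac
        out="${out:+$out }$word"   # join words with single spaces
    done
    printf '%s\n' "$out"
}

expand_bot_command 'build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic'
# prints: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic
```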

@eessi-bot

eessi-bot bot commented Apr 25, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic from TopRichard
    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic

@eessi-bot-surf

eessi-bot-surf bot commented Apr 25, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic resulted in:

    • no jobs were submitted

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 25, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic resulted in:

    • account TopRichard has NO permission to submit build jobs

@eessi-bot-toprichard

eessi-bot-toprichard bot commented Apr 25, 2025

Updates by the bot instance rt-Grace-jr (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/generic resulted in:

    • no jobs were submitted

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 25, 2025

Label bot:build has been set by user TopRichard, but this person does not have permission to trigger builds

@eessi-bot

eessi-bot bot commented Apr 25, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.04/pr_1042/58919

date | job status | comment
Apr 25 07:35:04 UTC 2025 | submitted | job id 58919 awaits release by job manager
Apr 25 07:35:16 UTC 2025 | released | job awaits launch by Slurm scheduler
Apr 25 07:41:32 UTC 2025 | running | job 58919 is running
Apr 25 07:47:57 UTC 2025 | finished |
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-58919.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
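
The checks listed above amount to four patterns that must be absent from the job output and two that must be present. A rough sketch of the same checks with `grep`, assuming plain fixed-string matching; `check_job_output` is a hypothetical name and the bot's real implementation may differ:

```shell
# Hypothetical re-implementation of the checks listed above.
check_job_output() {
    log=$1
    # Messages that must NOT appear in the job output.
    for bad in 'FATAL:' 'ERROR:' 'FAILED:' 'required modules missing:'; do
        if grep -q -F -- "$bad" "$log"; then
            echo "found unexpected message matching $bad"
            return 1
        fi
    done
    # Messages that MUST appear in the job output.
    for good in 'No missing installations' '.tar.gz created!'; do
        if ! grep -q -F -- "$good" "$log"; then
            echo "missing expected message matching $good"
            return 1
        fi
    done
    echo 'SUCCESS'
}

# Usage with a made-up two-line log file (illustrative content only).
demo_log=$(mktemp)
printf '%s\n' 'No missing installations' 'example.tar.gz created!' > "$demo_log"
check_job_output "$demo_log"   # prints: SUCCESS
rm -f "$demo_log"
```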
Artefacts
eessi-2023.06-software-linux-x86_64-generic-1745566900.tar.gz
size: 0 MiB (15682 bytes)
entries: 2
modules under 2023.06/software/linux/x86_64/generic/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic
2023.06/init/arch_specs/eessi_arch_arm.spec
2023.06/init/easybuild/eb_hooks.py
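
Two numbers in this artefact report can be checked with shell arithmetic. The sketch assumes that the number before `.tar.gz` is a Unix timestamp in seconds and that the size is reported in whole MiB; neither is stated in the thread.

```shell
# Tarball name copied from the bot report above.
tarball='eessi-2023.06-software-linux-x86_64-generic-1745566900.tar.gz'

# Assumption: the number before .tar.gz is a Unix timestamp in seconds.
stamp=${tarball%.tar.gz}   # drop the extension
stamp=${stamp##*-}         # keep what follows the last hyphen

# Time of day in UTC, using integer arithmetic only.
secs=$((stamp % 86400))
printf '%02d:%02d:%02d UTC\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
# prints: 07:41:40 UTC

# Assumption: the size is shown in whole MiB, so 15682 bytes comes out as 0.
echo "$((15682 / 1048576)) MiB"
# prints: 0 MiB
```

Under that assumption the timestamp falls a few seconds after the job was reported as running (07:41:32 UTC).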
Apr 25 07:47:57 UTC 2025 | test result |
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86_64_generic+default
P: perf: 415.155 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86_64_generic+default
P: perf: 424.041 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86_64_generic+default
P: latency: 3.22 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86_64_generic+default
P: latency: 7.59 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86_64_generic+default
P: latency: 5.74 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86_64_generic+default
P: latency: 5.66 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86_64_generic+default
P: latency: 0.62 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86_64_generic+default
P: latency: 0.7 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86_64_generic+default
P: bandwidth: 10567.94 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86_64_generic+default
P: bandwidth: 10731.48 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
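
A summary like the one above can be cross-checked by counting the `[ OK ]` lines and comparing that count with the number in the final `Ran N/M` line. A small sketch with hypothetical helper names (`count_ok`, `ran_count`), run on a shortened two-test stand-in rather than the full report:

```shell
# Hypothetical helper: count "[ OK ]" result lines in a ReFrame summary on stdin.
count_ok() {
    grep -c '^\[ *OK *]'
}

# Hypothetical helper: extract N from the "Ran N/M test case(s)" line on stdin.
ran_count() {
    sed -n 's/.*Ran \([0-9]*\)\/[0-9]* test case.*/\1/p'
}

# Shortened stand-in for the summary above (two tests instead of ten).
summary='[ OK ] ( 1/2) EESSI_LAMMPS_lj
P: perf: 415.155 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/2) EESSI_OSU_coll
P: latency: 3.22 us (r:0, l:None, u:None)
[ PASSED ] Ran 2/2 test case(s) from 2 check(s) (0 failure(s), 0 skipped, 0 aborted)'

ok=$(printf '%s\n' "$summary" | count_ok)
ran=$(printf '%s\n' "$summary" | ran_count)
[ "$ok" = "$ran" ] && echo "consistent: $ok of $ran test cases OK"
```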
Details
✅ job output file slurm-58919.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 25 08:44:02 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-generic-1745566900.tar.gz to S3 bucket succeeded

Collaborator

@bedroge bedroge left a comment

Note that the tarball includes eb_hooks.py as we forgot to deploy that in #1046.

@bedroge bedroge added the bot:deploy, aarch64, 2023.06-software.eessi.io, and grace labels Apr 25, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user bedroge, but this person does not have permission to trigger deployments

@bedroge
Collaborator

bedroge commented Apr 25, 2025

Tarball has been ingested.

@bedroge bedroge merged commit 9c62fde into EESSI:2023.06-software.eessi.io Apr 25, 2025
61 checks passed
@eessi-bot

eessi-bot bot commented Apr 25, 2025

PR merged! Moved ['/project/def-users/SHARED/jobs/2025.04/pr_1042/58919'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.04.25

@eessi-bot

eessi-bot bot commented Apr 25, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.04.25

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 25, 2025

PR merged! Moved [] to /scratch/gent/vo/002/gvo00211/SHARED/trash_bin/EESSI/software-layer/2025.04.25

Labels

  • 2023.06-software.eessi.io: 2023.06 version of software.eessi.io
  • aarch64: related to Arm 64-bit targets (aarch64)
  • bot:deploy: Ask bot to deploy missing software installations to EESSI
  • grace: NVIDIA Grace CPU

4 participants