@trz42
Collaborator

@trz42 trz42 commented Apr 22, 2025

First part of the apps originally built with EasyBuild (EB) 4.9.0.

@trz42 trz42 added the 2023.06-software.eessi.io and a64fx labels Apr 22, 2025
@eessi-bot

eessi-bot bot commented Apr 22, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/intel/cascadelake, x86_64/intel/icelake, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-deucalion

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

@eessi-bot

eessi-bot bot commented Apr 22, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 22, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@trz42
Collaborator Author

trz42 commented Apr 22, 2025

bot: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx

@eessi-bot

eessi-bot bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-mc-aws
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot

eessi-bot bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-mc-azure
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-deucalion
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

@eessi-bot-surf

eessi-bot-surf bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-surf
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted

@eessi-bot-toprichard

Updates by the bot instance rt-Grace-jr
  • account trz42 has NO permission to send commands to the bot

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 22, 2025

Updates by the bot instance eessi-bot-vsc-ugent
  • received bot command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx from trz42

    • expanded format: build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx
  • handling command build instance:eessi-bot-deucalion repository:eessi.io-2023.06-software architecture:aarch64/a64fx resulted in:

    • no jobs were submitted
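Only eessi-bot-deucalion submitted a job, because the command names it in `instance:`; every other instance that accepted the command replied "no jobs were submitted". The following is a hypothetical illustration of that filtering, not the EESSI bot's code: `parse_build_command` and `jobs_for` are invented names, and exact string matching on each filter is an assumption (the real bot may match architectures differently).

```python
# Hypothetical sketch: NOT the EESSI bot's real implementation. It assumes every
# filter given in the command must match the instance's configuration exactly.


def parse_build_command(comment: str) -> dict[str, str]:
    """Turn 'bot: build key:value ...' into {'key': 'value', ...}."""
    _, _, rest = comment.partition("bot:")
    words = rest.split()
    if not words or words[0] != "build":
        raise ValueError("not a 'bot: build' command")
    filters = {}
    for word in words[1:]:
        key, _, value = word.partition(":")  # split on the first ':' only
        filters[key] = value
    return filters


def jobs_for(instance: str, archs: list[str], repos: list[str],
             filters: dict[str, str]) -> list[tuple[str, str]]:
    """Return the (architecture, repository) pairs this instance would build."""
    if filters.get("instance", instance) != instance:
        return []  # command addressed to another instance: "no jobs were submitted"
    return [(arch, repo) for arch in archs for repo in repos
            if filters.get("architecture", arch) == arch
            and filters.get("repository", repo) == repo]


cmd = parse_build_command(
    "bot: build instance:eessi-bot-deucalion "
    "repository:eessi.io-2023.06-software architecture:aarch64/a64fx")
# eessi-bot-deucalion gets exactly one job; eessi-bot-mc-aws gets none.
print(jobs_for("eessi-bot-deucalion", ["aarch64/a64fx"],
               ["eessi.io-2023.06-software"], cmd))
print(jobs_for("eessi-bot-mc-aws", ["x86_64/generic", "aarch64/generic"],
               ["eessi.io-2023.06-compat", "eessi.io-2023.06-software"], cmd))
```

The rt-Grace-jr instance never reaches this stage: it rejected the command because the sender lacked permission.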

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Apr 22, 2025

New job on instance eessi-bot-deucalion for CPU micro-architecture aarch64-a64fx for repository eessi.io-2023.06-software in job dir /home/eessibot/new-bot/jobs/2025.04/pr_1038/409320

date | job status | comment
Apr 22 19:45:29 UTC 2025 | submitted | job id 409320 awaits release by job manager
Apr 22 19:46:11 UTC 2025 | released | job awaits launch by Slurm scheduler
Apr 22 19:47:15 UTC 2025 | running | job 409320 is running
Apr 23 03:41:16 UTC 2025 | finished | 😁 SUCCESS
Details
✅ job output file slurm-409320.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-a64fx-1745377388.tar.gz
size: 170 MiB (179124566 bytes)
entries: 16043
modules under 2023.06/software/linux/aarch64/a64fx/modules/all
Cbc/2.10.11-foss-2023a.lua
Cgl/0.60.8-foss-2023a.lua
Clp/1.17.9-foss-2023a.lua
CoinUtils/2.11.10-GCC-12.3.0.lua
ESPResSo/4.2.1-foss-2023a.lua
GLPK/5.0-GCCcore-12.3.0.lua
GitPython/3.1.40-GCCcore-12.3.0.lua
HepMC3/3.2.6-GCC-12.3.0.lua
MPC/1.3.1-GCCcore-12.3.0.lua
MUMPS/5.6.1-foss-2023a-metis.lua
Osi/0.108.9-GCC-12.3.0.lua
PuLP/2.8.0-foss-2023a.lua
PyYAML/6.0-GCCcore-12.3.0.lua
Rivet/3.1.9-gompi-2023a-HepMC3-3.2.6.lua
YODA/1.9.9-GCC-12.3.0.lua
expecttest/0.1.5-GCCcore-12.3.0.lua
fastjet-contrib/1.053-gompi-2023a.lua
fastjet/3.4.2-gompi-2023a.lua
gmpy2/2.1.5-GCC-12.3.0.lua
libyaml/0.2.5-GCCcore-12.3.0.lua
networkx/3.1-gfbf-2023a.lua
pytest-flakefinder/1.1.0-GCCcore-12.3.0.lua
pytest-rerunfailures/12.0-GCCcore-12.3.0.lua
pytest-shard/0.1.2-GCCcore-12.3.0.lua
scikit-learn/1.3.1-gfbf-2023a.lua
siscone/3.0.6-GCCcore-12.3.0.lua
snakemake/8.4.2-foss-2023a.lua
sympy/1.12-gfbf-2023a.lua
wrapt/1.15.0-gfbf-2023a.lua
software under 2023.06/software/linux/aarch64/a64fx/software
Cbc/2.10.11-foss-2023a
Cgl/0.60.8-foss-2023a
Clp/1.17.9-foss-2023a
CoinUtils/2.11.10-GCC-12.3.0
ESPResSo/4.2.1-foss-2023a
GLPK/5.0-GCCcore-12.3.0
GitPython/3.1.40-GCCcore-12.3.0
HepMC3/3.2.6-GCC-12.3.0
MPC/1.3.1-GCCcore-12.3.0
MUMPS/5.6.1-foss-2023a-metis
Osi/0.108.9-GCC-12.3.0
PuLP/2.8.0-foss-2023a
PyYAML/6.0-GCCcore-12.3.0
Rivet/3.1.9-gompi-2023a-HepMC3-3.2.6
YODA/1.9.9-GCC-12.3.0
expecttest/0.1.5-GCCcore-12.3.0
fastjet-contrib/1.053-gompi-2023a
fastjet/3.4.2-gompi-2023a
gmpy2/2.1.5-GCC-12.3.0
libyaml/0.2.5-GCCcore-12.3.0
networkx/3.1-gfbf-2023a
pytest-flakefinder/1.1.0-GCCcore-12.3.0
pytest-rerunfailures/12.0-GCCcore-12.3.0
pytest-shard/0.1.2-GCCcore-12.3.0
scikit-learn/1.3.1-gfbf-2023a
siscone/3.0.6-GCCcore-12.3.0
snakemake/8.4.2-foss-2023a
sympy/1.12-gfbf-2023a
wrapt/1.15.0-gfbf-2023a
other under 2023.06/software/linux/aarch64/a64fx
2023.06/init/easybuild/eb_hooks.py
Apr 23 03:41:16 UTC 2025 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] ( 1/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 2/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 3/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 44236.8 MiB is needed
[ SKIP ] ( 4/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 44236.8 MiB is needed
[ SKIP ] ( 5/11) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) accodring to the current ReFrame configuration, but 44236.8 MiB is needed
[ OK ] ( 6/11) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:aarch64_a64fx+default
P: latency: 1.74 us (r:0, l:None, u:None)
[ OK ] ( 7/11) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:aarch64_a64fx+default
P: bandwidth: 8603.95 MB/s (r:0, l:None, u:None)
[ OK ] ( 8/11) EESSI_ESPRESSO_LJ_PARTICLES %module_name=ESPResSo/4.2.2-foss-2023b %scale=1_node /3370ce9a @BotBuildTests:aarch64_a64fx+default
P: perf: 0.01151 s/step (r:0, l:None, u:None)
[ OK ] ( 9/11) EESSI_ESPRESSO_LJ_PARTICLES %module_name=ESPResSo/4.2.2-foss-2023a %scale=1_node /ce9ec58a @BotBuildTests:aarch64_a64fx+default
P: perf: 0.01083 s/step (r:0, l:None, u:None)
[ OK ] (10/11) EESSI_ESPRESSO_LJ_PARTICLES %module_name=ESPResSo/4.2.1-foss-2023a %scale=1_node /a7cd00d1 @BotBuildTests:aarch64_a64fx+default
P: perf: 0.01065 s/step (r:0, l:None, u:None)
[ OK ] (11/11) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:aarch64_a64fx+default
P: perf: 580.034 timesteps/s (r:0, l:None, u:None)
[ PASSED ] Ran 6/11 test case(s) from 11 check(s) (0 failure(s), 5 skipped, 0 aborted)
Details
✅ job output file slurm-409320.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Apr 23 06:39:45 UTC 2025 | uploaded | transfer of eessi-2023.06-software-linux-aarch64-a64fx-1745377388.tar.gz to S3 bucket succeeded
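Five of the 11 checks were skipped because each needs more memory per node than the 30720 MiB (30 GiB) that the ReFrame configuration lists for this partition. A minimal sketch of that comparison, using the numbers from the summary; `skip_reason` is a hypothetical helper, not the EESSI test-suite API.

```python
MIB_PER_GIB = 1024

# Numbers copied from the ReFrame summary; skip_reason is a hypothetical helper.
AVAILABLE_MIB = 30720
REQUIRED_MIB = [49152, 49152, 44236.8, 44236.8, 44236.8]


def skip_reason(available_mib: float, required_mib: float):
    """Return a skip message when a node has less memory than the test needs, else None."""
    if available_mib < required_mib:
        return (f"only {available_mib / MIB_PER_GIB:g} GiB available per node, "
                f"but {required_mib / MIB_PER_GIB:g} GiB needed")
    return None


reasons = [skip_reason(AVAILABLE_MIB, req) for req in REQUIRED_MIB]
skipped = sum(reason is not None for reason in reasons)
print(f"{skipped} of 11 checks skipped, {11 - skipped} ran")
for reason in reasons:
    print(reason)
```

This reproduces the "Ran 6/11 ... 5 skipped" tally: 30 GiB per node is below both the 48 GiB and the 43.2 GiB requirements.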

@trz42 trz42 added the ready-to-deploy and ready-to-review labels Apr 23, 2025
Collaborator

@TopRichard TopRichard left a comment

The eb_hooks.py is included in the tarball, so it has changed, likely as a result of merging PR #1034

@TopRichard TopRichard added ready-to-review and removed the ready-to-review and ready-to-deploy labels Apr 23, 2025
@trz42 trz42 added the bot:deploy label Apr 23, 2025
@eessi-bot-toprichard

Label bot:deploy has been set by user trz42, but this person does not have permission to trigger deployments

@trz42
Collaborator Author

trz42 commented Apr 23, 2025

> The eb_hooks.py is included in the tarball, so it has changed, likely as a result of merging PR #1034

Good catch. Then we should verify if the eb_hooks.py in the tarball is newer or older than the one on /cvmfs
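One way to run that check, sketched in Python under stated assumptions: the tarball is gzip-compressed and contains the member path listed under "other" in the bot report, and modification times are taken as the signal for which copy is newer. `compare_hooks` is a hypothetical helper, not part of EESSI's tooling; it only uses the standard-library `tarfile` and `difflib` modules.

```python
import difflib
import tarfile
from pathlib import Path

# Member name as listed under "other" in the bot's artefact report.
MEMBER = "2023.06/init/easybuild/eb_hooks.py"


def compare_hooks(tarball: str, deployed: str, member: str = MEMBER):
    """Return (newer, diff_lines) for the deployed hooks file vs. the tarball copy.

    `newer` is "tarball", "deployed" or "same", based on modification times only.
    """
    with tarfile.open(tarball, "r:gz") as tar:
        info = tar.getmember(member)
        tar_text = tar.extractfile(info).read().decode()
    path = Path(deployed)
    deployed_mtime = path.stat().st_mtime
    if info.mtime > deployed_mtime:
        newer = "tarball"
    elif info.mtime < deployed_mtime:
        newer = "deployed"
    else:
        newer = "same"
    # Unified diff from the deployed copy (old) to the tarball copy (new).
    diff = list(difflib.unified_diff(
        path.read_text().splitlines(keepends=True),
        tar_text.splitlines(keepends=True),
        fromfile=str(path), tofile=member))
    return newer, diff


# Example call (paths as they appear in this PR):
# newer, diff = compare_hooks(
#     "eessi-2023.06-software-linux-aarch64-a64fx-1745377388.tar.gz",
#     "/cvmfs/software.eessi.io/versions/2023.06/init/easybuild/eb_hooks.py")
# print(newer); print("".join(diff))
```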

@trz42
Collaborator Author

trz42 commented Apr 23, 2025

> > The eb_hooks.py is included in the tarball, so it has changed, likely as a result of merging PR #1034
>
> Good catch. Then we should verify if the eb_hooks.py in the tarball is newer or older than the one on /cvmfs

It's good to ingest. It actually fixes an issue that was likely introduced by the last ingest (probably the TensorFlow PR #1034), where eb_hooks.py was updated after the package had been built. The diff below, between the version on /cvmfs and the one in the tarball, illustrates the change:

$ diff -u /cvmfs/software.eessi.io/versions/2023.06/init/easybuild/eb_hooks.py 2023.06/init/easybuild/eb_hooks.py
--- /cvmfs/software.eessi.io/versions/2023.06/init/easybuild/eb_hooks.py	2025-04-20 18:18:22.000000000 +0100
+++ 2023.06/init/easybuild/eb_hooks.py	2025-04-22 20:47:23.000000000 +0100
@@ -133,7 +133,8 @@
     if memory_hungry_build or memory_hungry_build_a64fx:
         parallel = self.cfg['parallel']
         if cpu_target == CPU_TARGET_A64FX and self.name in ['TensorFlow']:
-            if parallel > 1:
+            # limit parallelism to 8, builds with 12 and 16 failed on Deucalion
+            if parallel > 8:
                 self.cfg['parallel'] = 8
                 msg = "limiting parallelism to %s (was %s) for %s on %s to avoid out-of-memory failures during building/testing"
                 print_msg(msg % (self.cfg['parallel'], parallel, self.name, cpu_target), log=self.log)
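The effect of changing `parallel > 1` to `parallel > 8` can be seen by lifting the two conditions out of the hook. This is a stand-alone sketch, not the hook itself (which reads and writes `self.cfg['parallel']` inside EasyBuild); `old_limit` and `new_limit` are illustrative names.

```python
# Stand-alone re-implementation of the two conditions in the diff (illustrative only).
CAP = 8


def old_limit(parallel: int) -> int:
    """Before the change: any value above 1 was replaced by 8."""
    return CAP if parallel > 1 else parallel


def new_limit(parallel: int) -> int:
    """After the change: only values above 8 are lowered to 8."""
    return CAP if parallel > CAP else parallel


for p in (1, 4, 8, 12, 16):
    print(f"parallel={p:2d}  old -> {old_limit(p)}  new -> {new_limit(p)}")
```

With the old condition, a build configured for 4 cores would have been raised to 8, and a value of 8 would still trigger the "limiting parallelism to 8 (was 8)" message. The new condition only ever lowers parallelism, so the message now appears only when the value actually changes.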

@trz42
Collaborator Author

trz42 commented Apr 23, 2025

Staging PR merged, tarball ingested ...

@trz42 trz42 merged commit 2a4c8ae into EESSI:2023.06-software.eessi.io Apr 23, 2025
59 checks passed
@eessi-bot

eessi-bot bot commented Apr 23, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.04.23

1 similar comment

@eessi-bot-deucalion

PR merged! Moved ['/home/eessibot/new-bot/jobs/2025.04/pr_1038/409320'] to /home/eessibot/new-bot/trash-bin/EESSI/software-layer/2025.04.23

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 23, 2025

PR merged! Moved [] to /scratch/gent/vo/002/gvo00211/SHARED/trash_bin/EESSI/software-layer/2025.04.23

Labels: 2023.06-software.eessi.io (2023.06 version of software.eessi.io), a64fx, bot:deploy (Ask bot to deploy missing software installations to EESSI)
