Description
I ran into this issue that a simple mpi4py code would not run on a Magic Castle deployment with EESSI (though it works on the same system with the pilot):
[ocaisa@node1 ~]$ cat bcast.py
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.rank
if rank == 0:
    data = {'a': 1, 'b': 2, 'c': 3}
else:
    data = None
data = comm.bcast(data, root=0)
print('rank %d : %s' % (rank, data))
[ocaisa@login1 ~]$ module purge
[ocaisa@login1 ~]$ echo $MODULEPATH
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/modules/all
[ocaisa@login1 ~]$ module use /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen3/modules/all
[ocaisa@login1 ~]$ module load SciPy-bundle/2021.05-foss-2021a
[ocaisa@login1 ~]$ mpirun -n 2 python bcast.py
rank 0 : {'a': 1, 'b': 2, 'c': 3}
rank 1 : {'a': 1, 'b': 2, 'c': 3}
[ocaisa@login1 ~]$ module purge
[ocaisa@login1 ~]$ module unuse /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen3/modules/all
[ocaisa@login1 ~]$ module load mpi4py
[ocaisa@login1 ~]$ module list
Currently Loaded Modules:
  1) GCCcore/12.3.0
  2) GCC/12.3.0
  3) numactl/2.0.16-GCCcore-12.3.0
  4) libxml2/2.11.4-GCCcore-12.3.0
  5) libpciaccess/0.17-GCCcore-12.3.0
  6) hwloc/2.9.1-GCCcore-12.3.0
  7) OpenSSL/1.1
  8) libevent/2.1.12-GCCcore-12.3.0
  9) UCX/1.14.1-GCCcore-12.3.0
 10) libfabric/1.18.0-GCCcore-12.3.0
 11) PMIx/4.2.4-GCCcore-12.3.0
 12) UCC/1.2.0-GCCcore-12.3.0
 13) OpenMPI/4.1.5-GCC-12.3.0
 14) gompi/2023a
 15) Tcl/8.6.13-GCCcore-12.3.0
 16) SQLite/3.42.0-GCCcore-12.3.0
 17) libffi/3.4.4-GCCcore-12.3.0
 18) Python/3.11.3-GCCcore-12.3.0
 19) mpi4py/3.1.4-gompi-2023a
[ocaisa@login1 ~]$ mpirun -n 2 python bcast.py
login1.int.jetstream2.hpc-carpentry.org:rank0.python: Failed to get eth0 (unit 1) cpu set
login1.int.jetstream2.hpc-carpentry.org:rank0: PSM3 can't open nic unit: 1 (err=23)
login1.int.jetstream2.hpc-carpentry.org:rank1.python: Failed to get eth0 (unit 1) cpu set
login1.int.jetstream2.hpc-carpentry.org:rank1: PSM3 can't open nic unit: 1 (err=23)
login1.int.jetstream2.hpc-carpentry.org:rank1.python: Failed to get eth0 (unit 1) cpu set
login1.int.jetstream2.hpc-carpentry.org:rank1: PSM3 can't open nic unit: 1 (err=23)
login1.int.jetstream2.hpc-carpentry.org:rank0.python: Failed to get eth0 (unit 1) cpu set
login1.int.jetstream2.hpc-carpentry.org:rank0: PSM3 can't open nic unit: 1 (err=23)
(hanging)
It turns out this issue was already "solved" for an EasyBuild use case, which also resolved things for my case.
It does raise the issue, though, that Open MPI may need to be configured to work correctly on the host site (and indeed this was also raised in #1 ). @bartoldeman explained how they account for this in Compute Canada:
the way we solve this (for the soft.computecanada.ca stack) is to set an environment variable RSNT_INTERCONNECT using this logic in lmod:
function get_interconnect()
    local posix = require "posix"
    if posix.stat("/sys/module/opa_vnic", "type") == 'directory' then
        return "omnipath"
    elseif posix.stat("/sys/module/ib_core", "type") == 'directory' then
        return "infiniband"
    end
    return "ethernet"
end
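For illustration, the same detection logic can be sketched in Python (the `sys_module` parameter is my addition, just to make the probe path explicit; the Lua original hardcodes `/sys/module`):

```python
import os

def get_interconnect(sys_module="/sys/module"):
    """Detect the interconnect by probing for kernel modules,
    mirroring the Lua logic used for the soft.computecanada.ca stack."""
    if os.path.isdir(os.path.join(sys_module, "opa_vnic")):
        return "omnipath"      # Omni-Path virtual NIC module loaded
    if os.path.isdir(os.path.join(sys_module, "ib_core")):
        return "infiniband"    # InfiniBand core module loaded
    return "ethernet"          # fall back to plain Ethernet
```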
for "ethernet" we have:
OMPI_MCA_btl='^openib,ofi'
OMPI_MCA_mtl='^ofi'
OMPI_MCA_osc='^ucx'
OMPI_MCA_pml='^ucx'
so libfabric (OFI) isn't used by Open MPI, which eliminates any use of PSM3 as well; it basically forces Open MPI to use either the tcp or vader (shm) + self btls, with the ob1 pml, and no runtime use of UCX or OFI.
I'm not sure if EESSI still compiles Open MPI with support for openib; if not, the first setting could be OMPI_MCA_btl='^ofi'
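As a quick way to try the "ethernet" settings without a site configuration file, one could set the variables in the environment before Open MPI initializes. A sketch in Python (MCA parameters are picked up from OMPI_MCA_* environment variables, so this would have to run before `import mpi4py.MPI` triggers MPI initialization):

```python
import os

# The "ethernet" exclusions quoted above: disable the openib and ofi btls,
# the ofi mtl, and the UCX osc/pml components, so Open MPI falls back to
# tcp/vader + self with the ob1 pml.
os.environ["OMPI_MCA_btl"] = "^openib,ofi"
os.environ["OMPI_MCA_mtl"] = "^ofi"
os.environ["OMPI_MCA_osc"] = "^ucx"
os.environ["OMPI_MCA_pml"] = "^ucx"

# Only import mpi4py *after* the environment is set up, e.g.:
# from mpi4py import MPI
```

Exporting the same variables in the shell before `mpirun` would work equally well.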
for "infiniband" it's:
OMPI_MCA_btl='^openib,ofi'
OMPI_MCA_mtl='^ofi'
to eliminate libfabric as well; Open MPI will use UCX through its priority mechanism.
Lastly for "omnipath" :
OMPI_MCA_btl='^openib'
OMPI_MCA_osc='^ucx'
OMPI_MCA_pml='^ucx'
where we do allow OFI, though the priority mechanism will select the cm pml with the psm2 mtl.
So basically:
- always exclude openib (the only use case we have for it is DDT, that's why it's compiled in)
- infiniband excludes libfabric
- omnipath excludes UCX
- ethernet excludes both libfabric and UCX
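The rules above amount to a small per-interconnect table; as a sketch, in Python (the variable name and comments are mine, the values are taken verbatim from the settings quoted above):

```python
# Map each detected interconnect to the OMPI_MCA_* exclusions described above.
MCA_SETTINGS = {
    "ethernet": {                      # exclude both libfabric (OFI) and UCX
        "OMPI_MCA_btl": "^openib,ofi",
        "OMPI_MCA_mtl": "^ofi",
        "OMPI_MCA_osc": "^ucx",
        "OMPI_MCA_pml": "^ucx",
    },
    "infiniband": {                    # exclude libfabric; UCX wins on priority
        "OMPI_MCA_btl": "^openib,ofi",
        "OMPI_MCA_mtl": "^ofi",
    },
    "omnipath": {                      # exclude UCX; OFI allowed (cm pml + psm2 mtl)
        "OMPI_MCA_btl": "^openib",
        "OMPI_MCA_osc": "^ucx",
        "OMPI_MCA_pml": "^ucx",
    },
}
```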
We set the environment variables via a configuration file included in the module, specifically with a modluafooter in the easyconfig:
assert(loadfile("/cvmfs/soft.computecanada.ca/config/lmod/openmpi_custom.lua"))("4.1")