Closed
Description
- Open MPI: 4.0.1
- Operating system/version: centos7.6
- Computer hardware: Intel Skylake
- Network type: Mellanox EDR InfiniBand (although not used in steps to reproduce)
I'm opening this ticket because the HDF5 1.8.21 parallel test suite hangs with Open MPI 4.0.1 when run on our Lustre 2.12.2 parallel filesystem. The same tests run to completion on an ext4 filesystem.
The following script reproduces both the openmpi/hdf5 build and the issue with the test suite. Can someone help, please?
Thanks,
Mark
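Because the behaviour depends on which filesystem the script is run from, it may help to record the filesystem type alongside each run. The following is a minimal sketch, not part of the original report; `fs_type` is a hypothetical helper name, and it assumes GNU coreutils `stat`, which typically reports Lustre as `lustre` and ext4 as `ext2/ext3`.

```shell
#!/bin/bash
# Sketch only: print the filesystem type of a directory so that each
# run of the reproduction script can be labelled (Lustre vs. ext4).
# fs_type is a hypothetical helper, not something from the report.
fs_type() {
    # GNU stat: -f queries the filesystem rather than the file,
    # and %T prints the filesystem type in human-readable form.
    stat -f -c %T "${1:-.}"
}

echo "filesystem type of $(pwd): $(fs_type .)"
```

Running this from the directory where the reproduction script will be started shows which case is being tested.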
#!/bin/bash
# We have run this on a centos7.6 system with an idle Lustre 2.12.2
# filesystem mounted. Same Lustre version on both client and servers.
#
# If we run the script from a location on the Lustre filesystem,
# the hdf5 test "testphdf5" hangs until it is terminated by its 20
# minute alarm. Increasing the alarm timeout makes no difference
# (it's a new, idle filesystem). The last lines that the test
# printed were 6 copies of this:
#
# Testing -- multi-chunk collective chunk io (cchunk3)
#
# If we run the script from a location on an ext4 filesystem,
# the hdf5 test "testphdf5" completes (although it fails hdf5 1.8.21's
# t_pflush1 test, which I believe is a known, separate issue)
set -x
set -e
# (needed on our system to ensure we are using the OS-provided
# version of GCC, etc.)
module purge || true
mkdir -p build src
prefix=$(pwd)/build
export PATH=${prefix}/bin:$PATH
cd src
# openmpi
wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.1.tar.bz2
tar xf openmpi-4.0.1.tar.bz2
cd openmpi-4.0.1
# (we get the same test behaviour without "--with-io-romio-flags"; however,
# OpenMPI's ROMIO on Lustre is clearly broken without it - trying to use an
# MPI-IO hint to stripe a file didn't work)
./configure --prefix=$prefix \
--with-io-romio-flags=--with-file-system=lustre+ufs \
--enable-mpi1-compatibility
make -j12
make install
cd ..
# (ignore infiniband for now if we have it - hdf5 tests only need one host.
# This avoids some warning messages we can ignore)
export OMPI_MCA_btl=^openib
# (disable openmpi's own MPI-IO implementation, OMPIO - recommended by the
# hdf5 folks. I believe openmpi defaults to ROMIO on Lustre anyway)
export OMPI_MCA_io=^ompio
# hdf5
wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8/hdf5-1.8.21/src/hdf5-1.8.21.tar.bz2
tar xf hdf5-1.8.21.tar.bz2
cd hdf5-1.8.21
export CC=mpicc
export CXX=mpicxx
export FC=mpif90
export F77=mpif77
export F90=mpif90
./configure --prefix=$prefix --enable-parallel
make -j12
make check
make install
cd ..
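Since `make check` only reports the hang after the 20 minute alarm, a shorter loop for debugging might be to rerun just the hanging test under a wall-clock limit. This is a sketch under stated assumptions, not something from the original report: `run_with_limit` is a hypothetical helper, the 300 second limit is arbitrary, 6 ranks is inferred from the "6 copies" of the last output line, and the `-o cchunk3` option (run only the named test) should be checked against `./testphdf5 -h` before relying on it.

```shell
#!/bin/bash
# Sketch only: rerun the hanging parallel test with a shorter time limit.
# Assumes it is started from the directory that contains src/, after the
# build steps above have completed.

# run_with_limit SECONDS CMD [ARGS...]  (hypothetical helper)
# Runs CMD under GNU timeout and reports when the limit was hit.
run_with_limit() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local rc=$?
    # GNU timeout exits with status 124 when the command timed out
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

testdir=src/hdf5-1.8.21/testpar
if [ -x "$testdir/testphdf5" ]; then
    # Assumption: 6 ranks, and -o selects a single test by name.
    (cd "$testdir" && run_with_limit 300 mpirun -np 6 ./testphdf5 -o cchunk3)
fi
```

An exit status of 124 would indicate that the test was still running when the limit expired, which matches the hang described above.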