Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
4.0.4
The problem was also reproduced with 2.0.3, 2.1.6, 3.1.6 and 4.0.5.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
The 4.0.4 release, built from source on my laptop (no weird communication layer ;)).
I can provide ompi_info output for the other versions if requested.
$> ompi_info
Package: Open MPI nicpa@nicpa-dtu Distribution
Open MPI: 4.0.4
Open MPI repo revision: v4.0.4
Open MPI release date: Jun 10, 2020
Open RTE: 4.0.4
Open RTE repo revision: v4.0.4
Open RTE release date: Jun 10, 2020
OPAL: 4.0.4
OPAL repo revision: v4.0.4
OPAL release date: Jun 10, 2020
MPI API: 3.1.0
Ident string: 4.0.4
Prefix: /opt/gnu/9.3.0/openmpi/4.0.4
Configured architecture: x86_64-unknown-linux-gnu
Configure host: nicpa-dtu
Configured by: nicpa
Configured on: Fri Aug 28 14:06:37 CEST 2020
Configure host: nicpa-dtu
Configure command line: '--enable-mpi1-compatibility'
'--with-ucx=/opt/gnu/9.3.0/ucx/1.8.1'
'--without-verbs'
'--prefix=/opt/gnu/9.3.0/openmpi/4.0.4'
'--enable-orterun-prefix-by-default'
'--enable-mpirun-prefix-by-default'
'--with-hwloc=/opt/gnu/9.3.0/hwloc/2.2.0'
'--with-zlib=/opt/gnu/9.3.0/zlib/1.2.11'
'--enable-mpi-thread-multiple' '--enable-mpi-cxx'
Built by: nicpa
Built on: Fri Aug 28 14:20:13 CEST 2020
Built host: nicpa-dtu
C bindings: yes
C++ bindings: yes
Fort mpif.h: yes (all)
Fort use mpi: yes (full: ignore TKR)
Fort use mpi size: deprecated-ompi-info-value
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
limitations in the gfortran compiler and/or Open
MPI, does not support the following: array
subsections, direct passthru (where possible) to
underlying Open MPI's C functionality
Fort mpi_f08 subarrays: no
Java bindings: no
Wrapper compiler rpath: runpath
C compiler: gcc
C compiler absolute: /opt/generic/gcc/9.3.0/bin/gcc
C compiler family name: GNU
C compiler version: 9.3.0
C++ compiler: g++
C++ compiler absolute: /opt/generic/gcc/9.3.0/bin/g++
Fort compiler: gfortran
Fort compiler abs: /opt/generic/gcc/9.3.0/bin/gfortran
Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
Fort 08 assumed shape: yes
Fort optional args: yes
Fort INTERFACE: yes
Fort ISO_FORTRAN_ENV: yes
Fort STORAGE_SIZE: yes
Fort BIND(C) (all): yes
Fort ISO_C_BINDING: yes
Fort SUBROUTINE BIND(C): yes
Fort TYPE,BIND(C): yes
Fort T,BIND(C,name="a"): yes
Fort PRIVATE: yes
Fort PROTECTED: yes
Fort ABSTRACT: yes
Fort ASYNCHRONOUS: yes
Fort PROCEDURE: yes
Fort USE...ONLY: yes
Fort C_FUNLOC: yes
Fort f08 using wrappers: yes
Fort MPI_SIZEOF: yes
C profiling: yes
C++ profiling: yes
Fort mpif.h profiling: yes
Fort use mpi profiling: yes
Fort use mpi_f08 prof: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
OMPI progress: no, ORTE progress: yes, Event lib:
yes)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
dl support: yes
Heterogeneous support: no
mpirun default --prefix: yes
MPI_WTIME support: native
Symbol vis. support: yes
Host topology support: yes
IPv6 support: no
MPI1 compatibility: yes
MPI extensions: affinity, cuda, pcollreq
FT Checkpoint support: no (checkpoint thread: no)
C/R Enabled Debugging: no
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
Please describe the system on which you are running
- Operating system/version: Debian + ScientificLinux
- Computer hardware: Intel(R) Xeon(R) CPU E5-2660 v3 + 8GB RAM
- Network type: self
Details of the problem
I am doing some data distribution with some very simple codes.
The problem is that these codes explode in memory when the same tag is posted twice for the same send/recv pair: the program eventually crashes because it fills up all available memory. This makes it difficult to debug tag issues.
Below is the C code (I also tested the same implementation in Fortran and the memory leak occurs there as well, as I would have expected).
#include <mpi.h>

void dist() {
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // BUG: the memory leak scales exactly with this size; doubling the
  // number doubles the memory allocated.
  // NB is the request-array dimension.
  // N is the number of sends posted with the duplicated tag.
  const int NB = 1024*2*2*2*2*2*2*2;
  const int N = 1024*2*2*2;
  MPI_Request reqs[NB];
  int dat = 0;

  // Rank 0 blocks on the tag-0 message, which rank 1 only sends *after*
  // posting all N tag-1 sends, so the tag-1 messages arrive before any
  // matching receive has been posted.
  if ( rank == 0 )
    MPI_Recv(&dat, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

  for ( int i = 0 ; i < N ; i++ ) {
    if ( rank == 0 ) {
      MPI_Recv(&dat, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if ( rank == 1 ) {
      MPI_Isend(&dat, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &reqs[i]);
    }
  }

  if ( rank == 1 ) {
    MPI_Send(&dat, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Waitall(N, reqs, MPI_STATUSES_IGNORE);
  }
}

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  for ( int i = 0 ; i < 5000000 ; i++ ) {
    dist();
  }
  MPI_Finalize();
  return 0;
}
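For reference, the reproducer can be built and run with the standard wrapper compilers (assuming the code is saved as, e.g., leak.c; the file name is arbitrary):
$> mpicc leak.c -o leak
$> mpirun -np 2 ./leak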
This memory-leak shows itself depending on the number of same tags is sent. And it will just explode endlessly. The memory won't be recaptured when finishing the outer loop and proceeding with more care.
The easy solution is to use barriers and/or MPI_Ssend
to ensure no two communications exists for the same tag. But it is hard to debug these things when memory just explodes. So these seem to only affect buffer send's.
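As a sketch of the workaround I mean (dist_sync is just an illustrative name; I use the nonblocking synchronous variant MPI_Issend so the structure of dist() above is unchanged), a synchronous send does not complete until the matching receive has been posted, so the duplicate-tag messages are not eagerly buffered on rank 0:
#include <mpi.h>

// Workaround sketch: identical to dist(), but the duplicated-tag sends are
// synchronous. A synchronous send only completes once the matching receive
// has started, so the message data is not buffered eagerly on rank 0 before
// a matching receive exists.
void dist_sync() {
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  const int N = 1024*2*2*2;
  MPI_Request reqs[N];
  int dat = 0;
  if ( rank == 0 )
    MPI_Recv(&dat, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  for ( int i = 0 ; i < N ; i++ ) {
    if ( rank == 0 ) {
      MPI_Recv(&dat, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if ( rank == 1 ) {
      // synchronous isend: completion requires the matching recv to be posted
      MPI_Issend(&dat, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &reqs[i]);
    }
  }
  if ( rank == 1 ) {
    MPI_Send(&dat, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Waitall(N, reqs, MPI_STATUSES_IGNORE);
  }
}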