Memory leak with multiple equivalent tag pt2pt communications #8561

@zerothi

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

4.0.4

And repeated with 2.0.3, 2.1.6, 3.1.6 and 4.0.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

4.0.4 release version, built on my laptop (no weird communication layer ;))
I can provide ompi_info output for the other versions if requested.

$> ompi_info
                 Package: Open MPI nicpa@nicpa-dtu Distribution
                Open MPI: 4.0.4
  Open MPI repo revision: v4.0.4
   Open MPI release date: Jun 10, 2020
                Open RTE: 4.0.4
  Open RTE repo revision: v4.0.4
   Open RTE release date: Jun 10, 2020
                    OPAL: 4.0.4
      OPAL repo revision: v4.0.4
       OPAL release date: Jun 10, 2020
                 MPI API: 3.1.0
            Ident string: 4.0.4
                  Prefix: /opt/gnu/9.3.0/openmpi/4.0.4
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: nicpa-dtu
           Configured by: nicpa
           Configured on: Fri Aug 28 14:06:37 CEST 2020
          Configure host: nicpa-dtu
  Configure command line: '--enable-mpi1-compatibility'
                          '--with-ucx=/opt/gnu/9.3.0/ucx/1.8.1'
                          '--without-verbs'
                          '--prefix=/opt/gnu/9.3.0/openmpi/4.0.4'
                          '--enable-orterun-prefix-by-default'
                          '--enable-mpirun-prefix-by-default'
                          '--with-hwloc=/opt/gnu/9.3.0/hwloc/2.2.0'
                          '--with-zlib=/opt/gnu/9.3.0/zlib/1.2.11'
                          '--enable-mpi-thread-multiple' '--enable-mpi-cxx'
                Built by: nicpa
                Built on: Fri Aug 28 14:20:13 CEST 2020
              Built host: nicpa-dtu
              C bindings: yes
            C++ bindings: yes
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
 Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the gfortran compiler and/or Open
                          MPI, does not support the following: array
                          subsections, direct passthru (where possible) to
                          underlying Open MPI's C functionality
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /opt/generic/gcc/9.3.0/bin/gcc
  C compiler family name: GNU
      C compiler version: 9.3.0
            C++ compiler: g++
   C++ compiler absolute: /opt/generic/gcc/9.3.0/bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /opt/generic/gcc/9.3.0/bin/gfortran
         Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
   Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
 Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
          Fort PROTECTED: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
 Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
           C++ profiling: yes
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: yes
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
 mpirun default --prefix: yes
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
      MPI1 compatibility: yes
          MPI extensions: affinity, cuda, pcollreq
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128

Please describe the system on which you are running

  • Operating system/version:
    Debian + ScientificLinux
  • Computer hardware:
    Intel(R) Xeon(R) CPU E5-2660 v3 + 8GB RAM
  • Network type:
    self

Details of the problem

I am doing some data distribution with some very simple codes.

The problem is that these codes explode in memory when the same tag is posted multiple times for the same send/recv pair: the program eventually crashes because it fills up the available memory. This makes it difficult to debug tag issues.

Below is the C code (I also tested the same implementation in Fortran and the
memory leak still occurs, not that I would have expected otherwise).

#include <mpi.h>

void dist() {
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // BUG: the leaked memory scales exactly with this size; twice the number, twice the memory allocated
  // NB is the actual array dimension
  // N is the number of messages sent with the duplicated tag
  const int NB = 1024*2*2*2*2*2*2*2;
  const int N = 1024*2*2*2;
  MPI_Request reqs[NB];

  int dat = 0;
  if ( rank == 0 )
    MPI_Recv(&dat, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

  for ( int i = 0 ; i < N ; i++ ) {
    if ( rank == 0 ) {
      MPI_Recv(&dat, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if ( rank == 1 ) {
      MPI_Isend(&dat, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &reqs[i]);
    }
  }

  if ( rank == 1 ) {
    MPI_Send(&dat, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Waitall(N, reqs, MPI_STATUSES_IGNORE);
  }
}

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);

  for ( int i = 0 ; i < 5000000 ; i++ ) {
    dist();
  }

  MPI_Finalize();
}
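
To reproduce, compile and run on two ranks, for example (the file name leak.c is arbitrary):

$> mpicc leak.c -o leak
$> mpirun -np 2 ./leak

and watch the resident memory of the two processes grow.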

The size of the leak scales with the number of messages sent with the same tag, and it grows without bound. The memory is not reclaimed when the loop finishes, even if subsequent communication is done more carefully.

The easy workaround is to use barriers and/or MPI_Ssend to ensure that no two outstanding communications use the same tag. But it is hard to debug these tag issues when memory just explodes. The leak only seems to affect buffered sends.
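
For reference, a minimal sketch of that workaround (the dist_unique_tags name and the 1+i tag scheme are just illustrative, not part of my real code): every message gets its own tag, so no two outstanding communications share a tag and the duplicated-tag pattern above is avoided.

#include <mpi.h>

// Sketch of the workaround: same exchange as dist() above, but each
// message carries a unique tag (1+i); tag 0 stays reserved for the
// initial handshake between rank 1 and rank 0.
void dist_unique_tags() {
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  const int N = 1024*2*2*2;
  MPI_Request reqs[N];

  int dat = 0;
  if ( rank == 0 )
    MPI_Recv(&dat, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

  for ( int i = 0 ; i < N ; i++ ) {
    if ( rank == 0 ) {
      // unique tag per message, so no two receives match the same tag
      MPI_Recv(&dat, 1, MPI_INT, 1, 1 + i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if ( rank == 1 ) {
      MPI_Isend(&dat, 1, MPI_INT, 0, 1 + i, MPI_COMM_WORLD, &reqs[i]);
    }
  }

  if ( rank == 1 ) {
    MPI_Send(&dat, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    MPI_Waitall(N, reqs, MPI_STATUSES_IGNORE);
  }
}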
