Closed
Description
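Creating a second nested Cartesian communicator with MPI_Cart_create fails in Open MPI 4.0.0 and 3.1.3. The same program passes in Open MPI 2.1.1 and in MPICH.
Open MPI 4.0.0 was installed from the 4.0.0 source tarball on Ubuntu 18.04.2.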
Source:
// cart_break.c
#include <mpi.h>

int main(int argc, char * argv[]){
    MPI_Init(&argc, &argv);
    MPI_Comm parent0, parent1, child0, child1;
    int ndims2[3] = {2, 1, 1};
    int ndims1[3] = {1, 1, 1};
    int periods[3] = {1, 1, 1};

    // parent 0 (works)
    if (MPI_Cart_create(MPI_COMM_WORLD, 3, ndims2, periods, 0, &parent0) != MPI_SUCCESS) { return -1; }
    // child0 from parent0 (works)
    if (MPI_Cart_create(parent0, 3, ndims1, periods, 0, &child0) != MPI_SUCCESS) { return -1; }
    // parent 1 (works)
    if (MPI_Cart_create(MPI_COMM_WORLD, 3, ndims2, periods, 0, &parent1) != MPI_SUCCESS) { return -1; }
    // child1 from parent1 (hangs in mpi4py, segfaults in C);
    // passes if parent1 is replaced with either parent0 or MPI_COMM_WORLD
    if (MPI_Cart_create(parent1, 3, ndims1, periods, 0, &child1) != MPI_SUCCESS) { return -1; }

    // cleanup
    if (child0 != MPI_COMM_NULL) { MPI_Comm_free(&child0); }
    if (child1 != MPI_COMM_NULL) { MPI_Comm_free(&child1); }
    if (parent0 != MPI_COMM_NULL) { MPI_Comm_free(&parent0); }
    if (parent1 != MPI_COMM_NULL) { MPI_Comm_free(&parent1); }
    MPI_Finalize();
    return 0;
}
To reproduce:
shell$ mpicc cart_break.c
shell$ mpirun -n 2 ./a.out
Traceback:
[pypc-dm-07:10294] *** Process received signal ***
[pypc-dm-07:10294] Signal: Segmentation fault (11)
[pypc-dm-07:10294] Signal code: Address not mapped (1)
[pypc-dm-07:10294] Failing at address: 0x8
[pypc-dm-07:10294] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f34b01f6f20]
[pypc-dm-07:10294] [ 1] /home/wrs20/opt/openmpi-4.0.0/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x995)[0x7f349f3cf1f5]
[pypc-dm-07:10294] [ 2] /home/wrs20/opt/openmpi-4.0.0/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7f349eb0439f]
[pypc-dm-07:10294] [ 3] /home/wrs20/opt/openmpi-4.0.0/lib/openmpi/mca_btl_vader.so(+0x46c7)[0x7f349eb046c7]
[pypc-dm-07:10294] [ 4] /home/wrs20/opt/openmpi-4.0.0/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7f34afc1cc2c]
[pypc-dm-07:10294] [ 5] /home/wrs20/opt/openmpi-4.0.0/lib/libmpi.so.40(ompi_comm_nextcid+0x105)[0x7f34b05deb25]
[pypc-dm-07:10294] [ 6] /home/wrs20/opt/openmpi-4.0.0/lib/libmpi.so.40(ompi_comm_enable+0x39)[0x7f34b05dc0f9]
[pypc-dm-07:10294] [ 7] /home/wrs20/opt/openmpi-4.0.0/lib/libmpi.so.40(mca_topo_base_cart_create+0x1c4)[0x7f34b0683a54]
[pypc-dm-07:10294] [ 8] /home/wrs20/opt/openmpi-4.0.0/lib/libmpi.so.40(MPI_Cart_create+0x25f)[0x7f34b061299f]
[pypc-dm-07:10294] [ 9] a.out(+0xa68)[0x56198c604a68]
[pypc-dm-07:10294] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f34b01d9b97]
[pypc-dm-07:10294] [11] a.out(+0x84a)[0x56198c60484a]
[pypc-dm-07:10294] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
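For context on why the cleanup checks each communicator against MPI_COMM_NULL: with 2 ranks, the {2, 1, 1} grids use both ranks, while the {1, 1, 1} grids use only one, so the other rank is expected to get MPI_COMM_NULL back for child0 and child1. A minimal sketch of that arithmetic follows. The helpers `grid_size` and `rank_excluded` are hypothetical names, not part of the reproducer or of MPI, and the sketch assumes reorder = 0 (as in the reproducer) and that the lowest ranks of the parent are the ones kept in the grid.

```c
/* Hypothetical helper, not part of the reproducer: number of ranks a
 * Cartesian grid with the given extents occupies (product of dims). */
int grid_size(int ndims, const int dims[]) {
    int size = 1;
    for (int i = 0; i < ndims; ++i) {
        size *= dims[i];
    }
    return size;
}

/* Returns 1 if a rank of the parent communicator falls outside the grid,
 * 0 otherwise. Assumption: with reorder = 0 the lowest ranks are kept,
 * and MPI_Cart_create returns MPI_COMM_NULL on the excluded ranks. */
int rank_excluded(int rank, int ndims, const int dims[]) {
    return rank >= grid_size(ndims, dims);
}
```

Under those assumptions, `rank_excluded(1, 3, ndims1)` is 1 and `rank_excluded(0, 3, ndims1)` is 0, so in the `mpirun -n 2` run only rank 0 would hold a non-null child0 and child1.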
Many Thanks,
Will