Closed
Description
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
ompi master
ucx master
prrte master
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
[ompi]$ git submodule status
952a986999027667b4d83bd257ea0efd9f908520 3rd-party/openpmix (v1.1.3-2485-g952a986)
545863e6dc055233456116da6dc85be2b307f8e2 3rd-party/prrte (dev-30707-g545863e)
Please describe the system on which you are running
- Operating system/version: RH8.2
- Computer hardware: ppc64le
- Network type: IB
Details of the problem
Here is a simple test case that reproduces the problem:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Datatype ddt;

    MPI_Init(&argc, &argv);
    MPI_Type_contiguous(0, MPI_INT, &ddt);
    MPI_Type_commit(&ddt);
    MPI_Sendrecv(NULL, 1, ddt, 0, 0,
                 NULL, 1, ddt, 0, 0,
                 MPI_COMM_SELF, MPI_STATUS_IGNORE);
    MPI_Type_free(&ddt);
    MPI_Finalize();
    return 0;
}
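The above program segfaults when run with UCX. The zero-size datatype is sent with count=1 and a NULL buffer, and the backtrace shows memcpy being reached with length=1 and payload=0x0.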
Here is the backtrace:
(gdb) bt
#0 0x00002000003fb7f4 in __memcpy_power7 () from /lib64/libc.so.6
#1 0x0000200002733f04 in uct_am_short_fill_data (length=1, payload=0x0, header=1, buffer=0x2020f880) at /nfs_smpi_ci/abd/os/ucx/src/uct/base/uct_iface.h:695
#2 uct_self_ep_am_short (tl_ep=0x201daa20, id=2 '\002', header=1, payload=0x0, length=1) at sm/self/self.c:259
#3 0x00002000026aa834 in uct_ep_am_short (length=1, payload=0x0, header=1, id=2 '\002', ep=0x201daa20) at /nfs_smpi_ci/abd/os/ucx/src/uct/api/uct.h:2608
#4 ucp_tag_send_inline (tag=1, length=1, buffer=0x0, ep=0x200002520000) at tag/tag_send.c:163
#5 ucp_tag_send_nbx (ep=0x200002520000, buffer=0x0, count=1, tag=1, param=0x7fffe5648ac0) at tag/tag_send.c:258
#6 0x00002000025a9d14 in mca_pml_ucx_send_nbr (tag=1, datatype=0x201ddf30, count=1, buf=0x0, ep=0x200002520000) at pml_ucx.c:899
#7 mca_pml_ucx_send (buf=0x0, count=1, datatype=0x201ddf30, dst=0, tag=0, mode=MCA_PML_BASE_SEND_STANDARD, comm=0x2000002e2240 <ompi_mpi_comm_self>) at pml_ucx.c:946
#8 0x00002000001b7358 in PMPI_Sendrecv (sendbuf=0x0, sendcount=1, sendtype=0x201ddf30, dest=0, sendtag=0, recvbuf=0x0, recvcount=1, recvtype=0x201ddf30, source=0, recvtag=0,
comm=0x2000002e2240 <ompi_mpi_comm_self>, status=0x0) at psendrecv.c:91
#9 0x0000000010000aac in main (argc=1, argv=0x7fffe5649088) at 1.c:12
Looks like this patch fixes the issue:
#8105
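To illustrate the failure mode seen in the backtrace (a copy of length=1 from payload=0x0 for a datatype whose size is zero), here is a minimal standalone sketch of a guarded copy. It is not taken from #8105 and is not Open MPI or UCX code; payload_bytes and guarded_copy are hypothetical names used only for illustration, and the sketch assumes the byte length should be count times the datatype size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI or UCX function): number of payload
 * bytes for `count` elements of a datatype that is `type_size` bytes wide.
 * Overflow of the multiplication is ignored in this sketch. */
static size_t payload_bytes(size_t count, size_t type_size)
{
    return count * type_size;
}

/* Hypothetical guarded copy: returns the number of bytes actually copied.
 * memcpy is skipped when there is nothing to copy or a pointer is NULL,
 * so a zero-size datatype with a NULL buffer never reaches memcpy. */
static size_t guarded_copy(void *dst, const void *src,
                           size_t count, size_t type_size)
{
    size_t len = payload_bytes(count, type_size);

    if (len == 0 || src == NULL || dst == NULL) {
        return 0;
    }
    memcpy(dst, src, len);
    return len;
}
```

With count=1 and a datatype size of 0, as in the test case above, guarded_copy returns 0 without touching the NULL buffer.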