Zero-byte messages segfault with UCX #8104

@AboorvaDevarajan

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

ompi master
ucx master
prrte master

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

[ompi]$ git submodule status
 952a986999027667b4d83bd257ea0efd9f908520 3rd-party/openpmix (v1.1.3-2485-g952a986)
 545863e6dc055233456116da6dc85be2b307f8e2 3rd-party/prrte (dev-30707-g545863e)

Please describe the system on which you are running

  • Operating system/version: RH8.2
  • Computer hardware: ppc64le
  • Network type: IB

Details of the problem

Here is a simple test case that reproduces the problem:

#include <stdio.h>

#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Datatype ddt;

    MPI_Init(&argc, &argv);
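    /* Contiguous type with zero elements: its size is 0 bytes. */
    MPI_Type_contiguous(0, MPI_INT, &ddt);
    MPI_Type_commit(&ddt);
    /* One element of a zero-size type: no data is transferred in either direction. */
    MPI_Sendrecv(NULL, 1, ddt, 0, 0,
                 NULL, 1, ddt, 0, 0,
                 MPI_COMM_SELF, MPI_STATUS_IGNORE);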
    MPI_Type_free(&ddt);
    MPI_Finalize();
    return 0;
}

The program above segfaults when run with the UCX PML.
Here is the backtrace:

(gdb) bt
#0  0x00002000003fb7f4 in __memcpy_power7 () from /lib64/libc.so.6
#1  0x0000200002733f04 in uct_am_short_fill_data (length=1, payload=0x0, header=1, buffer=0x2020f880) at /nfs_smpi_ci/abd/os/ucx/src/uct/base/uct_iface.h:695
#2  uct_self_ep_am_short (tl_ep=0x201daa20, id=2 '\002', header=1, payload=0x0, length=1) at sm/self/self.c:259
#3  0x00002000026aa834 in uct_ep_am_short (length=1, payload=0x0, header=1, id=2 '\002', ep=0x201daa20) at /nfs_smpi_ci/abd/os/ucx/src/uct/api/uct.h:2608
#4  ucp_tag_send_inline (tag=1, length=1, buffer=0x0, ep=0x200002520000) at tag/tag_send.c:163
#5  ucp_tag_send_nbx (ep=0x200002520000, buffer=0x0, count=1, tag=1, param=0x7fffe5648ac0) at tag/tag_send.c:258
#6  0x00002000025a9d14 in mca_pml_ucx_send_nbr (tag=1, datatype=0x201ddf30, count=1, buf=0x0, ep=0x200002520000) at pml_ucx.c:899
#7  mca_pml_ucx_send (buf=0x0, count=1, datatype=0x201ddf30, dst=0, tag=0, mode=MCA_PML_BASE_SEND_STANDARD, comm=0x2000002e2240 <ompi_mpi_comm_self>) at pml_ucx.c:946
#8  0x00002000001b7358 in PMPI_Sendrecv (sendbuf=0x0, sendcount=1, sendtype=0x201ddf30, dest=0, sendtag=0, recvbuf=0x0, recvcount=1, recvtype=0x201ddf30, source=0, recvtag=0, 
    comm=0x2000002e2240 <ompi_mpi_comm_self>, status=0x0) at psendrecv.c:91
#9  0x0000000010000aac in main (argc=1, argv=0x7fffe5649088) at 1.c:12
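
The backtrace shows `length=1` and `payload=0x0` reaching `memcpy`, even though the message is one element of a zero-size type and so carries zero bytes. The sketch below only illustrates that mismatch. The helper names `message_bytes` and `copy_payload` are hypothetical, are not part of UCX or Open MPI, and make no claim about what the referenced patch changes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: payload size in bytes for `count` elements of a
 * datatype whose size is `type_size` bytes. For the test case above,
 * count is 1 and type_size is 0, so the result is 0, not 1. */
static size_t message_bytes(size_t count, size_t type_size)
{
    return count * type_size;
}

/* Hypothetical helper: copy a payload and return the number of bytes
 * copied. When there is nothing to copy, neither pointer is touched,
 * so a NULL source is harmless. */
static size_t copy_payload(void *dst, const void *src, size_t length)
{
    if (length == 0) {
        return 0;
    }
    memcpy(dst, src, length);
    return length;
}
```

With `count = 1` and `type_size = 0`, `message_bytes` returns 0 and `copy_payload(buffer, NULL, 0)` returns without dereferencing the NULL pointer. Passing the element count as the byte length instead would request a 1-byte copy from address 0, which matches the values in frames #1 through #5.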

Looks like this patch fixes the issue:
#8105
