MPI_Comm_spawn and UCX #8385

@KyuzoR

Description

Background information

shell$ mpirun -V
mpirun (Open MPI) 4.0.4
shell$ ompi_info | grep -i ucx
  Configure command line: '--prefix=/project/dsi/apps/easybuild/software/OpenMPI/4.0.4-iccifort-2019.5.281' '--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu' '--enable-mpirun-prefix-by-default' '--enable-shared' '--with-verbs' '--with-hwloc=/project/dsi/apps/easybuild/software/hwloc/2.2.0-GCCcore-8.3.0' '--with-ucx=/project/dsi/apps/easybuild/software/UCX/1.8.0-GCCcore-8.3.0'
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.4)
shell$ uname -or
3.10.0-1160.11.1.el7.x86_64 GNU/Linux
shell$ srun -V
slurm 20.02.5
shell$ ucx_info -u t -e
#
# UCP endpoint
#
#               peer: <no debug data>
#                 lane[0]:  2:self/memory md[2]           -> md[2]/self     am am_bw#0
#                 lane[1]:  8:rc_mlx5/mlx5_0:1 md[5]      -> md[5]/ib       rma_bw#0 wireup{ud_mlx5/mlx5_0:1}
#                 lane[2]: 13:cma/memory md[7]            -> md[7]/cma      rma_bw#1
#
#                tag_send: 0..<egr/short>..8185..<egr/bcopy>..8192..<rndv>..(inf)
#            tag_send_nbr: 0..<egr/short>..8185..<egr/bcopy>..262144..<rndv>..(inf)
#           tag_send_sync: 0..<egr/short>..8185..<egr/bcopy>..8192..<rndv>..(inf)
#
#                  rma_bw: mds [5] rndv_rkey_size 18
#

Details of the problem

C code: test_mpi.c

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define NUM_SPAWNS 2

int
main(int argc, char** argv)
{
    int errcodes[NUM_SPAWNS];
    MPI_Comm parentcomm, intercomm;
    
    MPI_Init(&argc, &argv);
    
    MPI_Comm_get_parent(&parentcomm);
    if (parentcomm == MPI_COMM_NULL) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        // problem here
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, NUM_SPAWNS, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, errcodes);
        printf("Parent %d\n", rank);
        MPI_Bcast(&rank, 1, MPI_INT, MPI_ROOT, intercomm);
    } else {
        int rank, parent_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Bcast(&parent_rank, 1, MPI_INT, 0, parentcomm);
        printf("Child %d of parent %d\n", rank, parent_rank);
    }
    fflush(stdout);
    
    MPI_Finalize();
    return 0;
}
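
For extra diagnostics, the parent branch could be changed to trap the spawn failure instead of letting MPI_ERRORS_ARE_FATAL abort the job. The following is only a sketch (it reuses rank, intercomm and errcodes from the code above): it switches MPI_COMM_SELF to MPI_ERRORS_RETURN and reports the return code of MPI_Comm_spawn.

        /* Diagnostic variant of the parent branch (sketch only): report the
         * spawn failure from the application instead of aborting under
         * MPI_ERRORS_ARE_FATAL. */
        MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);

        int rc = MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, NUM_SPAWNS, MPI_INFO_NULL,
                                0, MPI_COMM_SELF, &intercomm, errcodes);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "MPI_Comm_spawn failed on parent %d: %s\n", rank, msg);
            MPI_Abort(MPI_COMM_WORLD, rc);
        }
        printf("Parent %d\n", rank);
        MPI_Bcast(&rank, 1, MPI_INT, MPI_ROOT, intercomm);

With MPI_ERRORS_RETURN set, the intermittent failure should surface as a non-success return code that can be logged alongside the UCX output.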

Compile & run

shell$ mpicc test_mpi.c -o test_mpi
shell$ srun -N 2 --ntasks-per-node 3 --pty /bin/bash -l
shell$ mpirun --map-by ppr:1:node --bind-to core --mca btl '^openib,uct' --mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 --report-bindings ./test_mpi

Sometimes I get:

[compute-5-1.local:45181] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/./.][]
[compute-5-2.local:41401] MCW rank 1 bound to socket 1[core 0[hwt 0]]: [][B/./.]
[compute-5-1.local:45181] MCW rank 0 bound to socket 0[core 2[hwt 0]]: [././B][]
[compute-5-1.local:45181] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/.][]
[compute-5-2.local:41401] MCW rank 0 bound to socket 1[core 1[hwt 0]]: [][./B/.]
[compute-5-2.local:41401] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [][././B]
[1611013241.802944] [compute-5-1:45191:0]         wireup.c:315  UCX  ERROR ep 0x2ac27172a048: no remote ep address for lane[1]->remote_lane[1]
Parent 0
[1611013241.803457] [compute-5-2:41405:0]         wireup.c:315  UCX  ERROR ep 0x2b070d728090: no remote ep address for lane[1]->remote_lane[1]
Child 0 of parent 0
Child 1 of parent 0
Parent 1
Child 0 of parent 1
Child 1 of parent 1
[1611013241.806914] [compute-5-2:41405:0]         wireup.c:315  UCX  ERROR ep 0x2b070d728048: no remote ep address for lane[1]->remote_lane[1]
[1611013241.819399] [compute-5-1:45191:0]         wireup.c:315  UCX  ERROR ep 0x2ac27172a090: no remote ep address for lane[1]->remote_lane[1]

Apart from the four UCX ERROR lines, this is the expected output. But sometimes I get:

shell$ mpirun --map-by ppr:1:node --bind-to core --mca btl '^openib,uct' --mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 --report-bindings ./test_mpi
[compute-5-1.local:44689] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/./.][]
[compute-5-2.local:40943] MCW rank 1 bound to socket 1[core 0[hwt 0]]: [][B/./.]
[compute-5-1.local:44689] MCW rank 0 bound to socket 0[core 2[hwt 0]]: [././B][]
[compute-5-1.local:44689] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/.][]
[compute-5-2.local:40943] MCW rank 0 bound to socket 1[core 1[hwt 0]]: [][./B/.]
[compute-5-2.local:40943] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [][././B]
[compute-5-1.local:44699] pml_ucx.c:176  Error: Failed to receive UCX worker address: Not found (-13)
[compute-5-1.local:44699] [[57305,1],0] ORTE_ERROR_LOG: Error in file dpm/dpm.c at line 493
[compute-5-1:44699] *** An error occurred in MPI_Comm_spawn
[compute-5-1:44699] *** reported by process [3755540481,0]
[compute-5-1:44699] *** on communicator MPI_COMM_SELF
[compute-5-1:44699] *** MPI_ERR_OTHER: known error not in list
[compute-5-1:44699] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[compute-5-1:44699] ***    and potentially your MPI job)
[compute-5-2.local:40947] pml_ucx.c:176  Error: Failed to receive UCX worker address: Not found (-13)
[compute-5-2.local:40947] [[57305,1],1] ORTE_ERROR_LOG: Error in file dpm/dpm.c at line 493
[compute-5-1.local:44689] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2193
[compute-5-1.local:44689] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[compute-5-1.local:44689] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I get the same result even without specifying --mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1.
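
If more detail would help, the failing case can be rerun with extra verbosity along these lines (the exact levels are just a suggestion):

shell$ mpirun --map-by ppr:1:node --bind-to core --mca btl '^openib,uct' --mca pml ucx \
       --mca pml_base_verbose 10 -x UCX_LOG_LEVEL=debug \
       -x UCX_NET_DEVICES=mlx5_0:1 --report-bindings ./test_mpi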
