Closed
Description
This problem was originally treated as a btl/openib issue: #3890.
However, a more detailed investigation indicates that it is caused by the coll/hcoll component: #4082
With hcoll excluded (`-mca coll '^hcoll'`), it runs OK:
$ bash -x ./run.sh
+ ./mpirun -np 8 -bind-to none -mca orte_tmpdir_base /tmp/tmp.8mj45mghXh -mca btl_openib_if_include mlx5_0:1 -x MXM_RDMA_PORTS=mlx5_0:1 -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc,cm -mca pml ob1 -mca btl self,openib -mca coll '^hcoll' -mca btl_openib_receive_queues X,4096,1024:X,12288,512:X,65536,512 taskset -c 6,7 /hpc/home/USERS/artemp/scrap/OMPI/ompi/examples/hello_c
Hello, world, I am 2 of 8, (Open MPI v2.1.2rc1, package: Open MPI artemp@jenkins03 Distribution, ident: 2.1.2rc1, repo rev: v2.1.1-154-g459e5ae, Unreleased developer copy, 143)
Hello, world, I am 7 of 8, (Open MPI v2.1.2rc1, package: Open MPI artemp@jenkins03 Distribution, ident: 2.1.2rc1, repo rev: v2.1.1-154-g459e5ae, Unreleased developer copy, 143)
Hello, world, I am 4 of 8, (Open MPI v2.1.2rc1, package: Open MPI artemp@jenkins03 Distribution, ident: 2.1.2rc1, repo rev: v2.1.1-154-g459e5ae, Unreleased developer copy, 143)
Hello, world, I am 6 of 8, (Open MPI v2.1.2rc1, package: Open MPI artemp@jenkins03 Distribution, ident: 2.1.2rc1, repo rev: v2.1.1-154-g459e5ae, Unreleased developer copy, 143)
Hello, world, I am 3 of 8, (Open MPI v2.1.2rc1, package: Open MPI artemp@jenkins03 Distribution, ident: 2.1.2rc1, repo rev: v2.1.1-154-g459e5ae, Unreleased developer copy, 143)
Hello, world, I am 5 of 8, (Open MPI v2.1.2rc1, package: Open MPI artemp@jenkins03 Distribution, ident: 2.1.2rc1, repo rev: v2.1.1-154-g459e5ae, Unreleased developer copy, 143)
Hello, world, I am 0 of 8, (Open MPI v2.1.2rc1, package: Open MPI artemp@jenkins03 Distribution, ident: 2.1.2rc1, repo rev: v2.1.1-154-g459e5ae, Unreleased developer copy, 143)
Hello, world, I am 1 of 8, (Open MPI v2.1.2rc1, package: Open MPI artemp@jenkins03 Distribution, ident: 2.1.2rc1, repo rev: v2.1.1-154-g459e5ae, Unreleased developer copy, 143)
+ exit 0
With hcoll enabled (no `-mca coll '^hcoll'` on the command line), the run fails with exit code 255:
$ bash -x ./run.sh
+ ./mpirun -np 8 -bind-to none -mca orte_tmpdir_base /tmp/tmp.8mj45mghXh -mca btl_openib_if_include mlx5_0:1 -x MXM_RDMA_PORTS=mlx5_0:1 -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc,cm -mca pml ob1 -mca btl self,openib -mca btl_openib_receive_queues X,4096,1024:X,12288,512:X,65536,512 taskset -c 6,7 /hpc/home/USERS/artemp/scrap/OMPI/ompi/examples/hello_c
[1502826187.186194] [jenkins03:21885:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.45
[1502826187.188457] [jenkins03:21888:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.45
[1502826187.192728] [jenkins03:21886:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.45
[1502826187.194526] [jenkins03:21884:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.45
[1502826187.200721] [jenkins03:21887:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.45
[1502826187.206316] [jenkins03:21890:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.45
[1502826187.209630] [jenkins03:21889:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.45
[1502826187.215512] [jenkins03:21891:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3499.45
+ echo 255
255
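For anyone reproducing this: the only difference between the two command lines above is `-mca coll '^hcoll'`. A minimal sketch of applying the same exclusion through the environment instead of the command line, assuming Open MPI's standard `OMPI_MCA_<param>` environment variable convention. The `echo` is illustrative only and is not part of the original `run.sh`:

```shell
# Exclude the hcoll collective component for every mpirun launched from this shell.
# Intended to be equivalent to passing: -mca coll '^hcoll'
export OMPI_MCA_coll='^hcoll'

# Illustrative check only: show the value that mpirun would pick up.
echo "OMPI_MCA_coll=${OMPI_MCA_coll}"
```

Unsetting the variable (`unset OMPI_MCA_coll`) restores the default component selection, which should bring the failing behavior back.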