On master branch
I observe a strange behavior. I think that openib may be using too large a hammer for NUMA memory binding, possibly setting the wrong memory binding policy for the vader and sm shared memory segments. I've come to this conclusion only empirically, based on performance numbers.
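One way to check this hypothesis more directly, rather than through latency numbers alone, would be to look at the memory policy the kernel reports for each mapping in `/proc/<pid>/numa_maps` on Linux. The following is a minimal Python sketch and not something that was run for this report: `parse_numa_maps`, `shared_segments`, and the `SAMPLE` text (including its file names and page counts) are hypothetical and only illustrate the file layout of address, policy, then `key=value` fields.

```python
def parse_numa_maps(text):
    """Parse the text of /proc/<pid>/numa_maps into one dict per mapping."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        entry = {
            "address": fields[0],   # start address of the range
            "policy": fields[1],    # e.g. "default", "bind:0", "interleave:0-1"
            "file": None,
            "pages_per_node": {},
        }
        for field in fields[2:]:
            key, sep, value = field.partition("=")
            if key == "file":
                entry["file"] = value
            elif sep and key.startswith("N") and key[1:].isdigit():
                # N<node>=<pages> gives the pages allocated on that NUMA node
                entry["pages_per_node"][int(key[1:])] = int(value)
        mappings.append(entry)
    return mappings


def shared_segments(mappings, name_hint):
    """Keep file-backed mappings whose path contains name_hint."""
    return [m for m in mappings if m["file"] and name_hint in m["file"]]


# Hypothetical sample text, made up for illustration only.
SAMPLE = """\
00400000 default file=/usr/bin/example mapped=10 N0=10
7f0000000000 bind:0 file=/dev/shm/example_segment dirty=512 N0=512
7f0000400000 default anon=4 dirty=4 N1=4
"""

if __name__ == "__main__":
    for m in shared_segments(parse_numa_maps(SAMPLE), "example_segment"):
        print(m["address"], m["policy"], m["pages_per_node"])
```

To use it against a running rank, the sample text would be replaced by the contents of `/proc/<pid>/numa_maps` for that rank, with `name_hint` set to whatever substring identifies the shared memory segment file on the system. Comparing the reported policy with and without openib in the BTL list would show whether the segment policy actually changes.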
For example, I have a RHEL 6.5 node with a single Mellanox Technologies MT25204 [InfiniHost III Lx HCA] ConnectX-3 card with a single port active.
Bad latency run on a single host:
$ mpirun -host "mpi03" -np 4 --bind-to core --report-bindings --mca btl openib,vader,self ./ping_pong_ring.x2
[mpi03:12941] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:12941] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[mpi03:12941] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:12941] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 7.11 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 7.10 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 7.15 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 7.17 usec/msg
Similar behavior with sm:
$ mpirun -host "mpi03" -np 4 --bind-to core --report-bindings --mca btl openib,sm,self ./ping_pong_ring.x2
[mpi03:14928] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:14928] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[mpi03:14928] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:14928] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 7.45 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 7.38 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 7.35 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 7.38 usec/msg
When I remove openib, the results look much better:
$ mpirun -host "mpi03" -np 4 --bind-to core --report-bindings --mca btl vader,self ./ping_pong_ring.x2
[mpi03:15819] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:15819] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[mpi03:15819] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:15819] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.49 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.51 usec/msg
Similar behavior with sm (though it's half as fast as vader):
$ mpirun -host "mpi03" -np 4 --bind-to core --report-bindings --mca btl sm,self ./ping_pong_ring.x2
[mpi03:16608] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:16608] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[mpi03:16608] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:16608] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.98 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 1.00 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.95 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.93 usec/msg
If I disable binding explicitly with --bind-to none, I see the expected results even when specifying openib (with either vader or sm, though sm is now the same speed as vader... weird):
$ mpirun -host "mpi03" -np 4 --bind-to none --report-bindings --mca btl openib,vader,self ./ping_pong_ring.x2
[mpi03:20206] MCW rank 1 is not bound (or bound to all available processors)
[mpi03:20205] MCW rank 0 is not bound (or bound to all available processors)
[mpi03:20207] MCW rank 2 is not bound (or bound to all available processors)
[mpi03:20208] MCW rank 3 is not bound (or bound to all available processors)
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.49 usec/msg
$ mpirun -host "mpi03" -np 4 --bind-to none --report-bindings --mca btl openib,sm,self ./ping_pong_ring.x2
[mpi03:21058] MCW rank 0 is not bound (or bound to all available processors)
[mpi03:21059] MCW rank 1 is not bound (or bound to all available processors)
[mpi03:21060] MCW rank 2 is not bound (or bound to all available processors)
[mpi03:21061] MCW rank 3 is not bound (or bound to all available processors)
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.50 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 0.51 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.51 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.49 usec/msg
Finally, just for completeness: the best 0-byte ping-pong ring times I could get were with --bind-to core --map-by core:
$ mpirun -host "mpi03" -np 4 --bind-to core --map-by core --report-bindings --mca btl vader,self ./ping_pong_ring.x2
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
[mpi03:32149] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[mpi03:32149] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[mpi03:32149] MCW rank 2 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..]
[mpi03:32149] MCW rank 3 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..]
[0:mpi03] ping-pong 0 bytes ...
0 bytes: 0.37 usec/msg
[1:mpi03] ping-pong 0 bytes ...
0 bytes: 0.37 usec/msg
[2:mpi03] ping-pong 0 bytes ...
0 bytes: 0.38 usec/msg
[3:mpi03] ping-pong 0 bytes ...
0 bytes: 0.38 usec/msg
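To make the runs above easier to compare, the per-run averages can be computed from the reported numbers. This is a minimal Python sketch and not part of the benchmark itself: `latencies`, `RUNS`, and `mean_by_run` are hypothetical names, and the values in `RUNS` are copied from the outputs quoted in this report.

```python
import re
from statistics import mean

# Matches result lines such as "0 bytes: 7.11 usec/msg".
RESULT_RE = re.compile(r"^\s*(\d+) bytes:\s*([0-9.]+) usec/msg\s*$", re.MULTILINE)


def latencies(output, size=0):
    """Return the usec/msg values reported for messages of `size` bytes."""
    return [float(usec) for nbytes, usec in RESULT_RE.findall(output)
            if int(nbytes) == size]


# 0-byte latencies (usec/msg) per run, keyed by binding and BTL list.
RUNS = {
    "bind-to core | openib,vader,self": [7.11, 7.10, 7.15, 7.17],
    "bind-to core | openib,sm,self": [7.45, 7.38, 7.35, 7.38],
    "bind-to core | vader,self": [0.50, 0.50, 0.49, 0.51],
    "bind-to core | sm,self": [0.98, 1.00, 0.95, 0.93],
    "bind-to none | openib,vader,self": [0.50, 0.50, 0.50, 0.49],
    "bind-to none | openib,sm,self": [0.50, 0.51, 0.51, 0.49],
    "bind-to core, map-by core | vader,self": [0.37, 0.37, 0.38, 0.38],
}


def mean_by_run(runs):
    """Average the per-rank latencies of each run."""
    return {name: mean(values) for name, values in runs.items()}


if __name__ == "__main__":
    for name, avg in mean_by_run(RUNS).items():
        print(f"{name}: {avg:.3f} usec/msg")
```

With these numbers, the bound openib,vader run averages about 7.13 usec/msg against 0.50 usec/msg for the bound vader-only run, which is roughly a 14x difference. `latencies` can be fed the raw mpirun output instead of the hand-copied lists.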
I've attached my source for ping_pong_ring.c: