Skip to content

Conversation

@markalle
Copy link
Contributor

@markalle markalle commented Jun 7, 2017

Example (using MPI_ORDER_C so the below has 6 rows of 4 ints to parcel out)

    size = 4;
    rank = 0;
    ndims=2;
    gsizes[0] = 6;
    gsizes[1] = 4;
    distribs[0] = MPI_DISTRIBUTE_CYCLIC;
    distribs[1] = MPI_DISTRIBUTE_BLOCK;
    dargs[0] = 2;
    dargs[1] = 2;
    psizes[0] = 2;
    psizes[1] = 2;
    MPI_Type_create_darray(size, rank, ndims,
        gsizes, distribs, dargs, psizes,
        MPI_ORDER_C, MPI_INT, &mydt);

Expectation for the layout:

   inner dimension (1) is
       4 items (ints) distributed block over 2 ranks with 2 items each
       eg for rank 0: [ x x . . ]
   outer dimension (0) is:
       6 items (the above [ x x . .]) cyclic over 2 ranks with 2 items each
       eg for rank 0:
           [ x x . . ]    :  offset=0 bytes=8
           [ x x . . ]    :  ofset=16 bytes=8
           [ . . . . ]
           [ . . . . ]
           [ x x . . ]    :  offset=64 bytes=8
           [ x x . . ]    :  offset=80 bytes=8

Or more specifically a stream of ints 0,1,2,3,4,5,6,7 sent into that
type should be

    [ 0 1 . . ]
    [ 2 3 . . ]
    [ . . . . ]
    [ . . . . ]
    [ 4 5 . . ]
    [ 6 7 . . ]
The data was laying out though as
    [ 0 1 2 3 ]
    [ . . . . ]
    [ . . . . ]
    [ . . . . ]
    [ 4 5 6 7 ]
    [ . . . . ]

because the recursive construction inside the block() function (which
creates the smaller row datatype [ x x . . ]) wasn't setting the extent
of that type.

Signed-off-by: Mark Allen [email protected]

Example (using MPI_ORDER_C so the below has 6 rows of 4 ints to parcel out)
    size = 4;
    rank = 0;
    ndims=2;
    gsizes[0] = 6;
    gsizes[1] = 4;
    distribs[0] = MPI_DISTRIBUTE_CYCLIC;
    distribs[1] = MPI_DISTRIBUTE_BLOCK;
    dargs[0] = 2;
    dargs[1] = 2;
    psizes[0] = 2;
    psizes[1] = 2;
    MPI_Type_create_darray(size, rank, ndims,
        gsizes, distribs, dargs, psizes,
        MPI_ORDER_C, MPI_INT, &mydt);

Expectation for the layout:
   inner dimension (1) is
       4 items (ints) distributed block over 2 ranks with 2 items each
       eg for rank 0: [ x x . . ]
   outer dimension (0) is:
       6 items (the above [ x x . .]) cyclic over 2 ranks with 2 items each
       eg for rank 0:
           [ x x . . ]    :  offset=0 bytes=8
           [ x x . . ]    :  ofset=16 bytes=8
           [ . . . . ]
           [ . . . . ]
           [ x x . . ]    :  offset=64 bytes=8
           [ x x . . ]    :  offset=80 bytes=8

Or more specifically a stream of ints 0,1,2,3,4,5,6,7 sent into that
type should be
    [ 0 1 . . ]
    [ 2 3 . . ]
    [ . . . . ]
    [ . . . . ]
    [ 4 5 . . ]
    [ 6 7 . . ]
The data was laying out though as
    [ 0 1 2 3 ]
    [ . . . . ]
    [ . . . . ]
    [ . . . . ]
    [ 4 5 6 7 ]
    [ . . . . ]
because the recursive construction inside the block() function (which
creates the smaller row datatype [ x x . . ]) wasn't setting the extent
of that type.

Signed-off-by: Mark Allen <[email protected]>
@markalle
Copy link
Contributor Author

markalle commented Jun 7, 2017

Gist of testcase:
https://gist.github.com/markalle/1b2108e57a0a1fdf7ddc8ba7554be0b5
mpicc -o x darray_bug_ompi.c
mpirun -np 1 ./x

This fix adds code to the block() type creation function that matches code that's already in the cyclic() type creation function.

Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch.

@markalle
Copy link
Contributor Author

markalle commented Jun 8, 2017

There was something else I was wondering. Functionally I don't think there's any need for the block() function at all, I think it's just a degenerate case of the more general cyclic() function. Is there something about block() that's more efficient than just letting cyclic() run with a darg value that makes it behave equivalently to block()?

I don't have any objection to the code as is, just wondering why it was done like that.

@gpaulsen gpaulsen merged commit bdc7206 into open-mpi:master Jun 8, 2017
@bosilca
Copy link
Member

bosilca commented Jun 8, 2017

The block() contains some optimizations that are not in the cyclic case. I know the MPI standard talks only in terms of cyclic, but some optimizations were possible and they have been implemented in the block() part. I do not recall how important they were.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants