-
Notifications
You must be signed in to change notification settings - Fork 936
make Type_create_resized set FLAG_USER_UB #8092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The difference between MPI_UB and MPI_Type_create_resized is that the second marked is not hard, which means you don't propagate it to the types created from the resized type. I do not have access to the standard right now to point you to the place where this is described, but I'm sure it is. You can also look at the equations describing how the soft lb and ub markers generated by MPI_Type_create_resized propagate. |
|
@vspetrov - can you please review. |
|
Thanks for checking it out @boslica . Is that soft/hard distinction part of the old MPI standards or is that from a new draft? So far I'm not finding that exception for propagating the UB marker coming from Type_create_resized in the standard. The example they give when defining the LB/UB markers even shows it being propagated like any other member of the typemap: If I run the example straight from the standard vanilla OMPI for some reason is coming up with 20 for the extent here, but the extent would be 18 according to those UB/LB markers. |
In the below type creation sequence
MPI_Type_create_resized(MPI_INT, 0, 6, &mydt1);
MPI_Type_contiguous(1, mydt1, &mydt2);
I think both mydt1 and mydt2 should have extent 6.
The Type_create_resized would add an UB marker into the type map,
and the definition of Type_contiguous would maintain the same
markers in the new map.
The only counter argument I can think of to the above is if
we declared that mydt1 is illegal because it's putting data
on addresses that don't satisfy the alignment requirement.
But in my interpretation of the standard the term "alignment
requirement" is a property of the system memory, and MPI defines
"extent" in a way to make it easy to create MPI datatypes that
support the system's alignment requirements. But the standard
isn't saying it's illegal to make MPI datatypes that don't satisfy
the system's alignment requirements. I think this is true also
because the MPI datatypes might be used in file IO where the
requirements are different, so that's my long winded explanation
for why I don't think we can declare mydt1 illegal.
Complete example:
#include <stdio.h>
#include <mpi.h>
int main() {
MPI_Datatype mydt1, mydt2;
MPI_Aint lb, ext;
MPI_Init(0, 0);
MPI_Type_create_resized(MPI_INT, 0, 6, &mydt1);
MPI_Type_commit(&mydt1);
MPI_Type_contiguous(1, mydt1, &mydt2);
MPI_Type_commit(&mydt2);
MPI_Type_get_extent(mydt1, &lb, &ext);
printf("mydt1 extent %d\n", (int)ext);
MPI_Type_get_extent(mydt2, &lb, &ext);
printf("mydt2 extent %d\n", (int)ext);
MPI_Type_free(&mydt1);
MPI_Type_free(&mydt2);
MPI_Finalize();
return(0);
}
% mpicc -o x test.c
% mpirun -np 1 ./x
Without this PR the output is
> mydt1 extent 6
> mydt2 extent 8
With this PR both extents are 6.
Fwiw I also tested with mpich and they give 6 for both extents.
Signed-off-by: Mark Allen <[email protected]>
dc4e290 to
bca3c0e
Compare
|
Repushing just to set both OPAL_DATATYPE_FLAG_USER_LB and OPAL_DATATYPE_FLAG_USER_UB since both LB and UB markers are being added to the typemap and currently I'm still thinking they should be propagated via the same rules as all the other typemap entries as defined by each of the type construction calls. |
|
I also looked into OMPI's extent of 20 for the Example 4.9 case in the standard that I was perplexed by above: Whatever we decide about propagation, no way OMPI's 20 is the right extent. What's happening is in opal_datatype_add() at OPAL_DATATYPE_LB_UB_CONT() the code says it first only looks at data entires in the typemap all the way through where epsilon is calculated, and then after that it takes into account the LB/UB markers. But args 3 and 4 here are LB/UB markers from mydt1, eg they're -3 and 6. So we have epsilon being calculated based on LB/UB markers producing garbage. I think those args should be pdtAdd->true_lb and true_ub, as those are the items that come strictly from the data elements in the typemap, and epsilon calculated from those would be legitimate. I'm not necessarily planning to PR that because I think the current PR sidesteps the issue and fixes it independently. But this is how OMPI without this PR is currently coming up with 20 for the extent |
|
The explanation is slightly different from above. The root issue is that OMPI applies the rules for alignment when there was an explicit ub marker in the datatype. As a result, explicitly adding the datatype marker as you did in this PR works. However, the 2 flags (OPAL_DATATYPE_FLAG_USER_LB and OPAL_DATATYPE_FLAG_USER_UB) were designed for the hard UB and LB markers, and I was concerned they might have side effects on the data composition. Luckily for us, it turns out this is not the case now that we deprecated the 2 hard markers and prevented the upper layer from explicitly using the OPAL_DATATYPE_LB and OPAL_DATATYPE_UB. |
In the below type creation sequence
MPI_Type_create_resized(MPI_INT, 0, 6, &mydt1);
MPI_Type_contiguous(1, mydt1, &mydt2);
I think both mydt1 and mydt2 should have extent 6.
The Type_create_resized would add an UB marker into the type map,
and the definition of Type_contiguous would maintain the same
markers in the new map.
The only counter argument I can think of to the above is if
we declared that mydt1 is illegal because it's putting data
on addresses that don't satisfy the alignment requirement.
But in my interpretation of the standard the term "alignment
requirement" is a property of the system memory, and MPI defines
"extent" in a way to make it easy to create MPI datatypes that
support the system's alignment requirements. But the standard
isn't saying it's illegal to make MPI datatypes that don't satisfy
the system's alignment requirements. I think this is true also
because the MPI datatypes might be used in file IO where the
requirements are different, so that's my long winded explanation
for why I don't think we can declare mydt1 illegal.
Complete example:
% mpicc -o x test.c
% mpirun -np 1 ./x
Without this PR the output is
Fwiw I also tested with mpich and they give 6 for both extents.
Signed-off-by: Mark Allen [email protected]