-
Notifications
You must be signed in to change notification settings - Fork 934
v4.0.x: Make C and Fortran types for MPI sentinels agree in size #7355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v4.0.x: Make C and Fortran types for MPI sentinels agree in size #7355
Conversation
Fix the C types for the following: * MPI_UNWEIGHTED * MPI_WEIGHTS_EMPTY * MPI_ARGV_NULL * MPI_ARGVS_NULL * MPI_ERRCODES_IGNORE There is lengthy discussion on open-mpi#7210 describing the issue; the gist of it is that the C and Fortran types for several MPI global sentenial values should agree (specifically: their sizes must(**) agree). We erroneously had several of these array-like sentinel values be "array-like" values in C. E.g., MPI_ERRCODES_IGNORE was an (int *) in C while its corresponding Fortran type was "integer, dimension(1)". On a 64 bit platform, this resulted in C expecting the symbol size to be sizeof(int*)==8 while Fortran expected the symbol size to be sizeof(INTEGER, DIMENSION(1))==4. That is incorrect -- the corresponding C type needed to be (int). Then both C and Fortran expect the size of the symbol to be the same. (**) NOTE: This code has been wrong for years. This mismatch of types typically worked because, due to Fortran's call-by-reference semantics, Open MPI was comparing the *addresses* of these instances, not their *types* (or sizes) -- so even if C expected the size of the symbol to be X and Fortran expected the size of the symbol to be Y (where X!=Y), all we really checked at run time was that the addresses of the symbols were the same. But it caused linker warning messages, and even caused errors in some cases. Specifically: due to a GNU ld bug (https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common symbols are incorrectly versioned VER_NDX_LOCAL because their definitions in Fortran sources have smaller st_size than those in libmpi.so. This makes the Fortran library not linkable with lld in distributions that ship openmpi built with -Wl,--version-script (https://bugs.llvm.org/show_bug.cgi?id=43748): % mpifort -fuse-ld=lld /dev/null ld.lld: error: corrupt input file: version definition index 0 for symbol mpi_fortran_argv_null_ is out of bounds >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so ... If we fix the C and Fortran symbols to actually be the same size, the problem goes away and the GNU ld bug does not come into play. This commit also fixes a minor issue that MPI_UNWEIGHTED and MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed by commit 107c007). Fixes open-mpi#7209 Signed-off-by: Fangrui Song <[email protected]> Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit 5609268)
|
@gpaulsen @hppritcha Did we miss the boat for v4.0.3? It would be nice -- but not critical -- to get this in there. NEWS item:
|
|
@gpaulsen lets pull this in and do a rc4 |
|
@jsquyres will this break ABI compatibility with 4.0.x where x <=2? |
|
@hppritcha Good question. I don't think so, but let me test. |
|
My tests indicate that ABI should be preserved. Note that I just updated the NEWS item in a prior comment on this PR -- I forgot that we also fixed call MPI_DIST_GRAPH_CREATE(MPI_COMM_WORLD, n, sources, degrees, &
destinations, MPI_UNWEIGHTED, info, reorder, intercomm)
call MPI_DIST_GRAPH_CREATE(MPI_COMM_WORLD, n, sources, degrees, &
destinations, MPI_WEIGHTS_EMPTY, info, reorder, intercomm) |
Fix the C types for the following:
There is lengthy discussion on
#7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree). We erroneously had several of these array-like sentinel
values be "array-like" values in C. E.g., MPI_ERRCODES_IGNORE was an
(int ) in C while its corresponding Fortran type was "integer,
dimension(1)". On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.
That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.
(**) NOTE: This code has been wrong for years. This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the addresses of these instances,
not their types (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same. But it caused linker warning messages,
and even caused errors in some cases.
Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.
This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):
% mpifort -fuse-ld=lld /dev/null
ld.lld: error: corrupt input file: version definition index 0 for symbol
mpi_fortran_argv_null_ is out of bounds
If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.
This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c007).
Fixes #7209
Signed-off-by: Fangrui Song [email protected]
Signed-off-by: Jeff Squyres [email protected]
(cherry picked from commit 5609268)