Skip to content

Conversation

@jsquyres
Copy link
Member

@jsquyres jsquyres commented Feb 2, 2020

Fix the C types for the following:

  • MPI_UNWEIGHTED
  • MPI_WEIGHTS_EMPTY
  • MPI_ARGV_NULL
  • MPI_ARGVS_NULL
  • MPI_ERRCODES_IGNORE

There is lengthy discussion on
#7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree). We erroneously had several of these array-like sentinel
values be "array-like" values in C. E.g., MPI_ERRCODES_IGNORE was an
(int ) in C while its corresponding Fortran type was "integer,
dimension(1)". On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int
)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years. This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the addresses of these instances,
not their types (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same. But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

% mpifort -fuse-ld=lld /dev/null
ld.lld: error: corrupt input file: version definition index 0 for symbol
mpi_fortran_argv_null_ is out of bounds

defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c007).

Fixes #7209

Signed-off-by: Fangrui Song [email protected]
Signed-off-by: Jeff Squyres [email protected]
(cherry picked from commit 5609268)

Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on
open-mpi#7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree).  We erroneously had several of these array-like sentinel
values be "array-like" values in C.  E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)".  On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years.  This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same.  But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

  % mpifort -fuse-ld=lld /dev/null
  ld.lld: error: corrupt input file: version definition index 0 for symbol
  mpi_fortran_argv_null_ is out of bounds
  >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
  ...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c007).

Fixes open-mpi#7209

Signed-off-by: Fangrui Song <[email protected]>
Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 5609268)
@jsquyres jsquyres added this to the v4.0.3 milestone Feb 2, 2020
@jsquyres
Copy link
Member Author

jsquyres commented Feb 2, 2020

@gpaulsen @hppritcha Did we miss the boat for v4.0.3?

It would be nice -- but not critical -- to get this in there.

NEWS item:

  • Fixed an issue with a mismatch of Fortran and C symbol sizes that triggered an error in certain versions of GNU ld. Also fixed a compilation issue with the sentinel values MPI_WEIGHTED and MPI_WEIGHTS_EMPTY in the mpi_f08 bindings. Thanks to Fangrui Song for identifying and fixing the issue.

@hppritcha
Copy link
Member

@gpaulsen lets pull this in and do a rc4

@hppritcha hppritcha self-requested a review February 3, 2020 17:15
@hppritcha
Copy link
Member

@jsquyres will this break ABI compatibility with 4.0.x where x <=2?

@jsquyres
Copy link
Member Author

jsquyres commented Feb 3, 2020

@hppritcha Good question. I don't think so, but let me test.

@jsquyres
Copy link
Member Author

jsquyres commented Feb 3, 2020

My tests indicate that ABI should be preserved.

Note that I just updated the NEWS item in a prior comment on this PR -- I forgot that we also fixed MPI_UNWEIGHTED and MPI_WEIGHTS_EMPTY in the mpi_f08 module. Previously, this wouldn't have compiled:

  call MPI_DIST_GRAPH_CREATE(MPI_COMM_WORLD, n, sources, degrees, &
       destinations, MPI_UNWEIGHTED, info, reorder, intercomm)
  call MPI_DIST_GRAPH_CREATE(MPI_COMM_WORLD, n, sources, degrees, &
       destinations, MPI_WEIGHTS_EMPTY, info, reorder, intercomm)

@hppritcha hppritcha merged commit a26cd34 into open-mpi:v4.0.x Feb 3, 2020
@jsquyres jsquyres deleted the pr/v4.0.x/fortran-sentinel-linker-black-magic branch February 6, 2020 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants