Skip to content

Conversation

@jsquyres
Copy link
Member

@jsquyres jsquyres commented Jul 10, 2020

NOTE: This is intentionally not a cherry pick from master. Instead,
this is a cherry-pick from the equivalent commit on the v4.0.x branch.
See below.

There is a problem with the mpi_f08 module when sizeof(int) !=
sizeof(INTEGER): the size of TYPE(MPI_Status) is too small. This
causes buffer overruns when Open MPI is configured with (for example)
sizeof(int)==4 and sizeof(INTEGER)==8, and then you call the mpi_f08
MPI_RECV subroutine. This will end up copying the resulting C
MPI_Status to the buffer pointing to the Fortran status, but the code
does not know if the Fortran status is an mpif.h status or a
TYPE(MPI_Status) -- it just blindly copies over as if the Fortran
status is an INTEGER array of length MPI_STATUS_SIZE. Unfortunately,
TYPE(MPI_Status) is actually smaller than this, so we overrun the
buffer. Hilarity ensues.

The simple fix for this is to make TYPE(MPI_Status) the same size as
INTEGER(MPI_STATUS_SIZE), but we can't do that here on the release
branch because it will break ABI.

This commit does the following:

  • checks to see if we're in a sizeof(int) != sizeof(INTEGER) scenario
  • if so, if the user has not specifically excluded building the
    mpi_f08 module, display a Giant Error Message (GEM) and abort
    configure.

This is unusual; we don't usually abort configure when feature XYZ
can't be built -- if the user didn't specifically ask for XYZ, we
just emit a notice that we won't build XYZ and continue.

This situation is a little different because we're on a release
branch: prior releases have built mpi_f08 by default -- even in this
"bad" scenario. Hence, in this case, we explicitly tell the user that
this is now a known-bad scenario and abort. In the GEM, we give the
user two options:

  1. Change their compiler flags so that sizeof(int) == sizeof(INTEGER)
    and re-run configure, or
  2. Explicitly disable the mpi_f08 module via --enable-mpi-fortran=usempi

Thanks to @ahaichen for reporting the issue.

Note: the proper fix has been implemented on master (i.e., what will
become v5.0.0), but since that breaks ABI, we can't cherry pick it
back here to an existing release branch series. Hence, we
cherry-picked this fix from the v4.0.x branch.

Signed-off-by: Jeff Squyres [email protected]
(cherry picked from commit 27836a6)

Refs #7918

@jsquyres jsquyres added this to the v4.1.0 milestone Jul 10, 2020
@jsquyres jsquyres requested a review from ggouaillardet July 10, 2020 14:15
@jsquyres jsquyres marked this pull request as draft July 10, 2020 17:05
@jsquyres
Copy link
Member Author

Do not merge until the discussion on the corresponding v4.0.x PR #7922 (from where this PR was cherry picked) is complete.

@jsquyres jsquyres force-pushed the pr/v4.1.x/disallow-when-cint-not-equal-to-finteger branch from f7989f4 to 4d3b538 Compare July 10, 2020 21:39
NOTE: This is intentionally not a cherry pick from master.  Instead,
this is a cherry-pick from the equivalent commit on the v4.0.x branch.
See below.

There is a problem with the mpi_f08 module when sizeof(int) !=
sizeof(INTEGER): the size of TYPE(MPI_Status) is too small.  This
causes buffer overruns when Open MPI is configured with (for example)
sizeof(int)==4 and sizeof(INTEGER)==8, and then you call the mpi_f08
MPI_RECV subroutine.  This will end up copying the resulting C
MPI_Status to the buffer pointing to the Fortran status, but the code
does not know if the Fortran status is an mpif.h status or a
TYPE(MPI_Status) -- it just blindly copies over as if the Fortran
status is an INTEGER array of length MPI_STATUS_SIZE.  Unfortunately,
TYPE(MPI_Status) is actually smaller than this, so we overrun the
buffer.  Hilarity ensues.

The simple fix for this is to make TYPE(MPI_Status) the same size as
INTEGER(MPI_STATUS_SIZE), but we can't do that here on the release
branch because it will break ABI.

This commit does the following:

- checks to see if we're in a sizeof(int) != sizeof(INTEGER) scenario
- if so, if the user has not specifically excluded building the
  mpi_f08 module, display a Giant Error Message (GEM) and abort
  configure.

This is unusual; we don't usually abort configure when feature XYZ
can't be built -- if the user didn't specifically ask for XYZ, we
just emit a notice that we won't build XYZ and continue.

This situation is a little different because we're on a release
branch: prior releases have built mpi_f08 by default -- even in this
"bad" scenario.  Hence, in this case, we explicitly tell the user that
this is now a known-bad scenario and abort.  In the GEM, we give the
user two options:

1. Change their compiler flags so that sizeof(int) == sizeof(INTEGER)
   and re-run configure, or
2. Explicitly disable the mpi_f08 module via --enable-mpi-fortran=usempi

Thanks to @ahaichen for reporting the issue.

Note: the proper fix has been implemented on master (i.e., what will
become v5.0.0), but since that breaks ABI, we can't cherry pick it
back here to an existing release branch series. Hence, we
cherry-picked this fix from the v4.0.x branch.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 27836a6)
@jsquyres jsquyres force-pushed the pr/v4.1.x/disallow-when-cint-not-equal-to-finteger branch from 4d3b538 to 9492293 Compare July 10, 2020 21:40
@jsquyres jsquyres added the NEWS label Jul 10, 2020
@jsquyres jsquyres requested a review from bwbarrett July 10, 2020 21:40
@jsquyres jsquyres marked this pull request as ready for review July 10, 2020 21:40
@jsquyres
Copy link
Member Author

Strange AWS CI error.

bot:aws:retest

@bwbarrett bwbarrett merged commit 4fd3cdc into open-mpi:v4.1.x Jul 13, 2020
@jsquyres jsquyres deleted the pr/v4.1.x/disallow-when-cint-not-equal-to-finteger branch July 13, 2020 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants