Skip to content

Conversation

@bwbarrett
Copy link
Member

@bwbarrett bwbarrett commented Sep 29, 2021

The primary goal of this patch series is to address the issues found as part of #8822. At the end of this patch series, the import monitor is only installed in Libfabric if the OFI MTL or BTL is about to open a domain. This significantly decreases the number of cases where the ofi common code will initialize the Open MPI patcher interface unnecessarily. The configure code was updated to check for features, rather than version numbers. At runtime, unfortunately we need to avoid installing the import monitor in Libfabric 1.13 due to bugs in Libfabric 1.13, so a runtime version check is still used. However, Open MPI can be built against Libfabric 1.13 and run against 1.14 and the memory monitor will be properly installed.

The patch series also includes significant code cleanups to the common ofi code to make sure the interface is properly documented, to clean up some redundant initialization routines, and to make it more clear when various stages of the common ofi code should be called.

This patch series replaces PR #9387.

* opal_common_ofi_init().
*
* @returns OPAL_SUCCESS on success, OPAL error code on failure
* @note This function is not thread safe and must be called in a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is component open/close of two different framework components always called serially? I threw in a lock in my patch because I wasn't sure about this and made ofi_open and ofi_close thread safe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is component open/close of two different framework components always called serially? I threw in a lock in my patch because I wasn't sure about this and made ofi_open and ofi_close thread safe

Yes, always serial. There may be other threads, but they will not be opening components.

Remove cruft from common_ofi, including unnecessary includes,
declarations, and OPAL_DECLSPEC annotations on function
definitions (they are only needed on declarations).

Also move one function back into the C_DECLS block and
properly OPAL_DECLSPEC the function.

Signed-off-by: Brian Barrett <[email protected]>
Test for the features needed to support injecting a memory
monitor in Libfabric, rather than testing for a Libfabric
version.

Signed-off-by: Brian Barrett <[email protected]>
The documentation in the header file for the ofi common code had
been neglected, so show some love.  Also add some helpful comments
in understanding the memory monitor injection code.

Signed-off-by: Brian Barrett <[email protected]>
Rename initialization calls to match the MCA terminology
and restructure code to match the calling pattern of
normal components (so during component_register, OFI
components should call common_ofi_mca_register).

Signed-off-by: Brian Barrett <[email protected]>
Delay initializing the import memory monitor from component
open to a function which should be called immediately before
the Libfabric domain is initialized.  Registering the import
memory monitor requires initializing the patcher memory hooks,
which is not necessary if the OFI fabrics were not selected.

Signed-off-by: Brian Barrett <[email protected]>
@bwbarrett bwbarrett force-pushed the feature/common-ofi-delay-initializing-patcher branch from 227b8c1 to 2349f6b Compare September 30, 2021 20:14
@bwbarrett
Copy link
Member Author

@wckzhang I addressed all of your comments. Thanks!

@bwbarrett bwbarrett merged commit 374e5c9 into open-mpi:master Sep 30, 2021
@bwbarrett bwbarrett deleted the feature/common-ofi-delay-initializing-patcher branch September 30, 2021 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants