-
Notifications
You must be signed in to change notification settings - Fork 936
Clean up OFI common code and delay patcher initialization until needed #9441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up OFI common code and delay patcher initialization until needed #9441
Conversation
| * opal_common_ofi_init(). | ||
| * | ||
| * @returns OPAL_SUCCESS on success, OPAL error code on failure | ||
| * @note This function is not thread safe and must be called in a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is component open/close of two different framework components always called serially? I threw in a lock in my patch because I wasn't sure about this and made ofi_open and ofi_close thread safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is component open/close of two different framework components always called serially? I threw in a lock in my patch because I wasn't sure about this and made ofi_open and ofi_close thread safe
Yes, always serial. There may be other threads, but they will not be opening components.
Remove cruft from common_ofi, including unnecessary includes, declarations, and OPAL_DECLSPEC annotations on function definitions (they are only needed on declarations). Also move one function back into the C_DECLS block and properly OPAL_DECLSPEC the function. Signed-off-by: Brian Barrett <[email protected]>
Test for the features needed to support injecting a memory monitor in Libfabric, rather than testing for a Libfabric version. Signed-off-by: Brian Barrett <[email protected]>
The documentation in the header file for the ofi common code had been neglected, so show some love. Also add some helpful comments in understanding the memory monitor injection code. Signed-off-by: Brian Barrett <[email protected]>
Rename initialization calls to match the MCA terminology and restructure code to match the calling pattern of normal components (so during component_register, OFI components should call common_ofi_mca_register). Signed-off-by: Brian Barrett <[email protected]>
Delay initializing the import memory monitor from component open to a function which should be called immediately before the Libfabric domain is initialized. Registering the import memory monitor requires initializing the patcher memory hooks, which is not necessary if the OFI fabrics were not selected. Signed-off-by: Brian Barrett <[email protected]>
227b8c1 to
2349f6b
Compare
|
@wckzhang I addressed all of your comments. Thanks! |
The primary goal of this patch series is to address the issues found as part of #8822. At the end of this patch series, the import monitor is only installed in Libfabric if the OFI MTL or BTL is about to open a domain. This significantly decreases the number of cases where the ofi common code will initialize the Open MPI patcher interface unnecessarily. The configure code was updated to check for features, rather than version numbers. At runtime, unfortunately we need to avoid installing the import monitor in Libfabric 1.13 due to bugs in Libfabric 1.13, so a runtime version check is still used. However, Open MPI can be built against Libfabric 1.13 and run against 1.14 and the memory monitor will be properly installed.
The patch series also includes significant code cleanups to the common ofi code to make sure the interface is properly documented, to clean up some redundant initialization routines, and to make it more clear when various stages of the common ofi code should be called.
This patch series replaces PR #9387.