-
Notifications
You must be signed in to change notification settings - Fork 936
btl/ofi: Use common provider include/exclude list #7975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
opal/mca/btl/ofi/btl_ofi_component.c
Outdated
| __FILE__, __LINE__, | ||
| info->fabric_attr->prov_name); | ||
| return OPAL_ERROR; | ||
| } else if (NULL != exclude_list && is_in_list(exclude_list, info->fabric_attr->prov_name)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does info->fabric_attr->prov_name get set now? I saw that it's no longer set directly in mca_btl_ofi_component_init . Does it get set somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the OFI BTL doesn't use that attribution in the info arg to fi_getinfo with these changes. I think that's okay, we're now just filtering what fi_getinfo returns for provider info and only picking the one that fits with the specified include/exclude provider lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Thank you for clarifying, Howard!
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
bwbarrett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your topic line of the commit message is too long when even github wraps it. Think more 50 characters, less 75 characters. I think a read of https://chris.beams.io/posts/git-commit/ would be helpful :).
You should also call out that you've brought the BTL up to the MTL standard in terms of debugging output.
opal/mca/btl/ofi/btl_ofi_component.c
Outdated
| static bool disable_sep; | ||
| static int mca_btl_ofi_init_device(struct fi_info *info); | ||
|
|
||
| static int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is exactly the same in the MTL and BTL. This should be moved to common and only exist once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will do that.
| OPAL_DECLSPEC void opal_common_ofi_mca_deregister(void); | ||
| OPAL_DECLSPEC struct fi_info* opal_common_ofi_select_ofi_provider(struct fi_info *providers, | ||
| char *framework_name); | ||
| OPAL_DECLSPEC int opal_common_ofi_is_in_list(char **list, char *item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd love some documentation :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried my best, it's kind of a funky function. Maybe the provider list logic that currently exists should be changed to explicitly call out layered providers (ie. tcp, tcp;ofi_rxm) as separate providers.
The btl/ofi does not currently utilize the common ofi include/exclude list. Added verification code similar to the mtl/ofi that will check if the info object is in the include or exclude list. If it isn't in the include list or is in the exclude list, validate_info will return OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint when calling getinfo, instead filtering the provider during validate_info. This patch also moves the is_in_list MTL function into common code and adds additional debugging output to the BTL to match the MTL standard. Signed-off-by: William Zhang <[email protected]>
|
bot:ompi:retest looks like one of the testers failed in calling in its success. |
|
Can anyone with merge permissions merge this patch? I will create a backport PR for 4.1.x branch after that. |
The btl/ofi does not currently utilize the common ofi include/exclude
list. Added verification code similar to the mtl/ofi that will check if
the info object is in the include or exclude list. If it isn't in the
include list or is in the exclude list, validate_info will return
OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint
when calling getinfo, instead filtering the provider during
validate_info.
This patch also moves the is_in_list MTL function into common code and
adds additional debugging output to the BTL to match the MTL standard.
Signed-off-by: William Zhang [email protected]