@wckzhang
Contributor

@wckzhang wckzhang commented Jul 30, 2020

The btl/ofi does not currently use the common OFI include/exclude
list. This patch adds verification code, similar to the mtl/ofi, that
checks whether the info object's provider is in the include or exclude
list. If the provider is not in the include list, or is in the exclude
list, validate_info returns OPAL_ERROR. The btl/ofi no longer passes a
provider name as a hint when calling fi_getinfo; it filters providers
during validate_info instead.

This patch also moves the is_in_list MTL function into common code and
adds additional debugging output to the BTL to match the MTL standard.

Signed-off-by: William Zhang [email protected]
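
As a rough illustration of the include/exclude check described above, here is a minimal sketch. The names `sketch_is_in_list`, `sketch_validate_provider`, `SKETCH_SUCCESS`, and `SKETCH_ERROR` are hypothetical stand-ins (the last one for OPAL_ERROR), and the exact string comparison is used only to keep the sketch short; the matching rule in the actual patch may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes; SKETCH_ERROR stands in for OPAL_ERROR. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

/* Hypothetical helper: returns 1 if `item` equals an entry of the
 * NULL-terminated `list`, 0 otherwise (including NULL arguments). */
static int sketch_is_in_list(const char *const *list, const char *item)
{
    if (NULL == list || NULL == item) {
        return 0;
    }
    for (size_t i = 0; NULL != list[i]; i++) {
        if (0 == strcmp(item, list[i])) {
            return 1;
        }
    }
    return 0;
}

/* Reject a provider that is missing from a non-NULL include list,
 * or present in a non-NULL exclude list. Accept everything else. */
static int sketch_validate_provider(const char *const *include_list,
                                    const char *const *exclude_list,
                                    const char *prov_name)
{
    if (NULL != include_list && !sketch_is_in_list(include_list, prov_name)) {
        return SKETCH_ERROR;
    } else if (NULL != exclude_list && sketch_is_in_list(exclude_list, prov_name)) {
        return SKETCH_ERROR;
    }
    return SKETCH_SUCCESS;
}
```

With no lists configured, every provider passes; a list only ever narrows the set of acceptable providers.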

@wckzhang wckzhang requested review from hkuno and hppritcha July 30, 2020 21:11
__FILE__, __LINE__,
info->fabric_attr->prov_name);
return OPAL_ERROR;
} else if (NULL != exclude_list && is_in_list(exclude_list, info->fabric_attr->prov_name)) {
Contributor

Where does info->fabric_attr->prov_name get set now? I saw that it's no longer set directly in mca_btl_ofi_component_init. Does it get set somewhere else?

Member

Looks like the OFI BTL no longer sets that attribute in the info arg to fi_getinfo with these changes. I think that's okay: we're now just filtering the provider info that fi_getinfo returns and only picking one that fits the specified include/exclude provider lists.
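
To make the "filter what fi_getinfo returns" idea concrete, here is a minimal sketch of walking the returned list and keeping the first entry that passes validation. `struct sketch_info` is a hypothetical stand-in that mirrors only the `next` pointer and provider name of `struct fi_info`; `sketch_pick_first_valid` and `sketch_reject_shm` are likewise made up for illustration and are not functions from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two pieces of struct fi_info used here. */
struct sketch_info {
    struct sketch_info *next;  /* mirrors info->next */
    const char *prov_name;     /* mirrors info->fabric_attr->prov_name */
};

/* Validator callback: returns 0 to accept an entry, nonzero to reject it. */
typedef int (*sketch_validate_fn)(const struct sketch_info *info);

/* Walk the list and return the first entry the validator accepts,
 * or NULL if every entry is rejected. */
static const struct sketch_info *
sketch_pick_first_valid(const struct sketch_info *providers,
                        sketch_validate_fn validate)
{
    for (const struct sketch_info *cur = providers; NULL != cur; cur = cur->next) {
        if (0 == validate(cur)) {
            return cur;
        }
    }
    return NULL;
}

/* Example validator (assumption for illustration): exclude "shm". */
static int sketch_reject_shm(const struct sketch_info *info)
{
    return (0 == strcmp(info->prov_name, "shm")) ? -1 : 0;
}
```

The point of the design is that no provider name is passed in as a hint; the caller asks for everything and the selection happens entirely on the returned list.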

Contributor

Okay. Thank you for clarifying, Howard!

@wckzhang
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@bwbarrett bwbarrett left a comment

The subject line of your commit message is too long when even GitHub wraps it. Aim for closer to 50 characters than 75. I think a read of https://chris.beams.io/posts/git-commit/ would be helpful :).

You should also call out that you've brought the BTL up to the MTL standard in terms of debugging output.

static bool disable_sep;
static int mca_btl_ofi_init_device(struct fi_info *info);

static int
Member

This code is exactly the same in the MTL and BTL. This should be moved to common and only exist once.

Contributor Author

Makes sense, will do that.

@wckzhang wckzhang changed the title from "btl/ofi: Use common provider include/exclude list to validate info ob…" to "btl/ofi: Use common provider include/exclude list" Jul 31, 2020
OPAL_DECLSPEC void opal_common_ofi_mca_deregister(void);
OPAL_DECLSPEC struct fi_info* opal_common_ofi_select_ofi_provider(struct fi_info *providers,
char *framework_name);
OPAL_DECLSPEC int opal_common_ofi_is_in_list(char **list, char *item);
Member

I'd love some documentation :).

Contributor Author

I tried my best; it's kind of a funky function. Maybe the existing provider list logic should be changed to explicitly treat layered providers (e.g. tcp and tcp;ofi_rxm) as separate providers.
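
For readers wondering why layered providers make the function "funky", here is a minimal documented sketch. It assumes the list lookup is a prefix comparison of each list entry against the provider name; that matching rule, the name `sketch_prefix_in_list`, and the doc comment wording are assumptions for illustration, not the code from this patch.

```c
#include <stddef.h>
#include <string.h>

/**
 * Hypothetical sketch of a documented list lookup.
 *
 * @param list  NULL-terminated array of provider names (may be NULL)
 * @param item  provider name to look up (may be NULL)
 *
 * @return 1 if some list entry is a prefix of item, 0 otherwise.
 *
 * Because the comparison only covers the length of the list entry,
 * an entry of "tcp" also matches the layered name "tcp;ofi_rxm".
 */
static int sketch_prefix_in_list(const char *const *list, const char *item)
{
    if (NULL == list || NULL == item) {
        return 0;
    }
    for (size_t i = 0; NULL != list[i]; i++) {
        if (0 == strncmp(item, list[i], strlen(list[i]))) {
            return 1;
        }
    }
    return 0;
}
```

Under this assumption, listing tcp catches both the plain and the layered provider, while listing tcp;ofi_rxm catches only the layered one. Treating them as separate providers would require an exact comparison instead.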

@bwbarrett
Member

bot:ompi:retest

Looks like one of the testers failed to report its success.

@wckzhang
Contributor Author

Can anyone with merge permissions merge this patch? I will create a backport PR for 4.1.x branch after that.
