btl/uct: add support for OpenUCX v1.8 API changes #7016
|
@yosefe Any other unannounced (UCT) API changes in OpenUCX master? Please remember to keep me in the loop. |
| ucs_status_t ucs_status; | ||
| int rc; | ||
|
|
||
| ucs_status = uct_component_query (component, &attr); |
need to initialize field_mask and allocate md_resources..
@hjelmn seems like you didn't try to run it?
It seemed to run and found components but I was probably getting lucky :). Will fix.
Out of curiosity... Why not allocate the md_resources inside uct_component_query. Calling twice seems a little silly to me.
Fixed.
Out of curiosity... Why not allocate the md_resources inside uct_component_query. Calling twice seems a little silly to me.
If uct_component_query allocated the resources, we would need another API to release that memory. In the existing scheme the user is responsible for memory allocation and can use something like alloca() or std::vector, which do not require an explicit free.
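The caller-allocates, query-twice scheme discussed in this thread can be sketched roughly as below. This is a self-contained mock, not the real UCX API: every `demo_`/`DEMO_` name, the struct layout, and the sample resource names are assumptions made for illustration. Only the idea (zero the struct, set `field_mask`, ask for the count, allocate, then ask for the entries) comes from the comments above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the UCT attribute flags and types. */
enum {
    DEMO_ATTR_FIELD_MD_RESOURCE_COUNT = 1 << 0,
    DEMO_ATTR_FIELD_MD_RESOURCES      = 1 << 1
};

typedef struct {
    char md_name[16];
} demo_md_resource_t;

typedef struct {
    uint64_t            field_mask;        /* which fields the caller wants filled */
    unsigned            md_resource_count; /* output: number of resources */
    demo_md_resource_t *md_resources;      /* caller-allocated output array */
} demo_component_attr_t;

/* Mock query: only touches the fields selected in field_mask. */
static int demo_component_query(demo_component_attr_t *attr)
{
    static const char *const names[] = { "mlx5_0", "self" }; /* made-up sample data */
    const unsigned count = sizeof(names) / sizeof(names[0]);

    if (attr->field_mask & DEMO_ATTR_FIELD_MD_RESOURCE_COUNT) {
        attr->md_resource_count = count;
    }
    if (attr->field_mask & DEMO_ATTR_FIELD_MD_RESOURCES) {
        if (attr->md_resources == NULL) {
            return -1; /* caller forgot to allocate the array */
        }
        for (unsigned i = 0; i < count; ++i) {
            snprintf(attr->md_resources[i].md_name,
                     sizeof(attr->md_resources[i].md_name), "%s", names[i]);
        }
    }
    return 0;
}

/* Two-call pattern: first the count, then the entries into caller-owned memory. */
static int demo_query_md_resources(demo_md_resource_t **out, unsigned *count_out)
{
    demo_component_attr_t attr;
    demo_md_resource_t *resources;

    assert(out != NULL && count_out != NULL);

    memset(&attr, 0, sizeof(attr));                      /* no stale field_mask bits */
    attr.field_mask = DEMO_ATTR_FIELD_MD_RESOURCE_COUNT;
    if (demo_component_query(&attr) != 0) {
        return -1;
    }

    /* Allocate at least one element so a zero count does not look like failure. */
    resources = calloc(attr.md_resource_count ? attr.md_resource_count : 1,
                       sizeof(*resources));
    if (resources == NULL) {
        return -1;
    }

    attr.field_mask  |= DEMO_ATTR_FIELD_MD_RESOURCES;
    attr.md_resources = resources;
    if (demo_component_query(&attr) != 0) {
        free(resources);
        return -1;
    }

    *out       = resources;
    *count_out = attr.md_resource_count;
    return 0;
}
```

Because the caller owns the array, it decides how the memory is obtained and released (here `calloc`/`free`; a stack buffer would work the same way), which is the trade-off described in the reply above.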
|
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/006525567fe2cbca76d9636ccc30d45c |
|
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/6811f1458827d0f76f362a9659c573b7 |
|
:bot:retest |
c01e31f to f506aa0
|
@yosefe Installed xpmem in a VM and the changes appear to be working with UCX master. |
|
@yosefe Please re-review. If it's possible it'd be nice to PR back to v4.0.x tonight. |
opal/mca/btl/uct/btl_uct_component.c
Outdated
| /* generate all suitable btl modules */ | ||
| for (unsigned i = 0 ; i < resource_count ; ++i) { | ||
| rc = mca_btl_uct_component_process_uct_md (resources + i, allowed_ifaces); | ||
| rc = mca_btl_uct_component_process_uct_md (resources + i, allowed_ifaces); |
weird indentation
fixed.
opal/mca/btl/uct/btl_uct_tl.c
Outdated
|
|
||
| /* UCT bandwidth is in bytes/sec, BTL is in MB/sec */ | ||
| #if UCT_API > UCT_VERSION(1, 7) | ||
| module->super.btl_bandwidth = (uint32_t) (MCA_BTL_UCT_TL_ATTR(tl, 0).bandwidth.dedicated / 1048576.0); |
need to take both dedicated and shared bw into account
dedicated + shared/ppn
fixed.
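The `dedicated + shared/ppn` suggestion, combined with the bytes/sec to MB/sec conversion from the diff, could look something like this. It is a minimal sketch: `demo_bandwidth_t` and `demo_bandwidth_mbs` are hypothetical names, and `ppn` is assumed to be the number of processes per node sharing the link; the actual fix in the PR may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of bandwidth figures, both in bytes/sec. */
typedef struct {
    double dedicated; /* bandwidth available to each process */
    double shared;    /* bandwidth split among all processes on the node */
} demo_bandwidth_t;

/* Effective per-process bandwidth in MB/sec: dedicated + shared / ppn. */
static uint32_t demo_bandwidth_mbs(demo_bandwidth_t bw, unsigned ppn)
{
    double bytes_per_sec;

    assert(ppn > 0); /* at least one process per node */

    bytes_per_sec = bw.dedicated + bw.shared / (double) ppn;
    return (uint32_t) (bytes_per_sec / 1048576.0);
}
```

For example, 100 MB/sec dedicated plus 400 MB/sec shared across 4 processes works out to 200 MB/sec per process.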
OpenUCX broke the UCT API again in v1.8. This commit updates btl/uct to fix compilation with current OpenUCX master (future v1.8). Further changes will likely be needed for the final release. Signed-off-by: Nathan Hjelm <[email protected]>
f506aa0 to 0ef5634
|
@yosefe Fixed the issues you identified and added another minor bug-fix. |
This commit fixes a crash that can occur if a transport is usable but doesn't have zero-copy support. In this case do not attempt to use zero-copy and set the max send size off the bcopy limit. Signed-off-by: Nathan Hjelm <[email protected]>
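The fallback this commit message describes might be sketched as follows. The struct and function names are hypothetical and do not come from btl/uct; the sketch only illustrates choosing the send limit from the bcopy size when zero-copy is unavailable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a transport's capabilities. */
typedef struct {
    int    has_zcopy; /* non-zero if the transport supports zero-copy sends */
    size_t max_bcopy; /* largest buffered-copy send, in bytes */
    size_t max_zcopy; /* largest zero-copy send, in bytes (unused without zcopy) */
} demo_tl_caps_t;

/* Pick the max send size: the zcopy limit if supported, else the bcopy limit. */
static size_t demo_max_send_size(const demo_tl_caps_t *caps)
{
    assert(caps != NULL);
    return caps->has_zcopy ? caps->max_zcopy : caps->max_bcopy;
}
```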
0ef5634 to 8473a66
|
Should be good to go. |
|
@gpaulsen we need to cherry-pick this over to v4.0.x |