@amirshehataornl
Contributor

@amirshehataornl amirshehataornl commented Apr 5, 2023

ofi: NIC selection update

The existing code in compare_cpusets() assumed that some non-I/O ancestor of
a PCI object would intersect with the cpuset of the process. However, this
is not always true. The non-I/O ancestor can be an L3 cache. If there are
two L3s on the same NUMA node and the process is bound to one L3 while the
PCI object is connected to the other, then compare_cpusets() returns false,
even though the device is local to the process's NUMA node.
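
To make the failure case concrete, here is a minimal sketch that models cpusets as plain 64-bit masks. The type `cpuset_t`, the helper `cpusets_intersect()`, and the 16-core layout are assumptions made up for illustration; they are not the hwloc types or the actual compare_cpusets() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an hwloc cpuset: bit i set means core i is a member. */
typedef uint64_t cpuset_t;

/* Illustrative 16-core NUMA node split into two 8-core L3 domains. */
static const cpuset_t NUMA0 = 0xFFFFu; /* cores 0-15 */
static const cpuset_t L3_A  = 0x00FFu; /* cores 0-7  */
static const cpuset_t L3_B  = 0xFF00u; /* cores 8-15 */

/* Intersection-style locality test: true only if the process binding
 * shares at least one core with the NIC ancestor's cpuset. */
bool cpusets_intersect(cpuset_t proc_binding, cpuset_t nic_ancestor)
{
    return (proc_binding & nic_ancestor) != 0;
}
```

With the process bound to `L3_A` and the NIC's first non-I/O ancestor being `L3_B`, `cpusets_intersect(L3_A, L3_B)` is false, so the NIC is treated as non-local. Testing against the enclosing `NUMA0` mask instead would succeed, which is why an intersection test on the nearest ancestor alone is too strict.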

A better way to determine the optimal interface is to compute the distance
of each interface from the current process, then select the interface that
is nearest to the process.

Use the PMIx distance generation for this purpose.
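
The selection step can be sketched roughly as follows, assuming the distances have already been computed. The struct `nic_distance`, its fields, and `nearest_nic()` are hypothetical names used only for this example; they are not the PMIx types or the code in this PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-NIC record holding an already computed distance.
 * This is an illustrative stand-in, not the PMIx distance type. */
struct nic_distance {
    const char *name;    /* device name, illustrative only */
    uint16_t    mindist; /* smallest distance from the process to the device */
};

/* Return the index of the NIC with the smallest distance, or -1 if the
 * list is empty. Ties are resolved in favor of the first entry. */
int nearest_nic(const struct nic_distance *nics, size_t n)
{
    if (nics == NULL || n == 0) {
        return -1;
    }
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (nics[i].mindist < nics[best].mindist) {
            best = i;
        }
    }
    return (int)best;
}
```

Picking the first of several equally near NICs is a simplification. A real implementation would likely want to spread local processes across tied NICs rather than always choosing the same one.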

Move away from using deprecated PMIx macros and use the functions directly.
Avoid some compilation issues due to an error in the PMIx code.

Signed-off-by: Amir Shehata [email protected]

@hppritcha hppritcha changed the title Distances mtl/ofi: NIC selection update - try two Apr 10, 2023
Member

@bwbarrett bwbarrett left a comment

Neither of these patches is unique to mtl/ofi behavior; they'll change the btl as well. Please update the prefix to be less specific (like just ofi:).

@amirshehataornl amirshehataornl force-pushed the distances branch 2 times, most recently from 91096eb to 649236a Compare April 14, 2023 21:51
@janjust
Contributor

janjust commented Apr 17, 2023

@amirshehataornl can you separate out pmix/prrte changes?

@amirshehataornl amirshehataornl changed the title mtl/ofi: NIC selection update - try two ofi: NIC selection update - try two Apr 18, 2023
@amirshehataornl amirshehataornl changed the title ofi: NIC selection update - try two ofi: NIC selection update Apr 18, 2023
The existing code in compare_cpusets() assumed that some non-I/O ancestor of
a PCI object would intersect with the cpuset of the process. However, this
is not always true. The non-I/O ancestor can be an L3 cache. If there are
two L3s on the same NUMA node and the process is bound to one L3 while the
PCI object is connected to the other, then compare_cpusets() returns false,
even though the device is local to the process's NUMA node.

A better way to determine the optimal interface is to compute the distance
of each interface from the current process, then select the interface that
is nearest to the process.

Use the PMIx distance generation for this purpose.

Move away from using deprecated PMIx macros and use the functions directly
instead.

Signed-off-by: Amir Shehata <[email protected]>
@awlauria
Contributor

Is this ready to go?

@naughtont3
Contributor

yes, this is ready

@naughtont3 naughtont3 dismissed bwbarrett’s stale review April 28, 2023 20:02

The title and commit message were updated to address this review request.

@naughtont3
Contributor

@bwbarrett Can you re-review/approve? The changes were made to reflect your feedback. Thanks.

@naughtont3
Contributor

@awlauria this is ready to merge

@awlauria awlauria merged commit 42e577f into open-mpi:main May 2, 2023