@rhc54 rhc54 commented Jan 20, 2021

As currently written in this release branch, the PML selection "check" logic does not guarantee that, when a full modex has been performed, the caller's PML choice is checked only against the choice published by MPI_COMM_WORLD rank 0. As a result, every process can end up calling "dmodex" to obtain the PML selection of every other process in the job, which causes a major delay in wireup on the first call to communicate.

These cherry-picks contain the updates developed/committed to master after the code in this release branch was brought over to it. One additional cherry-pick was required to cleanly port the code.
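
To make the cost concrete, here is a rough count of the remote lookups implied by the two checking strategies. It is only an illustration: both function names are hypothetical and do not exist in Open MPI, and it assumes exactly one lookup per (caller, peer) pair.

```c
/* Hypothetical helpers, not Open MPI code: job-wide number of remote
 * PML-selection lookups for a job of nprocs processes, assuming one
 * lookup per (caller, peer) pair. */

/* Every process fetches the selection of every other process. */
static unsigned long lookups_check_all(unsigned long nprocs)
{
    return nprocs > 0 ? nprocs * (nprocs - 1) : 0;
}

/* Every process except rank 0 fetches only rank 0's selection. */
static unsigned long lookups_check_rank0(unsigned long nprocs)
{
    return nprocs > 0 ? nprocs - 1 : 0;
}
```

For a 1024-process job, `lookups_check_all(1024)` is 1047552 while `lookups_check_rank0(1024)` is 1023, which is why checking against every peer is so much more expensive at scale.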

bosilca and others added 2 commits January 20, 2021 06:28
With this patch the best PML is selected earlier, before finalizing
the other PMLs. This provides a simpler mechanism to intercept and
hijack the PML (as done in the monitoring PML).

Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit 668aa15)
For direct modex, all procs publish their selected pml module,
and then at add_procs the pml module of each proc is checked
against every other proc in the add_procs call.
For full modex, there is no change in functionality: only Rank0
publishes its selected pml, and all other procs in the add_procs call
check their selected pml against Rank0's.
If the pmls do not match, throw an error and exit.

Signed-off-by: Dipti Kothari <[email protected]>
(cherry picked from commit 5418cc5)
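
The two checking modes described in this commit message can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Open MPI implementation: `modex_t`, `check_pml`, and the in-memory `published` table are all hypothetical stand-ins for the real modex, and the sketch assumes PML selections can be compared by name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the modex: published[i] holds the PML name
 * that process i published, or NULL if it published nothing. */
typedef struct {
    const char **published;
    size_t nprocs;
} modex_t;

/* Returns 0 when the caller's PML matches every peer it has to check,
 * -1 on the first mismatch, missing entry, or out-of-range proc. */
static int check_pml(const modex_t *modex, const char *my_pml,
                     const size_t *procs, size_t count, bool direct_modex)
{
    if (modex->nprocs == 0) {
        return -1;
    }
    if (!direct_modex) {
        /* Full modex: compare only against what rank 0 published. */
        const char *ref = modex->published[0];
        return (ref != NULL && strcmp(ref, my_pml) == 0) ? 0 : -1;
    }
    /* Direct modex: compare against each proc in the add_procs call. */
    for (size_t i = 0; i < count; i++) {
        if (procs[i] >= modex->nprocs) {
            return -1;
        }
        const char *peer = modex->published[procs[i]];
        if (peer == NULL || strcmp(peer, my_pml) != 0) {
            return -1;
        }
    }
    return 0;
}
```

In this sketch a caller would treat a return value of -1 as the "throw error and exit" case. In the full-modex branch only entry 0 of the table is read, so the other entries may be left as NULL.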
@rhc54 rhc54 added this to the v4.1.1 milestone Jan 20, 2021
@rhc54 rhc54 requested review from bosilca and rajachan January 20, 2021 14:49
@rhc54 rhc54 self-assigned this Jan 20, 2021

@bosilca bosilca left a comment

This is indeed exactly what we have in master.

Signed-off-by: Ralph Castain <[email protected]>
@open-mpi open-mpi deleted a comment from ibm-ompi Jan 20, 2021
@open-mpi open-mpi deleted a comment from ibm-ompi Jan 20, 2021
@open-mpi open-mpi deleted a comment from ibm-ompi Jan 20, 2021

rhc54 commented Jan 21, 2021

I have moved this to "draft" status because I believe we need to revisit the PML selection check scheme. Please see #8404 (comment) for an explanation.

@rhc54 rhc54 added the bug label Jan 21, 2021
@rhc54 rhc54 marked this pull request as ready for review January 21, 2021 16:32

rhc54 commented Jan 21, 2021

After conversation, this is good to go!

@jsquyres jsquyres merged commit 50974d5 into open-mpi:v4.1.x Jan 21, 2021
@rhc54 rhc54 deleted the cmr41/pml branch March 18, 2021 14:18