@jsquyres
Member

@jsquyres jsquyres commented Jun 16, 2020

OSC rdma had a reference to openib, which no longer exists on master. It also had a typo for the tcp BTL (but even after fixing that typo, OSC rdma does not activate itself when the TCP BTL is used).
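To illustrate why a one-word typo in such a list matters, here is a minimal sketch of exact-name matching against a comma-separated allowlist. The helper `btl_name_in_allowlist` is hypothetical and written only for illustration; it is not Open MPI's actual lookup code, and the real component may parse the string differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): returns true if `name`
 * appears as a complete entry in the comma-separated `allowlist`. */
bool btl_name_in_allowlist(const char *allowlist, const char *name)
{
    size_t name_len = strlen(name);
    const char *entry = allowlist;

    while (*entry != '\0') {
        /* Length of the current entry, up to the next comma or the end. */
        size_t entry_len = strcspn(entry, ",");

        /* Require an exact match so "uc" does not match "uct". */
        if (entry_len == name_len && strncmp(entry, name, name_len) == 0) {
            return true;
        }

        entry += entry_len;
        if (*entry == ',') {
            entry++; /* skip the separator */
        }
    }
    return false;
}
```

Under this assumption, looking up "tcp" in "openib,ugni,uct,ucp" fails, while looking it up in "ugni,uct,tcp" succeeds. As noted above, a matching name alone was still not enough for OSC rdma to activate with the TCP BTL.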

@jsquyres
Member Author

@bwbarrett @awlauria @rhc54 This is the cause of so many of the Cisco MTT failures over the past month or two.

@jsquyres jsquyres requested a review from bwbarrett June 16, 2020 02:18
@bwbarrett
Member

Either someone is going to put in effort to fix the PT2PT component or fix the RDMA component. Why would we put effort into the PT2PT component?

@rhc54
Contributor

rhc54 commented Jun 16, 2020

I believe nobody disputes that statement - the issue is: who is going to put in the effort? Simply removing p2p isn't the answer. To date, nobody has been willing to make the effort. Perhaps providing a clean mechanism by which BTLs can reject OSC operations with an appropriate error would solve it - but somebody would have to make the effort to create that too. 🤷‍♂️

@hjelmn
Member

hjelmn commented Jun 16, 2020

I will take a look at btl/tcp.

@jsquyres
Member Author

jsquyres commented Jun 16, 2020

Even if BTL tcp is fixed, do users running on a single node (e.g., a laptop) have to run mpirun --mca btl sm,self,tcp to have MPI_WIN_CREATE work properly?

Same question for usNIC: if I'm using usNIC (which can't be used for loopback communication), do I have to run mpirun --mca btl usnic,sm,self,tcp to get MPI one-sided support?

"openib" no longer exists.

"tcp" had a typo.

Signed-off-by: Jeff Squyres <[email protected]>
@jsquyres jsquyres force-pushed the pr/put-osc-pt2pt-back branch from fd65517 to 18cfcc8 Compare June 16, 2020 16:11
@jsquyres jsquyres changed the title from "Put osc pt2pt back" to "Fix typos in OSC RDMA BTL allowlist" Jun 16, 2020
@jsquyres
Member Author

Per discussion on the weekly OMPI call today, I changed this PR to solely remove openib and fix the typo in the OSC RDMA allowlist.

OBJ_RELEASE(new_enum);

- ompi_osc_rdma_btl_names = "openib,ugni,uct,ucp";
+ ompi_osc_rdma_btl_names = "ugni,uct,tcp";
Member

should ofi be in this list?

Member Author

Good question. I defer to @hjelmn to answer that...

Member

@bwbarrett bwbarrett left a comment

forgot to approve earlier...

@awlauria
Contributor

bot:ompi:retest

@awlauria
Contributor

bot:ompi:retest

@awlauria
Contributor

One more time..
bot:ompi:retest

@awlauria
Contributor

bot:ompi:retest

@awlauria awlauria merged commit 9b86f14 into open-mpi:master Jun 30, 2020