@jjhursey
Member

No description provided.

@jjhursey jjhursey added this to the v4.1.0 milestone Jul 13, 2020
@jjhursey jjhursey requested review from gpaulsen and rhc54 July 13, 2020 20:33
@jjhursey jjhursey added the bug label Jul 13, 2020
@jjhursey
Member Author

The schizo/jsm component only exists on the v4.1.x branch.

After reviewing PR #7933 it looks like we should do the same for the JSM component.

@jsquyres
Member

Per this:

The schizo/jsm component only exists on the v4.1.x branch.

@gpaulsen Why did you approve?

@gpaulsen
Member

My understanding is that this IS the work on the jsm schizo. I think he just meant that we need the same change that slurm needs, and this is it.

Also this is only needed on v4.1.x. v4.0.x never accepted the schizo:jsm component (new feature after v4.0.0), and it's not needed on master as PMIx handles these things (or so I'm told).

@jjhursey can you please confirm?

@rhc54
Contributor

rhc54 commented Jul 13, 2020

FWIW: neither this nor the Slurm PR is actually correct. There is more going on here than we originally realized. I'll try to provide a broader PR that addresses all the systems.

@jjhursey
Member Author

A few notes:

  • schizo/jsm is only on the v4.1.x branch. It was not accepted into v4.0.x (it arrived late and was a new feature). master no longer needs this kind of shim.
  • The only thing that we need the schizo component to do is detect that it was direct launched with jsrun and acknowledge that OMPI is talking via PMIx to the JSM resource manager.
  • We did not explicitly do anything with binding originally, but looking at the Slurm version it seemed reasonable that we would need the same patch to prevent OMPI from doing any binding. But if @rhc54 thinks that patch isn't sufficient, then we should wait until we have the proper fix and merge that.
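
To make the notes above concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes the launcher exports some marker variable into the environment of the processes it starts; the variable name is passed in by the caller and is hypothetical. `bind_policy_t`, `launched_directly`, and `apply_direct_launch_policy` are illustrative names, not OMPI symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's binding policy setting. */
typedef enum {
    BIND_POLICY_DEFAULT, /* runtime chooses and applies its own binding */
    BIND_POLICY_NONE     /* runtime leaves process binding untouched */
} bind_policy_t;

/* Returns true when the (assumed) marker variable exported by the
 * launcher is present and non-empty in this process's environment. */
static bool launched_directly(const char *marker_env)
{
    const char *value = getenv(marker_env);
    return value != NULL && value[0] != '\0';
}

/* If the process was direct launched, switch the policy to "none" so the
 * resource manager's binding is not overridden. The policy is changed
 * in-process rather than by exporting another environment variable.
 * Returns true when a direct launch was detected. */
static bool apply_direct_launch_policy(const char *marker_env,
                                       bind_policy_t *policy)
{
    assert(policy != NULL);
    if (!launched_directly(marker_env)) {
        return false; /* not direct launched: keep the existing policy */
    }
    *policy = BIND_POLICY_NONE;
    return true;
}
```

A caller would initialize a `bind_policy_t` to `BIND_POLICY_DEFAULT` and pass its address together with whatever marker variable the launcher actually sets. Which variable jsrun exports, and where the real component stores its policy, are not stated in this thread.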

Contributor

@rhc54 rhc54 left a comment

It is probably sufficient, though not complete - but (after playing with it a bit) I don't have the time for a complete fix.

FWIW: setting the hwloc_base_binding_policy envar does nothing when direct launching.

@jsquyres jsquyres merged commit 57044aa into open-mpi:v4.1.x Jul 14, 2020
@jjhursey jjhursey deleted the v4-jsm-no-bind branch July 14, 2020 15:57
4 participants