rhc54
Contributor

@rhc54 rhc54 commented Jun 2, 2021

Port of #8998. This cannot be a direct cherry-pick
as it requires update of the PMIx and PRRTE release
branch pointers instead of their master branch
equivalents.

bot:notacherrypick

Signed-off-by: Ralph Castain [email protected]

@rhc54 rhc54 added the bug label Jun 2, 2021
@rhc54 rhc54 added this to the v5.0.0 milestone Jun 2, 2021
@rhc54 rhc54 requested a review from jsquyres June 2, 2021 14:41
@rhc54 rhc54 self-assigned this Jun 2, 2021
@jsquyres
Member

jsquyres commented Jun 2, 2021

@rhc54 Do we need the OMPI code as well? All I see is the PMIx / PRRTE submodule updates here.

Also, it looks like a legit compile fail in CI:

  CC       prted/libprrte_la-prte.lo
In file included from ../../../../../3rd-party/prrte/src/prted/prte.c:102:
/home/ubuntu/workspace/open-mpi.build.distcheck/src/openmpi-gitclone/3rd-party/prrte/include/prte.h:19:10: fatal error: prte_version.h: No such file or directory
   19 | #include "prte_version.h"
      |          ^~~~~~~~~~~~~~~~

@rhc54
Contributor Author

rhc54 commented Jun 2, 2021

> @rhc54 Do we need the OMPI code as well? All I see is the PMIx / PRRTE submodule updates here.

Jeff, Jeff, Jeff....I gather you failed to read the note on the quoted PR where I expressly directed that you need to add the OMPI bits?

> Also, it looks like a legit compile fail in CI:
>
>   CC       prted/libprrte_la-prte.lo
> In file included from ../../../../../3rd-party/prrte/src/prted/prte.c:102:
> /home/ubuntu/workspace/open-mpi.build.distcheck/src/openmpi-gitclone/3rd-party/prrte/include/prte.h:19:10: fatal error: prte_version.h: No such file or directory
>    19 | #include "prte_version.h"
>       |          ^~~~~~~~~~~~~~~~

Yeah, it's been reported on PRRTE as well. It's a VPATH issue. Sigh - someday they will outlaw that thing!

rhc54 added 2 commits June 2, 2021 09:12
Update OMPI to check for PMIx attribute and set
`ompi_mpi_oversubscribe` accordingly.  Move logic for setting
yield_when_idle to a place after the oversubscribe flag has been
checked.

- change logic of setting ompi_mpi_yield_when_idle
- nit: change `ompi_mpi_oversubscribe` to `ompi_mpi_oversubscribed`
- add comment in ompi/runtime/params.h

This is a cherry-pick of the Open MPI parts of 2b335ed.  The Open
PMIx / PRRTE git submodule updates are in a different commit on this
PR (because they're different than the git submodule updates on
master).

Signed-off-by: Ralph Castain <[email protected]>
Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit 2b335ed)
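The ordering described in the commit message above (resolve the oversubscribe flag first, then decide yield-when-idle) can be sketched roughly as follows. This is a minimal illustration only, not the code from 2b335ed: the types `yield_setting_t` and `oversub_attr_t` and both helper functions are hypothetical stand-ins, and the assumption that an explicit user setting takes precedence over the oversubscription-based default is mine, not something stated in this PR.

```c
#include <stdbool.h>

/* Hypothetical tri-state for a user-configurable yield setting:
 * UNSET means the user expressed no preference. */
typedef enum { YIELD_UNSET = -1, YIELD_OFF = 0, YIELD_ON = 1 } yield_setting_t;

/* Hypothetical stand-in for the result of asking the runtime whether
 * the node is oversubscribed. The real code queries a PMIx attribute;
 * that call is deliberately not reproduced here. */
typedef struct {
    bool found;   /* did the runtime provide the attribute at all? */
    bool value;   /* the attribute's value, valid only if found */
} oversub_attr_t;

/* Step 1: derive the oversubscribed flag from the runtime's answer.
 * A missing attribute is treated as "not oversubscribed". */
static bool resolve_oversubscribed(oversub_attr_t attr)
{
    return attr.found && attr.value;
}

/* Step 2: only after the flag is known, decide yield-when-idle.
 * Assumption: an explicit user choice wins; otherwise yield exactly
 * when the node is oversubscribed. */
static bool resolve_yield_when_idle(yield_setting_t user, bool oversubscribed)
{
    if (YIELD_UNSET != user) {
        return YIELD_ON == user;
    }
    return oversubscribed;
}
```

The point of the ordering is that calling something like `resolve_yield_when_idle` before the oversubscribe flag has been resolved would see a stale `false` and never enable yielding by default.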
@rhc54
Contributor Author

rhc54 commented Jun 2, 2021

@jsquyres Should be ready to go now.

@ibm-ompi

ibm-ompi commented Jun 2, 2021

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/773712119c3f0750dfbdbcede978e795

@rhc54
Contributor Author

rhc54 commented Jun 2, 2021

I'm sure there must be an error somewhere in the XL compile - but how is anyone supposed to find it in the midst of that ridiculous tsunami of warnings:

In file included from ../../../opal/class/opal_free_list.h:28:
 ../../../opal/class/opal_lifo.h:232:9: warning: extension used [-Wlanguage-extension-token]
            opal_atomic_ll_ptr(&lifo->opal_lifo_head.data.item, item);
#        define opal_atomic_ll_ptr(addr, ret) opal_atomic_ll_64((opal_atomic_int64_t *) (addr), ret)
../../../opal/include/opal/sys/powerpc/atomic.h:253:24: note: expanded from macro 'opal_atomic_ll_64'
                        ret = (typeof(ret)) _ret;                                                   \

@ibm-ompi

ibm-ompi commented Jun 2, 2021

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/e4f760bdc20b51ccd2fb14a3b2c4d9c1

@gpaulsen
Member

gpaulsen commented Jun 3, 2021

@awlauria will take a look at IBM CI failures today. Thanks!

@jjhursey
Member

jjhursey commented Jun 3, 2021

IBM CI machine was overloaded causing timeouts. The issue should be resolved now.
bot:ibm:xl:retest
bot:ibm:pgi:retest

@awlauria
Contributor

awlauria commented Jun 3, 2021

Thanks!

@awlauria awlauria merged commit ee3e7f5 into open-mpi:v5.0.x Jun 3, 2021
@rhc54 rhc54 deleted the cmr50/osub branch October 20, 2021 02:49