docs: update "Getting help" #11092

jsquyres · 2022-11-19T14:15:42Z

Revamp the "Getting help" docs page to include a bit more information and have more recent procedures for the Open MPI community.

Signed-off-by: Jeff Squyres [email protected]

Working with a user recently, I realized that the current "Getting Help" page wasn't quite sufficient for the things we ask for today. This PR revamps the page a bit, requests a few more pieces of information, etc.

For this PR, it might be easier to compare the existing Getting Help page vs. the new Getting Help page than to read the diff.

jsquyres · 2022-11-21T13:38:32Z

/azp run

azure-pipelines · 2022-11-21T13:38:42Z

Azure Pipelines successfully started running 1 pipeline(s).

jjhursey

I think this looks fine as is. One suggestion that you can take or leave.

docs/getting-help.rst

rhc54 · 2022-11-21T16:30:36Z

I haven't read the entire entry, but a few suggestions for trying to debug launch problems in case you missed them:

configure with --enable-debug to ensure all the diagnostic output is available
provide the output from running mpirun --map-by ppr:1:node --prtemca plm_base_verbose 100 --prtemca rmaps_base_verbose 100 --display alloc hostname
if you are having problems with binding patterns, provide the HWLOC topology in XML format

Probably a few others you can think of, but those seem to be the ones I at least am repeatedly requesting.

jsquyres · 2022-11-21T17:35:51Z

@rhc54 Is there a way we could make it so that the user does not have to configure with --enable-debug to get all the relevant user-debugging messages? (I'm guessing that these messages only show up when you enable the corresponding verbose MCA param, right?)

rhc54 · 2022-11-21T18:11:33Z

Sure - but the negative is that then all of those verbose output calls get executed even when we are not debugging. It used to be a significant hit (since there are a lot of them) as we rendered the output and then discarded it without printing. I seem to recall we changed that not too long ago so that we don't render, which helps reduce the impact - it's nothing more than a quick "if" clause, so perhaps the impact is small now?

jsquyres · 2022-11-21T18:39:26Z

Is it possible to query the verbosity level and only do the rendering if the verbosity level is > desired_level?

rhc54 · 2022-11-21T18:45:28Z

😄 I was asking you if we had already done that (since we are just following what you guys have in opal_output), and now you ask me if it is possible

Looking at the PMIx code (since we use it in PRRTE), it appears that we did indeed change it to check the verbosity level first, before doing the rendering. So the impact is pretty minimal.

I'll just change PMIx so it always compiles in the verbose output code as that is the easiest solution. Will require a new release, though, so I don't know how you want to handle that over here.

jsquyres · 2022-11-21T18:52:04Z

😄 I was asking you if we had already done that (since we are just following what you guys have in opal_output), and now you ask me if it is possible

Haha! FWIW, I was assuming you were doing things like:

if (something_happens) {
    asprintf(&foo, "...", ...);
    OPAL_OUTPUT_VERBOSE(..., foo, ...);
    free(foo);
}

I.e., the foo stuff is what you were referring to as the additional rendering, which therefore wasn't already covered by a check for the verbosity level. Sounds like we are violently agreeing. 😄

I'll just change PMIx so it always compiles in the verbose output code as that is the easiest solution. Will require a new release, though, so I don't know how you want to handle that over here.

Awesome; thanks. We're still advancing PRTE and PMIx git submodules on main/v5.0.x (I just filed #11096 this morning to pick up the fixes you+me talked about over the weekend).

@open-mpi/ompi-rm-5-0-x Do you want to weigh in here?

rhc54 · 2022-11-21T18:58:05Z

I.e., the foo stuff is what you were referring to as the additional rendering, which therefore wasn't already covered by a check for the verbosity level. Sounds like we are violently agreeing. 😄

Yeah, that's the way you used to have it in opal_output - your capitalized version would call opal_output_generate_string and then check the verbosity to see if it should be printed. Now the macro first checks the verbosity and then generates the string.

If I were more concerned about performance, then having that "if" clause always being executed would be a concern. Nice to have it completely compiled away for performant configurations. However, I can see where it might be useful to always have the debug available. Tough tradeoff.

jsquyres · 2022-11-21T18:59:33Z

If you're concerned about the "if", you could OPAL_LIKELY or OPAL_UNLIKELY it...?

rhc54 · 2022-11-21T19:03:17Z

Problem is that "check_verbosity" is a function call:

PMIX_EXPORT bool pmix_output_check_verbosity(int level, int output_id)
{
    return (output_id >= 0 && output_id < PMIX_OUTPUT_MAX_STREAMS
            && info[output_id].ldi_verbose_level >= level);
}

Not sure how you can wrap all that into an LIKELY macro. Might require completely revamping how verbosity is handled - ick.

jsquyres · 2022-11-21T19:09:05Z

FWIW, that's such a small function that it would be pretty easy to change to either inline or -- since struct opal_output_stream_t is declared and defined in output.h -- a macro.

That being said, I don't know if it's worth it. Without any measurement proof from me, I'm guessing that changing this function to a macro won't have any noticeable impact on overall performance.

rhc54 · 2022-11-21T19:12:22Z

Thought about that, but I believe that would require making the info array a global - yes? If you inline the check_verbosity function (or make it into a macro), then the info array has to be visible wherever that function gets invoked. Or am I missing something?

jsquyres · 2022-11-21T19:14:55Z

Yes, it would have to be global. IMHO: if you prefix the symbol appropriately, there's little difference between static-global-in-a-file vs. global-everywhere.

That is a bunch of work to save a single function call (unless you are calling this function a bazillion times, in which case we might want to have a different optimization discussion 😉).

rhc54 · 2022-11-21T19:34:06Z

That is a bunch of work to save a single function call (unless you are calling this function a bazillion times

Well, it gets called every single time there is an output_verbose in the code path, which tends to be a lot as tracking down problems in things like the mapping code requires a lot of info and the calls typically reside inside extended loops - and sadly, we usually need that info from the user as it is so dependent on their environment. It's why we were using the cap-macro version so much.

I've got it pretty well done now - not as much work as you might think. Annoying, but probably useful in the long run.

jsquyres · 2022-11-21T19:37:10Z

I've got it pretty well done now - not as much work as you might think. Annoying, but probably useful in the long run.

Roger!

rhc54 · 2022-11-21T19:58:28Z

openpmix/openpmix#2815 does what you want. Since PRRTE uses the PMIx "output" code, it should immediately translate into PRRTE behavior. I'll likely update the PRRTE code base to directly call the lower-case functions, but continue to retain the updated PMIX_OUTPUT_VERBOSE macro for backward build compatibility (it'll still always be enabled, not just for debug configs).

jsquyres · 2022-11-21T20:02:10Z

Revamped since Josh initially reviewed it

jsquyres · 2022-11-21T20:15:30Z

@jjhursey @rhc54 I think I incorporated all the suggested changes -- I actually split it into 3 sections now (building+installing, launching, and running). Could you have a look at the new page?

rhc54 · 2022-11-21T20:23:53Z

You know, instead of having them run our mpirun --map-by ppr:1:node ... command, perhaps we should just have them add the MCA params to their command line? I'm thinking that it usually is some problem with their map/rank/bind combination, so having the output from their specific use-case might be the better choice.

jsquyres · 2022-11-21T20:26:49Z

You know, instead of having them run our mpirun --map-by ppr:1:node ... command, perhaps we should just have them add the MCA params to their command line? I'm thinking that it usually is some problem with their map/rank/bind combination, so having the output from their specific use-case might be the better choice.

I thought about that, but what if their app requires ppn>1 ?

rhc54 · 2022-11-21T20:31:10Z

Yeah, it's a bit of a quandary. Really, we have three parts here:

verify that the basic daemon launch is working. The cmd currently listed helps us there
verify that they are able to correctly map their job. This requires that they use their cmd line
verify that PRRTE accurately and successfully launches the mapped job. This can probably just be combined with the above

Maybe you need separate sections for these? Hate to create more work, but perhaps that would be clearer?

Revamp the "Getting help" docs page to include a bit more information and have more recent procedures for the Open MPI community. Signed-off-by: Jeff Squyres <[email protected]>

jsquyres · 2022-11-22T22:15:01Z

Maybe you need separate sections for these? Hate to create more work, but perhaps that would be clearer?

I think I'm ok leaving it as-is, at least for the moment. I did another push with a trivial change compared to what I pushed before (adding in the potential use of a hostfile), but I think this might be good enough for now.

If we find that we keep asking for the other 2 things you mentioned, we can always amend this at any time later (i.e., these pages are not strictly tied to a release).

jsquyres added the documentation label Nov 19, 2022

jsquyres requested review from bwbarrett and jjhursey November 19, 2022 14:15

github-actions bot added the Target: main label Nov 19, 2022

jsquyres force-pushed the pr/docs-getting-help branch from faff9f1 to 4feb02b Compare November 19, 2022 14:57

jjhursey previously approved these changes Nov 21, 2022

View reviewed changes

docs/getting-help.rst Outdated Show resolved Hide resolved

docs/getting-help.rst Outdated Show resolved Hide resolved

docs/getting-help.rst Outdated Show resolved Hide resolved

jsquyres force-pushed the pr/docs-getting-help branch from 4feb02b to 01d926d Compare November 21, 2022 20:07

jjhursey approved these changes Nov 21, 2022

View reviewed changes

docs: update "Getting help"

23a7676

Revamp the "Getting help" docs page to include a bit more information and have more recent procedures for the Open MPI community. Signed-off-by: Jeff Squyres <[email protected]>

jsquyres force-pushed the pr/docs-getting-help branch from 01d926d to 23a7676 Compare November 22, 2022 22:13

jsquyres merged commit 1617398 into open-mpi:main Nov 23, 2022

jsquyres deleted the pr/docs-getting-help branch November 23, 2022 14:24

jsquyres mentioned this pull request Nov 23, 2022

v5.0.x: docs: update "Getting help" #11104

Merged

docs: update "Getting help" #11092

docs: update "Getting help" #11092

Uh oh!

Conversation

jsquyres commented Nov 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

azure-pipelines bot commented Nov 21, 2022

Uh oh!

jjhursey left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 21, 2022

Uh oh!

rhc54 commented Nov 21, 2022

Uh oh!

jsquyres commented Nov 22, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jsquyres commented Nov 19, 2022 •

edited

Loading