Skip to content

Conversation

@jsquyres
Copy link
Member

@jsquyres jsquyres commented Nov 19, 2022

Revamp the "Getting help" docs page to include a bit more information and have more recent procedures for the Open MPI community.

Signed-off-by: Jeff Squyres [email protected]

Working with a user recently, I realized that the current "Getting Help" page wasn't quite sufficient for the things we ask for today. This PR revamps the page a bit, requests a few more pieces of information, etc.

For this PR, it might be easier to compare the existing Getting Help page vs. the new Getting Help page than to read the diff.

@jsquyres
Copy link
Member Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

jjhursey
jjhursey previously approved these changes Nov 21, 2022
Copy link
Member

@jjhursey jjhursey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks fine as is. One suggestion that you can take or leave.

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

I haven't read the entire entry, but a few suggestions for trying to debug launch problems in case you missed them:

  • configure with --enable-debug to ensure all the diagnostic output is available
  • provide the output from running mpirun --map-by ppr:1:node --prtemca plm_base_verbose 100 --prtemca rmaps_base_verbose 100 --display alloc hostname
  • if you are having problems with binding patterns, provide the HWLOC topology in XML format

Probably a few others you can think of, but those seem to be the ones I at least am repeatedly requesting.

@jsquyres
Copy link
Member Author

@rhc54 Is there a way we could make it so that the user does not have to configure with --enable-debug to get all the relevant user-debugging messages? (I'm guessing that these messages only show up when you enable the corresponding verbose MCA param, right?)

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

Sure - but the negative is that then all of those verbose output calls get executed even when we are not debugging. It used to be a significant hit (since there are a lot of them) as we rendered the output and then discarded it without printing. I seem to recall we changed that not too long ago so that we don't render, which helps reduce the impact - it's nothing more than a quick "if" clause, so perhaps the impact is small now?

@jsquyres
Copy link
Member Author

Is it possible to query the verbosity level and only do the rendering if the verbosity level is > desired_level?

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

😄 I was asking you if we had already done that (since we are just following what you guys have in opal_output), and now you ask me if it is possible

Looking at the PMIx code (since we use it in PRRTE), it appears that we did indeed change it to check the verbosity level first, before doing the rendering. So the impact is pretty minimal.

I'll just change PMIx so it always compiles in the verbose output code as that is the easiest solution. Will require a new release, though, so I don't know how you want to handle that over here.

@jsquyres
Copy link
Member Author

😄 I was asking you if we had already done that (since we are just following what you guys have in opal_output), and now you ask me if it is possible

Haha! FWIW, I was assuming you were doing things like:

if (something_happens) {
    asprintf(&foo, "...", ...);
    OPAL_OUTPUT_VERBOSE(..., foo, ...);
    free(foo);
}

I.e., the foo stuff is what you were referring to as the additional rendering, which therefore wasn't already covered by a check for the verbosity level. Sounds like we are violently agreeing. 😄

I'll just change PMIx so it always compiles in the verbose output code as that is the easiest solution. Will require a new release, though, so I don't know how you want to handle that over here.

Awesome; thanks. We're still advancing PRTE and PMIx git submodules on main/v5.0.x (I just filed #11096 this morning to pick up the fixes you+me talked about over the weekend).

@open-mpi/ompi-rm-5-0-x Do you want to weigh in here?

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

I.e., the foo stuff is what you were referring to as the additional rendering, which therefore wasn't already covered by a check for the verbosity level. Sounds like we are violently agreeing. 😄

Yeah, that's the way you used to have it in opal_output - your capitalized version would call opal_output_generate_string and then check the verbosity to see if it should be printed. Now the macro first checks the verbosity and then generates the string.

If I were more concerned about performance, then having that "if" clause always being executed would be a concern. Nice to have it completely compiled away for performant configurations. However, I can see where it might be useful to always have the debug available. Tough tradeoff.

@jsquyres
Copy link
Member Author

If you're concerned about the "if", you could OPAL_LIKELY or OPAL_UNLIKELY it...?

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

Problem is that "check_verbosity" is a function call:

PMIX_EXPORT bool pmix_output_check_verbosity(int level, int output_id)
{
    return (output_id >= 0 && output_id < PMIX_OUTPUT_MAX_STREAMS
            && info[output_id].ldi_verbose_level >= level);
}

Not sure how you can wrap all that into an LIKELY macro. Might require completely revamping how verbosity is handled - ick.

@jsquyres
Copy link
Member Author

FWIW, that's such a small function that it would be pretty easy to change to either inline or -- since struct opal_output_stream_t is declared and defined in output.h -- a macro.

That being said, I don't know if it's worth it. Without any measurement proof from me, I'm guessing that changing this function to a macro won't have any noticeable impact on overall performance.

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

Thought about that, but I believe that would require making the info array a global - yes? If you inline the check_verbosity function (or make it into a macro), then the info array has to be visible wherever that function gets invoked. Or am I missing something?

@jsquyres
Copy link
Member Author

Yes, it would have to be global. IMHO: if you prefix the symbol appropriately, there's little difference between static-global-in-a-file vs. global-everywhere.

That is a bunch of work to save a single function call (unless you are calling this function a bazillion times, in which case we might want to have a different optimization discussion 😉).

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

That is a bunch of work to save a single function call (unless you are calling this function a bazillion times

Well, it gets called every single time there is an output_verbose in the code path, which tends to be a lot as tracking down problems in things like the mapping code requires a lot of info and the calls typically reside inside extended loops - and sadly, we usually need that info from the user as it is so dependent on their environment. It's why we were using the cap-macro version so much.

I've got it pretty well done now - not as much work as you might think. Annoying, but probably useful in the long run.

@jsquyres
Copy link
Member Author

I've got it pretty well done now - not as much work as you might think. Annoying, but probably useful in the long run.

Roger!

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

openpmix/openpmix#2815 does what you want. Since PRRTE uses the PMIx "output" code, it should immediately translate into PRRTE behavior. I'll likely update the PRRTE code base to directly call the lower-case functions, but continue to retain the updated PMIX_OUTPUT_VERBOSE macro for backward build compatibility (it'll still always be enabled, not just for debug configs).

@jsquyres
Copy link
Member Author

thank-you-2

@jsquyres jsquyres force-pushed the pr/docs-getting-help branch from 4feb02b to 01d926d Compare November 21, 2022 20:07
@jsquyres jsquyres dismissed jjhursey’s stale review November 21, 2022 20:09

Revamped since Josh initially reviewed it

@jsquyres
Copy link
Member Author

@jjhursey @rhc54 I think I incorporated all the suggested changes -- I actually split it into 3 sections now (building+installing, launching, and running). Could you have a look at the new page?

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

You know, instead of having them run our mpirun --map-by ppr:1:node ... command, perhaps we should just have them add the MCA params to their command line? I'm thinking that it usually is some problem with their map/rank/bind combination, so having the output from their specific use-case might be the better choice.

@jsquyres
Copy link
Member Author

You know, instead of having them run our mpirun --map-by ppr:1:node ... command, perhaps we should just have them add the MCA params to their command line? I'm thinking that it usually is some problem with their map/rank/bind combination, so having the output from their specific use-case might be the better choice.

I thought about that, but what if their app requires ppn>1 ?

@rhc54
Copy link
Contributor

rhc54 commented Nov 21, 2022

Yeah, it's a bit of a quandary. Really, we have three parts here:

  • verify that the basic daemon launch is working. The cmd currently listed helps us there
  • verify that they are able to correctly map their job. This requires that they use their cmd line
  • verify that PRRTE accurately and successfully launches the mapped job. This can probably just be combined with the above

Maybe you need separate sections for these? Hate to create more work, but perhaps that would be clearer?

Revamp the "Getting help" docs page to include a bit more information
and have more recent procedures for the Open MPI community.

Signed-off-by: Jeff Squyres <[email protected]>
@jsquyres jsquyres force-pushed the pr/docs-getting-help branch from 01d926d to 23a7676 Compare November 22, 2022 22:13
@jsquyres
Copy link
Member Author

Maybe you need separate sections for these? Hate to create more work, but perhaps that would be clearer?

I think I'm ok leaving it as-is, at least for the moment. I did another push with a trivial change compared to what I pushed before (adding in the potential use of a hostfile), but I think this might be good enough for now.

If we find that we keep asking for the other 2 things you mentioned, we can always amend this at any time later (i.e., these pages are not strictly tied to a release).

@jsquyres jsquyres merged commit 1617398 into open-mpi:main Nov 23, 2022
@jsquyres jsquyres deleted the pr/docs-getting-help branch November 23, 2022 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants