Skip to content

Conversation

@wenduwan
Copy link
Contributor

@wenduwan wenduwan commented Mar 6, 2024

This is a manual cherry-pick of #12388

According to libfabric API, fi_cq_readerr also reports errors for requests that did not require completion, and associate a null context with the error entry. This patch adds null check and bail gracefully as to avoid invalid memory access.

bot:notacherrypick

According to libfabric API, fi_cq_readerr also reports errors for
requests that did not require completion, and associate a null context
with the error entry. This patch adds null check and bail gracefully as
to avoid invalid memory access.

bot:notacherrypick

Signed-off-by: Wenduo Wang <[email protected]>
@wenduwan wenduwan requested review from bwbarrett and jsquyres March 6, 2024 15:39
@wenduwan wenduwan self-assigned this Mar 6, 2024
@github-actions github-actions bot added this to the v4.1.7 milestone Mar 6, 2024
}

if (!error.op_context) {
opal_output(0, "%s:%d: Error returned from fi_cq_readerr with null context. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these opal_output() calls really should be show_help calls -- show_help is de-duplicated at mpirun.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this error will cause a lot of duplication - it should be rather rare and isolated.

So what is the intended use case for opal_output?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think show_help is needed here. Certainly wouldn't be wrong, but the current code base (including this PR) is full of existing calls to opal_output() for errors that are unlikely to be correlated across nodes.

@bwbarrett bwbarrett merged commit 6a7ee37 into open-mpi:v4.1.x Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants