
Error when writing file on more than one node #7429

@romainhild


I am using Open MPI v4.0.2 (installed from the tar.gz) on a cluster running Ubuntu 16.04 with 6 compute nodes (24 cores each). Jobs are managed by Slurm.

I get the following error when I try to write files to disk while running on multiple nodes:

[atlas6:242817] mca_sharedfp_individual_file_open: Error during datafile file open

The error is printed by every process on one of the two nodes I use, but by none of the processes on the other node.

The error is intermittent: it does not occur every time I launch the same program,
and I have not found a way to reproduce it consistently.

Does anyone have an idea what could be causing this?
