Closed
Description
I use Open MPI v4.0.2 from the tar.gz on a cluster running Ubuntu 16.04 with 6 compute nodes (24 cores each). Jobs are managed by Slurm.
I get the following error when I try to write files to disk while running on multiple nodes:
[atlas6:242817] mca_sharedfp_individual_file_open: Error during datafile file open
This error is displayed for every process on one of the two nodes I use, but not on the other one.
The problem is that it does not happen every time I launch the same program.
I could not find a way to reproduce the issue consistently.
Does anyone have an idea what could cause this?