MPI_Win_allocate() fails when forced to use RDMA #9580

@joaobfernandes0

Description

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

I did not do the installation myself; I'm trying to get this information from the person who did.

Please describe the system on which you are running

  • Operating system: Red Hat 4.4.7-23
  • Cluster with Slurm 15.08.7
  • Computer hardware: Intel(R) Xeon(R) CPU E5-2683 v4
  • Network type: InfiniBand

Details of the problem

I'm trying to run a program with repeated ring-style communication, as in the example test.cpp below. When I run the example on my cluster across multiple nodes with the command salloc -N2 --hint=compute_bound --exclusive mpirun test.o (I am using Slurm), the output is similar to:

ID 0 Time 6.001007
ID 1 Time 6.001102

However, I was expecting times of approximately 3.0 and 6.0 seconds: rank 0 sleeps 1 second per iteration and rank 1 sleeps 2 seconds, so over 3 iterations each rank should only be delayed by its own sleeps if the MPI_Get calls make truly passive (RDMA) progress. In general, this wrong behavior happens when RDMA is not being used, so I decided to force the program to use RDMA with the command salloc -N2 --hint=compute_bound --exclusive mpirun --mca osc rdma test.o. However, I received the following error:

[r1i1n10:08761] *** An error occurred in MPI_Win_allocate
[r1i1n10:08761] *** reported by process [418054145,1]
[r1i1n10:08761] *** on communicator MPI_COMM_WORLD
[r1i1n10:08761] *** MPI_ERR_WIN: invalid window
[r1i1n10:08761] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[r1i1n10:08761] *** and potentially your MPI job)
[service0:31286] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[service0:31286] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I did not expect this because the cluster has InfiniBand. The first behavior is already strange on its own; as for the second, I do not understand the error.
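
To see which osc component is actually being selected, one diagnostic I can try (a sketch, assuming the standard Open MPI tools are on the PATH; osc_base_verbose follows the usual MCA framework verbosity convention) is to list the available osc components and re-run with component-selection verbosity raised:

ompi_info | grep osc
salloc -N2 --hint=compute_bound --exclusive mpirun --mca osc rdma --mca osc_base_verbose 100 test.o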

Observations

  • The program runs as expected on a single node with multiple processes, with or without --mca osc rdma.
  • Some time ago, I ran the same code and it worked; I have not noticed any change in the code or environment since then.
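
A possibly related idea (untested on my side, and assuming this Open MPI build includes UCX support, which I have not confirmed): Open MPI 4.x generally recommends the UCX stack on InfiniBand, so forcing the UCX components might behave differently:

salloc -N2 --hint=compute_bound --exclusive mpirun --mca pml ucx --mca osc ucx test.o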

test.cpp

#include <cstdio>    // printf
#include <unistd.h>  // sleep
#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Win window;
  int id, comm_sz;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &id);
  MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

  int get_number;
  int next = (id + 1) % comm_sz;  // ring neighbour to read from

  double t;
  int *window_buffer;

  // Expose one int per rank through an RMA window.
  MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD,
                   &window_buffer, &window);

  t = MPI_Wtime();
  for (int i = 0; i < 3; i++) {
    sleep(id + 1);  // rank 0 sleeps 1 s per iteration, rank 1 sleeps 2 s
    // Passive-target epoch: read the neighbour's value without its participation.
    MPI_Win_lock(MPI_LOCK_SHARED, next, 0, window);
    MPI_Get(&get_number, 1, MPI_INT, next, 0, 1, MPI_INT, window);
    MPI_Win_unlock(next, window);
  }
  printf("ID %i Time %lf\n", id, MPI_Wtime() - t);

  MPI_Win_free(&window);  // free the window before finalizing
  MPI_Finalize();
  return 0;
}
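
For debugging, a minimal variant of the allocation call (hypothetical, not part of the original test.cpp; a drop-in for the MPI_Win_allocate call above) that returns the error instead of aborting, so the failing rank can print the error string:

  // Errors raised by window-creation calls go to the communicator's handler.
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  int err = MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                             MPI_COMM_WORLD, &window_buffer, &window);
  if (err != MPI_SUCCESS) {
    char msg[MPI_MAX_ERROR_STRING];
    int len;
    MPI_Error_string(err, msg, &len);
    fprintf(stderr, "rank %d: MPI_Win_allocate failed: %s\n", id, msg);
    MPI_Abort(MPI_COMM_WORLD, err);
  }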
