Closed as not planned
Description
Background information
This is an issue based on the discussion in #11999.
From #11999 it looks like we are missing a release memory barrier somewhere in the code. The problem goes away if release semantics are added to the store in the CAS. However, we generally use relaxed memory ordering in atomic operations, so the proposed fix is not the right one.
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
Based on #12005 (port to 4.1.x) this issue seems to be present in 4.1.x and we should assume that it is present in master and 5.0.x as well.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
- Operating system/version:
- Computer hardware:
- Network type:
Details of the problem
Reproducer:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>   /* int64_t */
#include <pthread.h>
#include <sys/types.h>
#include "mpi.h"
#define MAX_THREADS (20)
int g_rankSize = 0;
int g_rank = 0;
MPI_Comm g_comm[MAX_THREADS];
void *mpi_thread(void* p)
{
    int id = *(int*)p;
    free(p);
    int i;
    int count = 0;
    for (i = 0; i < 1000000; ++i) {
        int s = 1;
        int r = 0;
        MPI_Allreduce(&s, &r, 1, MPI_INT, MPI_SUM, g_comm[id]);
        if (r != g_rankSize) {
            count++;
        }
    }
    printf("rank %d id %d error count = %d\n", g_rank, id, count);
    return NULL;
}
int main(int argc, char** argv)
{
    int mpi_threads_provided;
    int req = MPI_THREAD_MULTIPLE;
    pthread_t threads[MAX_THREADS];
    const int threadNum = 10;
    int64_t i;
    MPI_Init_thread(&argc, &argv, req, &mpi_threads_provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &g_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &g_rankSize);
    MPI_Group worldGroup;
    MPI_Comm_group(MPI_COMM_WORLD, &worldGroup);
    for (i = 0; i < threadNum; ++i) {
        MPI_Comm_create(MPI_COMM_WORLD, worldGroup, &g_comm[i]);
    }
    for (i = 0; i < threadNum; ++i) {
        int *p = (int*)malloc(sizeof(int));
        *p = (int)i;
        pthread_create(&threads[i], NULL, mpi_thread, (void*)p);
    }
    for (i = 0; i < threadNum; ++i) {
        pthread_join(threads[i], NULL);
    }
    MPI_Finalize();
    return 0;
}

It either yields wrong results or crashes.
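To reproduce, the file can be built and launched along these lines (file name and rank count are placeholders; any rank count exercising MPI_THREAD_MULTIPLE should do):

```shell
# Compile the reproducer with the MPI compiler wrapper and pthreads.
mpicc -O2 -o allreduce_mt repro.c -lpthread

# Run with a few ranks; each rank spawns 10 threads doing MPI_Allreduce.
mpirun -np 4 ./allreduce_mt
```

On an affected build, the per-thread "error count" printed at the end is nonzero, or the run aborts.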