AVX-based MPI_OP performance regression

## Background information

### What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.1.0


### Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From source tarball, default configuration built with GCC 4.8.5.


### Please describe the system on which you are running

* Operating system/version:  Amazon Linux
* Computer hardware: Intel Skylake
* Network type: EFA

-----------------------------

## Details of the problem

We noticed an OS-specific performance regression with LAMMPS (the in.chute.scaled case) in 4.1.0. Bisecting through the commits, it appears to have been introduced by the AVX-based MPI_OP changes that were backported into this series. Specifically, the commit that switched the reduce ops to the unaligned SSE memory access primitives seems to be the cause: https://github.com/open-mpi/ompi/pull/7957

That change was made to address the Accumulate issue (https://github.com/open-mpi/ompi/issues/7954), so it is a necessary correctness fix.

The actual PR which introduced the SSE-based MPI_OP in the first place was backported from master: https://github.com/open-mpi/ompi/pull/7935
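For readers unfamiliar with the distinction, here is a minimal sketch of what "unaligned SSE memory access primitives" means in practice. It is not the Open MPI code; `sum_double_unaligned` is a hypothetical helper written for illustration. It uses `_mm_loadu_pd`/`_mm_storeu_pd`, which accept any address, where the aligned variants (`_mm_load_pd`/`_mm_store_pd`) require 16-byte alignment. When SSE2 is not available at compile time it falls back to a scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not Open MPI code): inout[i] += in[i] for n doubles.
 * Neither pointer needs to be 16-byte aligned. */
static void sum_double_unaligned(const double *in, double *inout, size_t n)
{
    size_t i = 0;
    assert(in != NULL && inout != NULL);
#if defined(__SSE2__)
    /* Process two doubles per iteration with unaligned loads and stores. */
    for (; i + 2 <= n; i += 2) {
        __m128d a = _mm_loadu_pd(in + i);
        __m128d b = _mm_loadu_pd(inout + i);
        _mm_storeu_pd(inout + i, _mm_add_pd(a, b));
    }
#endif
    /* Scalar tail (or the whole loop when SSE2 is not enabled). */
    for (; i < n; ++i) {
        inout[i] += in[i];
    }
}
```

Calling it as `sum_double_unaligned(src + 1, dst + 1, 7)` on `double` arrays exercises buffers that are 8-byte aligned but not necessarily 16-byte aligned, which is the kind of buffer the aligned intrinsics cannot safely handle. How a given compiler version lowers these intrinsics may differ, which is one thing worth checking when comparing GCC 4.8.5 against newer toolchains.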

Broadly, allreduce performance appears to have regressed in 4.1.0 compared to 4.0.5 in this environment because of these changes. We do not see this on Amazon Linux 2 (which has a 7.x series GCC) or Ubuntu 18, for instance.

We also tried https://github.com/open-mpi/ompi/pull/8322 just in case; it does not help either.

@bosilca does anything obvious stand out to you?
