AVX-based MPI_OP performance regression #8334

@rajachan

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.1.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From source tarball, default configuration built with GCC 4.8.5.

Please describe the system on which you are running

  • Operating system/version: Amazon Linux
  • Computer hardware: Intel Skylake
  • Network type: EFA

Details of the problem

We noticed an OS-specific performance regression in LAMMPS (the in.chute.scaled case) with v4.1.0. Bisecting the commits points to the AVX-based MPI_OP changes that were backported into this series. Specifically, the commit that switched the reduce ops to the unaligned SSE memory access primitives seems to be the cause: #7957

That change was added to address the Accumulate issue (#7954), so it is a necessary correctness fix.

The actual PR which introduced the SSE-based MPI_OP in the first place was backported from master: #7935
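
For readers less familiar with what "unaligned SSE memory access primitives" means in a reduce op, here is a minimal sketch, assuming an x86 compiler that provides the SSE2 intrinsics in `<emmintrin.h>`. The function name `sum_i32_unaligned` is hypothetical and this is not the Open MPI code; it only illustrates an element-wise sum that uses `_mm_loadu_si128`/`_mm_storeu_si128`, which accept any address, where the aligned variants (`_mm_load_si128`/`_mm_store_si128`) would require 16-byte-aligned pointers. On targets without SSE2 the sketch falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 integer intrinsics */
#endif

/* Hypothetical illustration, not Open MPI's implementation:
 * computes inout[i] += in[i] for i in [0, n).
 * Neither pointer needs to be 16-byte aligned. */
void sum_i32_unaligned(const int32_t *in, int32_t *inout, size_t n)
{
    size_t i = 0;

    assert(in != NULL && inout != NULL);

#if defined(__SSE2__)
    /* Process four 32-bit integers per iteration with unaligned
     * loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *)(in + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(inout + i));
        _mm_storeu_si128((__m128i *)(inout + i), _mm_add_epi32(a, b));
    }
#endif

    /* Scalar tail (and the whole loop when SSE2 is unavailable). */
    for (; i < n; i++) {
        inout[i] += in[i];
    }
}
```

Calling it on buffers offset by one element from the start of an array, for example `sum_i32_unaligned(src + 1, dst + 1, 9)`, exercises an address that is not 16-byte aligned for at least one of the two starting positions. How a given compiler version turns these intrinsics into instructions may differ, which is one thing worth comparing between the GCC 4.8.5 and GCC 7.x builds.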

Broadly, allreduce performance seems to have taken a hit in 4.1.0 compared to 4.0.5 in this environment because of these changes. We do not see this with Amazon Linux 2 (which has a 7.x series GCC) or Ubuntu 18, for instance.

We also tried with #8322 applied, just in case; it does not help either.

@bosilca does anything obvious stand out to you?
