Allow fallback to a lesser AVX support during make #8322

bosilca · 2020-12-28T17:28:23Z

Allow more flexibility on the support of AVX* extensions.

Use cached variables to prevent OMPI from checking specific architecture features. As an example this will allow to disable AVX512 while allowing AVX2.
Protect the code against mismatched flags between configure and make.
Document the list of of vectorial functions we need and the corresponding level of AVX, SSE required.

Signed-off-by: George Bosilca [email protected]

ggouaillardet · 2020-12-30T09:15:31Z

@bosilca though the changes look good to me, they are unfortunately insufficient in a brew environment.
The reason is -march=nehalem is used by the superenv (e.g. wrapper) and though all AVX flavors are supported at configure time, none of them are available at make time, and we end up in the case below

#if !defined(PREPEND)
#  if OMPI_MCA_OP_HAVE_AVX512 || OMPI_MCA_OP_HAVE_AVX2
#    error The configure step has detected possible support for AVX512 and/or AVX2 but the compiler flags during make are too restrictive. Please disable the AVX component by adding --enable-mca-no-build=op-avx to your configure step.

should OMPI_MCA_OP_HAVE_AVX be added to the test? (that would not "fix" brew though)

also, even if no AVX flavor is supported at make time, we could still build the SSE flavors (I guess that would be still better than skipping the op/avx component).
op/avx could then be renamed to op/x86_64 (I'm not that pedantic, so I am fine with the avx component name)

I also appended a third commit to give the option to skip avx/sse detection.

In the case of brew, adding ompi_cv_op_avx_check_avx512=no, ompi_cv_op_avx_check_avx2=no and ompi_cv_op_avx_check_avx=no to the environment before invoking configure is enough to successfully build Open MPI.
(note that even if no AVX flavor is supported, SSE flavors are, and the component can be built)

bosilca · 2020-12-31T19:09:41Z

I need to check that we have fully SSE versions of the code, but right now I think we need some basic AVX support for the load/store instructions.

Thus, if we add OMPI_MCA_OP_HAVE_AVX to the test, we will build the AVX component in situations where the entire component is useless and mostly empty (because we will not generate any of the MPI_Op support functions).

ggouaillardet · 2021-01-01T04:18:25Z

@bosilca I see what you mean, and I must admit I did not try to run the SSE only version of the op/avx component.

If only AVX (e.g. sandy bridge aka AVX1) is detected at configure time but is not available at make time,
the error message would be "This file should not be compiled in this conditions." which is less user friendly (e.g. the user should be advised to disable the op/avx component instead of sending us config.log

ggouaillardet · 2021-01-02T12:11:30Z

@bosilca I checked again configure.m4 and there seems to be some confusion here.

AVX512 is correctly handled
The AVX2 really tests AVX ( _mm256_add_ps() and _mm256_loadu_si256() are AVX primitives, __m256i _mm256_add_epi32 (__m256i a, __m256i b) would be a AVX2 primitive not available in AVX.
The AVX really tests SSE (_mm_add_ps() is a SSE primitive, __m256i _mm256_add_epi32 (__m256i a, __m256i b) would be a AVX primitive.

For example, if you configure CFLAGS=-march=nehalem, configure currently states that AVX can be generated without any extra flags. This is wrong, -mavx is needed here otherwise make will fail because the __AVX__ macro is not defined (-march=nehalem is pre-AVX)

I guess you already know this, AVX is basically 256 bits floating point vectors (read no integer vectors), and AVX2 is basically any 256 bits vectors (read floating point and integer vectors).
From an Open MPI point of view, it might be too convoluted to provide optimal support for sandy bridge (AVX but not AVX2 yet) since only floating point is supported by AVX and integer should fall back to a SSE flavor.

Please let me know how you see this. If you agree to drop support for AVX without AVX2, I think I can help.

bosilca · 2021-01-04T06:16:44Z

You are right, we are not really correctly testing for most of the vectorial support, not even for AVX512. More precisely, we check for some flavors of AVX512 but we don't have a single consistent check that reflects how we use the AVX512 instructions. What we should really check for _mm512_add_epi32, _mm512_add_epi8 and _mm256_mullo_epi64 to make sure that AVX512F, AVX512BW and AVX512DQ + AVX512VL are supported simultaneously.

I took a stab at this in the last commit. Looks like a step in the right direction, but still lacks the finesse to provide individual support for AVX512F, AVX512BW and AVX512DQ + AVX512VL.

ggouaillardet · 2021-01-04T08:10:38Z

@bosilca I back-ported this PR to the v4.1.x branch and I was able to build Open MPI under brew without any workaround :-)

Since the op/avx component has caused some issues to package maintainers (brew and conda), maybe we should release 4.1.1 shortly after this PR is back ported.

bosilca · 2021-01-04T14:47:51Z

Excellent, this is great news. Let me squash the PR in 2 commits and then we can move forward. Now that we have reached this positive outcome the RM for the 4.1 (@jsquyres) should make the call for a new stable in the 4.1 field.

jjhursey · 2021-01-04T14:48:36Z

bot:ibm:pgi:retest

jsquyres · 2021-01-04T16:10:02Z

@bosilca After re-reading #8306, I think the proposed solution was not to support having the user pass different flags during configure vs. make, but rather a) to fix the ordering of flags that we (Open MPI) use during configure and make, and b) use slightly different architecture/AVX flags than we previously used.

Is that right?

If so, can the description of this PR be updated to reflect that? Because right now the description implies that this PR will do something that we probably should not support (i.e., user passing different flags at configure vs. make).

bosilca · 2021-01-04T18:32:02Z

This PR is more than the 2 updates you mentionned. @ggouaillardet added the capability to manually skip checks for particular AVX-related features, that would allow the package maintainers to hand-craft their install to avoid colliding with their target architecture. I think Gilles mentioned that this part is not necessary with the last set of commit, but as far as I see it is still in this PR.

jsquyres · 2021-01-04T18:37:07Z

This PR is more than the 2 updates you mentionned.

Fair enough. I just wanted to make sure that we're not actually moving to support using flag set A during configure and flag set B during make.

1. Consistent march flag order between configure and make. 2. op/avx: give the option to skip some tests it is possible to skip some intrinsic tests by setting some environment variables to "no" before invoking configure: - ompi_cv_op_avx_check_avx512 - ompi_cv_op_avx_check_avx2 - ompi_cv_op_avx_check_avx - ompi_cv_op_avx_check_sse41 - ompi_cv_op_avx_check_sse3 3. op/avx: update AVX512 flags try -mavx512f -mavx512bw -mavx512vl -mavx512dq instead of -march=skylake-avx512 since the former is less likely to conflict with user provided CFLAGS (e.g. -march=...) Thanks Bart Oldeman for pointing this. 4. op/avx: have the op/avx library depend on libmpi.so Refs. open-mpi#8323 Signed-off-by: Gilles Gouaillardet <[email protected]> Signed-off-by: George Bosilca <[email protected]>

1. Allow fallback to a lesser AVX support during make Due to the fact that some distro restrict the compiule architecture during make (while not setting any restrictions during configure) we need to detect the target architecture also during make in order to restrict the code we generate. 2. Add comments and better protect the arch specific code. Identify all the vectorial functions used and clasify them according to the neccesary hardware capabilities. Use these requirements to protect the code for load and stores (the rest of the code being automatically generated it is more difficult to protect). 3. Correctly check for AVX* support. Signed-off-by: George Bosilca <[email protected]>

bosilca · 2021-01-04T18:45:53Z

This is now ready to go. @jsquyres your call if you want a subrelease on the 4.1 stable.

jsquyres · 2021-01-04T18:49:19Z

Yes, we will definitely want this on v4.1.x. Thanks!

jsquyres · 2021-01-04T20:43:40Z

bot:ompi:retest

The test now has the ability to add a shift to all or to any of the input and output buffers to assess the impact of unaligned operations. Signed-off-by: George Bosilca <[email protected]>

isuruf · 2021-01-07T21:56:51Z

I just wanted to make sure that we're not actually moving to support using flag set A during configure and flag set B during make.

As far as I can see, this is now supported in this PR. Maybe that part of the code should be removed?

amckinstry · 2021-01-08T15:49:11Z

So we have problems with the AVX code in v4.1.0 in Debian Unstable.
We see sporadic sigbus and other segfaults (depending on app, architecture) which go away if RDMAV_FORK_SAFE is set
(See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=979041)

There are two suspects:
(a) The module is built with AVX operands on a build system that are not present on the runtime machine.
(b) All the apps that have shown the problem have been python apps. Is this relevant ? perhaps due to fork()ing ?

Is there something a general purpose build in a distro such as Debian should be doing to avoid (a) ?

bosilca · 2021-01-08T17:03:01Z

I can't see anything in the Debian issue pointing to AVX, instead every discussion points to RDMAV_FORK_SAFE not being correctly set. That's an orthogonal discussion to this AVX PR, if needed it should have its own issue.

To answer your question (a), some functions in mca_op_avx.so are compiled with different flavors of AVX enabled (AVX & pre, AVX2 and AVX512) but these function are never called directly. Instead, they are called from a generic function (aka built with the default compiler options), that checks the necessary processor capabilities before calling these functions.

That being said, it is possible we missed some processor capabilities requirements before calling these functions, but we need more specific information to be able to understand what is going on.

rajachan · 2021-01-08T17:13:18Z

@amckinstry I see in that thread the user is running with OFI MTL and the EFA OFI provider. Can you take a look at ofiwg/libfabric#6332?

jsquyres · 2021-01-08T20:54:07Z

@bosilca Any reason not to merge this PR? Are we waiting for anything here?

bosilca · 2021-01-08T21:13:43Z

The regression on lammps has not yet been solved, and at this point we don't know if it is, or how it is related to AVX. Maybe we should wait until @rajachan get a better understanding of what is happening there (following the discussion in #8334).

jsquyres · 2021-01-08T21:23:31Z

Sure -- I understand #8334 is not yet resolved (the perf issue). But this PR is the correctness issue, and is separate from that, right?

bosilca · 2021-01-08T22:09:55Z

I would think this PR is independent, but somehow the use of the AVX module has an undesirable side effect on lammps. Which means that maybe we still have some corner cases that are not yet correctly addressed.

jsquyres · 2021-01-09T12:32:27Z

I would think this PR is independent, but somehow the use of the AVX module has an undesirable side effect on lammps. Which means that maybe we still have some corner cases that are not yet correctly addressed.

Agreed. But AVX is already the default on master, so let's go ahead and get this correctness fix in, and we can work on the AVX performance fix independently.

leofang · 2021-01-11T04:02:52Z

Thanks for resolving it quickly @ggouaillardet @bosilca @jsquyres! Question: will this be backported to the v4.x branch?

jsquyres · 2021-01-11T17:26:46Z

Question: will this be backported to the v4.x branch?

Yes: #8361

bosilca mentioned this pull request Dec 28, 2020

open-mpi 4.1.0 build failure (AVX) #8306

Closed

jsquyres mentioned this pull request Jan 1, 2021

brew: hack for building under brew #8307

Closed

ggouaillardet mentioned this pull request Jan 4, 2021

missing symbol in 4.1.0 (mca_op_avx) #8323

Closed

open-mpi deleted a comment from ibm-ompi Jan 4, 2021

bosilca added 2 commits January 4, 2021 13:41

bosilca force-pushed the topic/portable_avx branch from 63f2951 to fcf2766 Compare January 4, 2021 18:44

jsquyres added bug Target: main labels Jan 4, 2021

rajachan mentioned this pull request Jan 4, 2021

AVX-based MPI_OP performance regression #8334

Open

A better test for MPI_OP performance.

20be3fc

The test now has the ability to add a shift to all or to any of the input and output buffers to assess the impact of unaligned operations. Signed-off-by: George Bosilca <[email protected]>

bosilca requested review from ggouaillardet and rajachan January 9, 2021 17:51

jsquyres approved these changes Jan 10, 2021

View reviewed changes

jsquyres merged commit 8115bd2 into open-mpi:master Jan 10, 2021

bosilca deleted the topic/portable_avx branch January 10, 2021 19:28

leofang added a commit to regro-cf-autotick-bot/openmpi-feedstock that referenced this pull request Jan 11, 2021

apply patch from open-mpi/ompi#8322

9c4e2de

leofang added a commit to regro-cf-autotick-bot/openmpi-feedstock that referenced this pull request Jan 11, 2021

apply patch from open-mpi/ompi#8322

088e152

bosilca mentioned this pull request Jan 11, 2021

Bring the more flexible AVX* support in 4.1 #8361

Merged

Allow fallback to a lesser AVX support during make #8322

Allow fallback to a lesser AVX support during make #8322

Uh oh!

Conversation

bosilca commented Dec 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ggouaillardet commented Dec 30, 2020

Uh oh!

bosilca commented Dec 31, 2020

Uh oh!

ggouaillardet commented Jan 1, 2021

Uh oh!

ggouaillardet commented Jan 2, 2021

Uh oh!

bosilca commented Jan 4, 2021

Uh oh!

ggouaillardet commented Jan 4, 2021

Uh oh!

bosilca commented Jan 4, 2021

Uh oh!

jjhursey commented Jan 4, 2021

Uh oh!

jsquyres commented Jan 4, 2021

Uh oh!

bosilca commented Jan 4, 2021

Uh oh!

jsquyres commented Jan 4, 2021

Uh oh!

bosilca commented Jan 4, 2021

Uh oh!

jsquyres commented Jan 4, 2021

Uh oh!

jsquyres commented Jan 4, 2021

Uh oh!

isuruf commented Jan 7, 2021

Uh oh!

amckinstry commented Jan 8, 2021

Uh oh!

bosilca commented Jan 8, 2021

Uh oh!

rajachan commented Jan 8, 2021

Uh oh!

jsquyres commented Jan 8, 2021

Uh oh!

bosilca commented Jan 8, 2021

Uh oh!

jsquyres commented Jan 8, 2021

Uh oh!

bosilca commented Jan 8, 2021

Uh oh!

jsquyres commented Jan 9, 2021

Uh oh!

leofang commented Jan 11, 2021

Uh oh!

jsquyres commented Jan 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

bosilca commented Dec 28, 2020 •

edited

Loading

jsquyres commented Jan 11, 2021 •

edited

Loading