Allow fallback to a lesser AVX support during make #8322
|
@bosilca though the changes look good to me, they are unfortunately insufficient in a should also, even if no AVX flavor is supported at I also appended a third commit to give the option to skip avx/sse detection. In the case of |
|
I need to check that we have fully SSE versions of the code, but right now I think we need some basic AVX support for the load/store instructions. Thus, if we add OMPI_MCA_OP_HAVE_AVX to the test, we will build the AVX component in situations where the entire component is useless and mostly empty (because we will not generate any of the MPI_Op support functions). |
|
@bosilca I see what you mean, and I must admit I did not try to run the SSE only version of the If only AVX (e.g. sandy bridge aka AVX1) is detected at |
|
@bosilca I checked again
For example, if you I guess you already know this, Please let me know how you see this. If you agree to drop support for |
|
You are right, we are not really correctly testing for most of the vectorial support, not even for AVX512. More precisely, we check for some flavors of AVX512 but we don't have a single consistent check that reflects how we use the AVX512 instructions. What we should really check for I took a stab at this in the last commit. Looks like a step in the right direction, but still lacks the finesse to provide individual support for |
|
@bosilca I back-ported this PR to the Since the |
|
Excellent, this is great news. Let me squash the PR in 2 commits and then we can move forward. Now that we have reached this positive outcome the RM for the 4.1 (@jsquyres) should make the call for a new stable in the 4.1 field. |
|
bot:ibm:pgi:retest |
|
@bosilca After re-reading #8306, I think the proposed solution was not to support having the user pass different flags during Is that right? If so, can the description of this PR be updated to reflect that? Because right now the description implies that this PR will do something that we probably should not support (i.e., user passing different flags at |
|
This PR is more than the 2 updates you mentioned. @ggouaillardet added the capability to manually skip checks for particular AVX-related features, which would allow package maintainers to hand-craft their install to avoid colliding with their target architecture. I think Gilles mentioned that this part is not necessary with the last set of commits, but as far as I can see it is still in this PR. |
Fair enough. I just wanted to make sure that we're not actually moving to support using flag set A during |
1. Consistent march flag order between configure and make.
2. op/avx: give the option to skip some tests
   it is possible to skip some intrinsic tests by setting some environment variables to "no" before invoking configure:
   - ompi_cv_op_avx_check_avx512
   - ompi_cv_op_avx_check_avx2
   - ompi_cv_op_avx_check_avx
   - ompi_cv_op_avx_check_sse41
   - ompi_cv_op_avx_check_sse3
3. op/avx: update AVX512 flags
   try -mavx512f -mavx512bw -mavx512vl -mavx512dq instead of -march=skylake-avx512 since the former is less likely to conflict with user provided CFLAGS (e.g. -march=...)
   Thanks Bart Oldeman for pointing this.
4. op/avx: have the op/avx library depend on libmpi.so
Refs. open-mpi#8323
Signed-off-by: Gilles Gouaillardet <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
1. Allow fallback to a lesser AVX support during make
   Because some distros restrict the compile architecture during make (while not setting any restrictions during configure), we also need to detect the target architecture during make in order to restrict the code we generate.
2. Add comments and better protect the arch specific code.
   Identify all the vectorial functions used and classify them according to the necessary hardware capabilities. Use these requirements to protect the code for loads and stores (the rest of the code being automatically generated, it is more difficult to protect).
3. Correctly check for AVX* support.
Signed-off-by: George Bosilca <[email protected]>
63f2951 to
fcf2766
Compare
|
This is now ready to go. @jsquyres your call if you want a subrelease on the 4.1 stable. |
|
Yes, we will definitely want this on v4.1.x. Thanks! |
|
bot:ompi:retest |
The test now has the ability to add a shift to all or to any of the input and output buffers to assess the impact of unaligned operations. Signed-off-by: George Bosilca <[email protected]>
As far as I can see, this is now supported in this PR. Maybe that part of the code should be removed? |
|
So we have problems with the AVX code in v4.1.0 in Debian Unstable. There are two suspects: Is there something a general purpose build in a distro such as Debian should be doing to avoid (a) ? |
|
I can't see anything in the Debian issue pointing to AVX; instead, every discussion points to RDMAV_FORK_SAFE not being correctly set. That's an orthogonal discussion to this AVX PR; if needed it should have its own issue. To answer your question (a), some functions in mca_op_avx.so are compiled with different flavors of AVX enabled (AVX & pre, AVX2 and AVX512), but these functions are never called directly. Instead, they are called from a generic function (i.e., one built with the default compiler options) that checks the necessary processor capabilities before calling them. That being said, it is possible we missed some processor capability requirements before calling these functions, but we need more specific information to understand what is going on. |
|
@amckinstry I see in that thread the user is running with OFI MTL and the EFA OFI provider. Can you take a look at ofiwg/libfabric#6332? |
|
@bosilca Any reason not to merge this PR? Are we waiting for anything here? |
|
Sure -- I understand #8334 is not yet resolved (the perf issue). But this PR is the correctness issue, and is separate from that, right? |
|
I would think this PR is independent, but somehow the use of the AVX module has an undesirable side effect on lammps. Which means that maybe we still have some corner cases that are not yet correctly addressed. |
Agreed. But AVX is already the default on master, so let's go ahead and get this correctness fix in, and we can work on the AVX performance fix independently. |
|
Thanks for resolving it quickly @ggouaillardet @bosilca @jsquyres! Question: will this be backported to the v4.x branch? |
Yes: #8361 |
Allow more flexibility on the support of AVX* extensions.
Signed-off-by: George Bosilca [email protected]