8287697: Limit auto vectorization to 32-byte vector on Cascade Lake #8877
You have trailing white spaces.
```diff
 if (is_java_primitive(bt) &&
     (vlen > 1) && is_power_of_2(vlen) &&
-    Matcher::vector_size_supported(bt, vlen)) {
+    Matcher::vector_size_supported(bt, vlen) &&
+    (vlen * type2aelembytes(bt) <= SuperWordMaxVectorSize)) {
```
Can you put this whole condition into a separate static `bool VectorNode::vector_size_supported(vlen, bt)` and use it in both cases?
Vectorization through SLP can be controlled by constraining MaxVectorSize, and through the Vector API by using narrower SPECIES.
/label hotspot-compiler

@sviswa7

@vnkozlov Your review comments are resolved.

Webrevs
I think we missed the test with setting ... That may be preferable as a "simple fix" vs the suggested changes for a "short term solution". The objection was that a user may still want to use wide 64-byte vectors for the Vector API. But I agree with Jatin's argument about that. BTW, ...
@vnkozlov I have made SuperWordMaxVectorSize a develop option as you suggested. As far as I know, the only intrinsics/stubs that use MaxVectorSize are for clear/copy. This is done in conjunction with AVX3Threshold, so we are ok there for Cascade Lake.
Thank you for checking the stubs code. We still have to run performance testing with this patch. We need only an additional run with ... And I want @jatin-bhateja to approve this change too. Or give a better suggestion.
vnkozlov left a comment:
Changes look good. I will start testing it.
```cpp
  FLAG_SET_DEFAULT(SuperWordMaxVectorSize, 32);
} else {
```
SuperWordMaxVectorSize is set to 32 bytes by default; it should still be capped by MaxVectorSize, in case the user sets MaxVectorSize to 16 bytes.
Yes. I submitted testing with `FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MIN2(MaxVectorSize, (intx)32));`
And the flag is declared as DIAGNOSTIC; the product build fails otherwise.
```cpp
          "actual size could be less depending on elements type")  \
          range(0, max_jint)                                       \
                                                                   \
  develop(intx, SuperWordMaxVectorSize, 64,                        \
```
The flag can't be `develop` because it is used in product code. It should be `diagnostic`.
Hi @sviswa7. This looks reasonable, since stubs and some macro assembly routines anyway operate under thresholds and do not strictly comply with the max vector size.
```diff
-if (use_avx_limit > 2 && is_intel_skylake() && _stepping < 5) {
-  FLAG_SET_DEFAULT(UseAVX, 2);
+if (use_avx_limit > 2 && is_intel_skylake()) {
+  if (_stepping < 5) {
+    FLAG_SET_DEFAULT(UseAVX, 2);
+  }
```
What is this change for?
I had some changes in this area before. This is an artifact of that. I will set it back to exactly as it was.
```cpp
if (FLAG_IS_DEFAULT(SuperWordMaxVectorSize)) {
  if (FLAG_IS_DEFAULT(UseAVX) && UseAVX > 2 &&
      is_intel_skylake() && _stepping > 5) {
```
Should you check `_stepping >= 5`? Otherwise `_stepping == 5` is missed by all adjustments.
Maybe. But it would require more careful changes, and that changeset is not integrated yet. And, as Jatin and Sandhya said, we may do a proper fix after the JDK 19 fork. Then we can look at your proposal.
@vnkozlov @jatin-bhateja Your review comments are implemented. Please take a look.
Looks good.
Please wait until regression and performance testing are finished. I will let you know results.
@sviswa7 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be: ... You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 38 new commits pushed to the ... branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the ...
Thanks @sviswa7, changes look good to me.
Regression testing results are good. Waiting for performance results.
Only jbb2015 runs are left in the queue for performance testing. It may take time and I don't expect much variation in them. Testing also includes ... Both changes give a 4-5% improvement to ... But I also observed a 2.7% regression in ...
@sviswa7, please file a separate RFE for the SPECjvm2008-SOR.small issue (different unrolling factor). I have all the performance data from our runs and yours, and I think these changes are ready for integration. Thanks!
@vnkozlov Thanks a lot. I have filed the RFE: https://bugs.openjdk.org/browse/JDK-8287966.
/integrate
Going to push as commit 45f1b72.
Your commit was automatically rebased without conflicts.
openjdk#8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines.

Currently, `SuperWordMaxVectorSize` behaves differently between x86 and other platforms [1]. For example, if the current machine only supports `MaxVectorSize <= 32` but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` is kept at 64 on other platforms, while an x86 machine would lower `SuperWordMaxVectorSize` to `MaxVectorSize`. Platforms other than x86 lack an implementation similar to [2]. Also, `SuperWordMaxVectorSize` limits the max vector size of auto-vectorization to 64 bytes, which is fine for current aarch64 hardware, but the SVE architecture supports vectors larger than 512 bits.

The patch drops the global option and uses an architecture-dependent interface to consult the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake, and users can override this by setting either `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on the JVM command line.

So my question is: before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. But after the change, if we set `-XX:MaxVectorSize=64` explicitly, then the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy with the change?

[1] openjdk#12350 (comment)
[2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333
We observe a ~20% regression in the SPECjvm2008 mpegaudio sub-benchmark on Cascade Lake with default flags vs -XX:UseAVX=2.
The performance of all the other non-startup sub-benchmarks of SPECjvm2008 is within +/- 5%.
The performance regression is due to auto-vectorization of small loops.
We don't have an AVX3Threshold consideration in auto-vectorization.
The performance regression in mpegaudio can be recovered by limiting auto-vectorization to 32-byte vectors.
This PR limits auto-vectorization to 32-byte vectors by default on Cascade Lake. Users can override this by setting either -XX:UseAVX=3 or -XX:SuperWordMaxVectorSize=64 on the JVM command line.
Please review.
Best Regards,
Sandhya
Reviewing

Using git

Checkout this PR locally:

$ git fetch https://git.openjdk.java.net/jdk pull/8877/head:pull/8877
$ git checkout pull/8877

Update a local copy of the PR:

$ git checkout pull/8877
$ git pull https://git.openjdk.java.net/jdk pull/8877/head

Using Skara CLI tools

Checkout this PR locally:

$ git pr checkout 8877

View PR using the GUI difftool:

$ git pr show -t 8877

Using diff file

Download this PR as a diff file:

https://git.openjdk.java.net/jdk/pull/8877.diff