8304301: Remove the global option SuperWordMaxVectorSize #13112
Conversation
openjdk#8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines.

Currently, `SuperWordMaxVectorSize` behaves differently between x86 and other platforms [1]. For example, if the current machine only supports `MaxVectorSize <= 32` but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` is kept at 64 on other platforms, while an x86 machine would clamp `SuperWordMaxVectorSize` to `MaxVectorSize`. Platforms other than x86 lack an implementation like [2].

Also, `SuperWordMaxVectorSize` limits the max vector size for auto-vectorization to `64`, which is fine for current aarch64 hardware, but the SVE architecture supports vectors larger than 512 bits.

The patch drops the global option and uses an architecture-dependent interface to query the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake, and users can override this by setting either `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on the JVM command line.

So my question is: before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. But after the change, if we set `-XX:MaxVectorSize=64` explicitly, the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy with the change?

[1] openjdk#12350 (comment)
[2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333
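Illustrative sketch only (the function names below are invented for this discussion, not the actual HotSpot interface): a minimal standalone model of what "ask the architecture instead of consulting a global flag" means — generic platforms simply follow `MaxVectorSize`, so e.g. SVE widths above 64 bytes are no longer capped, while x86 keeps the conservative 32-byte default on Cascade Lake unless the user explicitly opts in.

```cpp
// Standalone model, not HotSpot source. Each platform answers the question
// "how wide may the auto-vectorizer go?" on its own.
#include <algorithm>
#include <cstdio>

// Generic platforms (e.g. aarch64/SVE): just follow MaxVectorSize, so vector
// lengths above 64 bytes are not artificially limited.
static int superword_limit_generic(int max_vector_size_bytes) {
  return max_vector_size_bytes;
}

// x86: keep the conservative 32-byte default on Cascade Lake unless the user
// explicitly opted in to wider vectors (e.g. -XX:UseAVX=3 or -XX:MaxVectorSize=64).
static int superword_limit_x86(int max_vector_size_bytes,
                               bool is_cascade_lake,
                               bool user_opted_in) {
  if (is_cascade_lake && !user_opted_in) {
    return std::min(max_vector_size_bytes, 32);
  }
  return max_vector_size_bytes;
}

int main() {
  std::printf("SVE, MaxVectorSize=256      -> %d bytes\n", superword_limit_generic(256));
  std::printf("Cascade Lake, defaults      -> %d bytes\n", superword_limit_x86(64, true, false));
  std::printf("Cascade Lake, user opted in -> %d bytes\n", superword_limit_x86(64, true, true));
  return 0;
}
```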
@fg1417 SuperWordMaxVectorSize defines the maximum vector size generated by auto-vectorization.
Hi @sviswa7, thanks for your quick response! Yes, the patch keeps the special handling for auto-vectorization on Cascade Lake. For Cascade Lake, even after the patch, we still have 32 bytes for auto-vectorization and the larger 64 bytes for the Java Vector API and intrinsics. Does that cover your needs?
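For concreteness, here is an illustrative standalone model of the split described above (the numbers assume a 64-byte `MaxVectorSize` on Cascade Lake, and the function names are made up): the cap applies only to the width the SuperWord auto-vectorizer may pick, while the Java Vector API and vectorized intrinsics keep seeing the full `MaxVectorSize`.

```cpp
// Standalone model of the post-patch Cascade Lake defaults described above;
// not compiler source code.
#include <algorithm>
#include <cstdio>

static const int kMaxVectorSize  = 64;  // assumed AVX-512-capable machine
static const int kAutoVecDefault = 32;  // conservative Cascade Lake default

// Width used when SuperWord packs scalar loop operations into vectors.
static int auto_vectorizer_width_in_bytes() {
  return std::min(kMaxVectorSize, kAutoVecDefault);
}

// Width available to the Java Vector API and to vectorized intrinsics.
static int explicit_vector_width_in_bytes() {
  return kMaxVectorSize;
}

int main() {
  std::printf("auto-vectorized loops : %d bytes\n", auto_vectorizer_width_in_bytes());  // 32
  std::printf("Vector API/intrinsics : %d bytes\n", explicit_vector_width_in_bytes());  // 64
  return 0;
}
```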
sviswa7 left a comment
Looks good to me.
/reviewers 2
@fg1417 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be: … You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 51 new commits pushed to the target branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the target branch, type /integrate in a new comment.
@sviswa7 thanks for your kind review!
Hi @vnkozlov @TobiHartmann, could you please help review the patch? Since I don't have access to enough x86 systems, I would appreciate it if you could help verify that the patch does not introduce any performance regression on x86, especially on Cascade Lake. Thanks!
I will test it.
Correction: only … EDIT: I am wrong and you are right. I missed that you added …
I am not sure about this effect of the UseAVX setting on the command line. Why did you add …
Hi @vnkozlov, thanks for your review! I added it based on the existing code at jdk/src/hotspot/cpu/x86/vm_version_x86.cpp, line 1317 in 91f407d.
UseAVX=3 alone would change the superword vector size for Cascade Lake to MaxVectorSize, but we could still explicitly use -XX:SuperWordMaxVectorSize=32 to limit the superword vector size for Cascade Lake.
Certainly, removing …
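A rough standalone model of the precedence being discussed (not the real vm_version_x86.cpp code; in HotSpot these would be FLAG_IS_DEFAULT-style checks, modeled here as plain booleans): before the patch, an explicit `-XX:SuperWordMaxVectorSize=32` could still cap auto-vectorization even with `UseAVX=3`, while after the patch the limit simply follows an explicitly set `UseAVX` or `MaxVectorSize`.

```cpp
// Standalone model of the flag precedence on Cascade Lake; not HotSpot source.
#include <algorithm>
#include <cstdio>

struct Flags {
  int  max_vector_size  = 64;     // -XX:MaxVectorSize, in bytes
  bool use_avx_explicit = false;  // was -XX:UseAVX given on the command line?
  bool mvs_explicit     = false;  // was -XX:MaxVectorSize given on the command line?
  int  super_word_max   = -1;     // pre-patch -XX:SuperWordMaxVectorSize, -1 = not set
};

// Pre-patch behavior, as described in the reply above: an explicit
// SuperWordMaxVectorSize wins; otherwise an explicit UseAVX lifts the 32-byte cap.
static int pre_patch_limit(const Flags& f) {
  if (f.super_word_max != -1) {
    return std::min(f.super_word_max, f.max_vector_size);
  }
  return f.use_avx_explicit ? f.max_vector_size : 32;
}

// Post-patch behavior: the separate flag is gone; an explicit UseAVX or
// MaxVectorSize lifts the cap, otherwise the 32-byte default stays.
static int post_patch_limit(const Flags& f) {
  return (f.use_avx_explicit || f.mvs_explicit) ? f.max_vector_size : 32;
}

int main() {
  Flags f;
  f.use_avx_explicit = true;  // -XX:UseAVX=3
  f.super_word_max   = 32;    // pre-patch only: -XX:SuperWordMaxVectorSize=32
  std::printf("pre-patch : %d bytes\n", pre_patch_limit(f));   // 32: the explicit flag still caps it
  std::printf("post-patch: %d bytes\n", post_patch_limit(f));  // 64: follows MaxVectorSize
  return 0;
}
```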
I should have looked at the existing code. Yes, it also checks … I don't have any more questions. I am currently running specjvm2008 on Cascade Lake with default settings, with and without your changes, to make sure performance stays the same (I don't know why it would change based on your changes, but to be safe). I will let you know the results.
vnkozlov left a comment
I finished running specjvm2008 on Cascade Lake. I was at first surprised that I got better scores on a few sub-benchmarks, but after rerunning they were matching. Some kind of instability (Turbo Boost? memory channel?). I did use numactl -m 1 -N 1 on the dual-socket system I have. Anyway, I would say the results are matching, as expected.
@TobiHartmann regression testing passed too.
Thanks for all your reviews and testing, @sviswa7 @vnkozlov @TobiHartmann.
/integrate
Going to push as commit 941a7ac.
Your commit was automatically rebased without conflicts.
Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13112/head:pull/13112
$ git checkout pull/13112

Update a local copy of the PR:
$ git checkout pull/13112
$ git pull https://git.openjdk.org/jdk.git pull/13112/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13112

View PR using the GUI difftool:
$ git pr show -t 13112

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13112.diff