8304301: Remove the global option SuperWordMaxVectorSize #13112

fg1417 · 2023-03-21T02:26:55Z

#8877 introduced the global option SuperWordMaxVectorSize as a temporary solution to fix the performance regression on some x86 machines.

Currently, SuperWordMaxVectorSize behaves differently between x86 and other platforms [1]. For example, if the current machine only supports MaxVectorSize <= 32, but we set SuperWordMaxVectorSize = 64, then SuperWordMaxVectorSize will be kept at 64 on other platforms while x86 machine would change SuperWordMaxVectorSize to MaxVectorSize. Other platforms except x86 miss similar implementations like [2].

Also, SuperWordMaxVectorSize limits the max vector size of auto-vectorization as 64, which is fine for current aarch64 hardware, but SVE architecture supports larger than 512 bits.

The patch is to drop the global option and use an architecture-dependent interface to consult the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake and users can override this by either setting
-XX:UseAVX=3 or -XX:MaxVectorSize=64 on JVM command line.

So my question is:

Before the patch, we could have a smaller max vector size for auto-vectorization than MaxVectorSize on x86. For example, users could have MaxVectorSize=64 and SuperWordMaxVectorSize=32. But after the change, if we set
-XX:MaxVectorSize=64 explicitly, then the max vector size for auto-vectorization would be MaxVectorSize, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy about the change?

[1] #12350 (comment)
[2]

jdk/src/hotspot/cpu/x86/vm_version_x86.cpp

Lines 1314 to 1333 in 33bec20

    
           #if defined(COMPILER2) 
        
             if (FLAG_IS_DEFAULT(SuperWordMaxVectorSize)) { 
        
               if (FLAG_IS_DEFAULT(UseAVX) && UseAVX > 2 && 
        
                   is_intel_skylake() && _stepping >= 5) { 
        
                 // Limit auto vectorization to 256 bit (32 byte) by default on Cascade Lake 
        
                 FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MIN2(MaxVectorSize, (intx)32)); 
        
               } else { 
        
                 FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MaxVectorSize); 
        
               } 
        
             } else { 
        
               if (SuperWordMaxVectorSize > MaxVectorSize) { 
        
                 warning("SuperWordMaxVectorSize cannot be greater than MaxVectorSize %i", (int) MaxVectorSize); 
        
                 FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MaxVectorSize); 
        
               } 
        
               if (!is_power_of_2(SuperWordMaxVectorSize)) { 
        
                 warning("SuperWordMaxVectorSize must be a power of 2, setting to MaxVectorSize: %i", (int) MaxVectorSize); 
        
                 FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MaxVectorSize); 
        
               } 
        
             } 
        
           #endif

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

JDK-8304301: Remove the global option SuperWordMaxVectorSize

Reviewers

Sandhya Viswanathan (@sviswa7 - Reviewer)
Vladimir Kozlov (@vnkozlov - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13112/head:pull/13112
$ git checkout pull/13112

Update a local copy of the PR:
$ git checkout pull/13112
$ git pull https://git.openjdk.org/jdk.git pull/13112/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13112

View PR using the GUI difftool:
$ git pr show -t 13112

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13112.diff

@sviswa7

openjdk#8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines. Currently, SuperWordMaxVectorSize behaves differently between x86 and other platforms[1]. For example, if the current machine only supports `MaxVectorSize <= 32`, but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` will be kept at 64 on other platforms while x86 machine would change `SuperWordMaxVectorSize` to `MaxVectorSize`. Other platforms except x86 miss similar implementations like [2]. Also, `SuperWordMaxVectorSize` limits the max vector size of auto-vectorization as `64`, which is fine for current aarch64 hardware but SVE architecture supports larger than 512 bits. The patch is to drop the global option and use an architecture- dependent interface to consult the max vector size for auto- vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto- vectorization is still limited to 32-byte vectors by default on Cascade Lake and users can override this by either setting `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on JVM command line. So my question is: Before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. But after the change, if we set `-XX:MaxVectorSize=64` explicitly, then the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy about the change? [1] openjdk#12350 (comment) [2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333

bridgekeeper · 2023-03-21T02:27:48Z

👋 Welcome back fgao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-03-21T02:29:55Z

@fg1417 The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-03-21T02:33:40Z

Webrevs

00: Full (49de669c)

sviswa7 · 2023-03-21T04:16:26Z

@fg1417 SuperWordMaxVectorSize defines the maximum vector size generated by the auto vectorization.
MaxVectorSize defines the vector width supported by the underlying platform.
For CascadeLake we are setting SuperWordMaxVectorSize=32 and MaxVectorSize=64 by default.
This allows the usage of larger 64 byte width vector instructions in places like Java Vector API and intrinsics.
We would like to keep this behavior for x86.

fg1417 · 2023-03-21T06:42:56Z

@fg1417 SuperWordMaxVectorSize defines the maximum vector size generated by the auto vectorization. MaxVectorSize defines the vector width supported by the underlying platform. For CascadeLake we are setting SuperWordMaxVectorSize=32 and MaxVectorSize=64 by default. This allows the usage of larger 64 byte width vector instructions in places like Java Vector API and intrinsics. We would like to keep this behavior for x86.

Hi @sviswa7, thanks for your quick response! Yes, the patch keeps the special handling for auto-vectorization on Cascade Lake. For Cascade Lake, even after the patch, we still have 32 bytes for auto-vectorization and larger 64 bytes for Java Vector API and intrinsics. Can it cover your needs?

sviswa7

Looks good to me.

sviswa7 · 2023-03-21T16:51:21Z

/reviewers 2

openjdk · 2023-03-21T16:53:09Z

@fg1417 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8304301: Remove the global option SuperWordMaxVectorSize

Reviewed-by: sviswanathan, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 51 new commits pushed to the master branch:

51035a7: 8294137: Review running times of java.math tests
46cca1a: 4842457: (bf spec) Clarify meaning of "(optional operation)"
6fa25cc: 8184444: The compiler error "variable not initialized in the default constructor" is not apt in case of static final variables
4b8f7db: 8027682: javac wrongly accepts semicolons in package and import decls
c00d088: 8043179: Lambda expression can mutate final field
147f347: 8219083: java/net/MulticastSocket/SetGetNetworkInterfaceTest.java failed in same binary run on windows x64
bf917ba: 8304687: Move add_to_hierarchy
63d4afb: 8304671: javac regression: Compilation with --release 8 fails on underscore in enum identifiers
e2cfcfb: 6817009: Action.SELECTED_KEY not toggled when using key binding
af4d560: 8303951: Add asserts before record_method_not_compilable where possible
... and 41 more: https://git.openjdk.org/jdk/compare/c09f83ec25749af349fb5609e3641b5bb6d34072...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2023-03-21T16:53:40Z

@sviswa7
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

fg1417 · 2023-03-22T05:55:23Z

@sviswa7 thanks for your kind review!

fg1417 · 2023-03-22T06:08:41Z

Hi @vnkozlov @TobiHartmann, could you please help review the patch?

Since I don't have sufficient x86 systems, I would appreciate it if you could help verify that the patch will not introduce any performance regression on x86, especially on Cascade Lake.

Thanks!

vnkozlov · 2023-03-22T06:32:28Z

I will test it.

vnkozlov · 2023-03-22T21:32:20Z

After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake and users can override this by either setting -XX:UseAVX=3 or -XX:MaxVectorSize=64 on JVM command line.

Correction: only -XX:MaxVectorSize=64 will affect it. -XX:UseAVX=3 does not affect superword vector size on CL because the condition for CL already includes UseAVX > 2. And with -XX:UseAVX=2 all vectors are limited to 32 bytes on all platforms.

EDIT: I am wrong and you are right. I missed that you added FLAG_IS_DEFAULT(UseAVX) check. So any UseAVX setting on command line will bypass superword vector size setting for CL.

vnkozlov · 2023-03-22T21:58:09Z

I am not sure about this effect of UseAVX setting on command line.
We use UseAVX setting on command line to emulate runs on different platforms. Some our machines are Cascade Lake. Before, even if we use -XX:UseAVX=3 on command line we still limit superword vector size for CL. With this patch we will not do that. It will not match what happens in production with default setting and we miss testing coverage for CL.

Why you added FLAG_IS_DEFAULT(UseAVX) check?

fg1417 · 2023-03-23T02:01:09Z

I am not sure about this effect of UseAVX setting on command line. We use UseAVX setting on command line to emulate runs on different platforms. Some our machines are Cascade Lake. Before, even if we use -XX:UseAVX=3 on command line we still limit superword vector size for CL. With this patch we will not do that. It will not match what happens in production with default setting and we miss testing coverage for CL.

Why you added FLAG_IS_DEFAULT(UseAVX) check?

Hi @vnkozlov, thanks for your review!

I added the FLAG_IS_DEFAULT(UseAVX) check to follow its original logic, see

jdk/src/hotspot/cpu/x86/vm_version_x86.cpp

Line 1317 in 91f407d

if (FLAG_IS_DEFAULT(UseAVX) && UseAVX > 2 &&

. Before the patch, using UseAVX=3 alone would change superword vector size for CL to be MaxVectorSize, but we could still explicitly use -XX:SuperWordMaxVectorSize=32 to limit superword vector size for CL.

Certainly, removing FLAG_IS_DEFAULT(UseAVX) check here is also reasonable to me. Then only -XX:MaxVectorSize=64 will affect it.

vnkozlov · 2023-03-23T03:07:52Z

I added the FLAG_IS_DEFAULT(UseAVX) check to follow its original logic, see

I should have looked on existing code. Yes, it also checks FLAG_IS_DEFAULT(UseAVX) and you did not change behavior. Good.

I don't have anymore questions. I am currently running specjvm2008 on Cascade Lake with default settings with and without your changes to make sure performance stays the same (I don't know why it would change based on your changes but to be safe). I will let you know results.

vnkozlov

I finished running specjvm2008 on Cascade Lake. I was first surprise that I got better scores on few sub-benchmarks but after rerunning they were matching. Some kind of instability (TurboBust?, memory channel?). I did use numactl -m 1 -N 1 on dual socket system I have. Anyway, I would say results are matching as expecting.

@TobiHartmann regression testing passed too.

fg1417 · 2023-03-24T01:34:40Z

Thanks for all your review and testing, @sviswa7 @vnkozlov @TobiHartmann.

fg1417 · 2023-03-24T07:46:52Z

/integrate

openjdk · 2023-03-24T07:47:57Z

Going to push as commit 941a7ac.
Since your change was applied there have been 59 commits pushed to the master branch:

ac6af6a: 7176515: ExceptionInInitializerError for an enum with multiple switch statements
dd23ee9: 8303917: Update ISO 639 language codes table
6f67abd: 8304557: java/util/concurrent/CompletableFuture/CompletableFutureOrTimeoutExceptionallyTest.java times out
568dd57: 8304716: Clean up G1Policy::calc_max_old_cset_length()
af0504e: 8304691: Remove jlink --post-process-path option
3859faf: 8231349: Move intrinsic stubs generation to compiler runtime initialization code
f37674a: 8304711: Combine G1 root region abort and wait into a single method
7f9e691: 8304712: Only pass total number of regions into G1Policy::calc_min_old_cset_length
51035a7: 8294137: Review running times of java.math tests
46cca1a: 4842457: (bf spec) Clarify meaning of "(optional operation)"
... and 49 more: https://git.openjdk.org/jdk/compare/c09f83ec25749af349fb5609e3641b5bb6d34072...master

Your commit was automatically rebased without conflicts.

openjdk · 2023-03-24T07:48:15Z

@fg1417 Pushed as commit 941a7ac.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk bot added the rfr Pull request is ready for review label Mar 21, 2023

openjdk bot added the hotspot [email protected] label Mar 21, 2023

sviswa7 approved these changes Mar 21, 2023

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Mar 21, 2023

openjdk bot removed the ready Pull request is ready to be integrated label Mar 21, 2023

vnkozlov approved these changes Mar 23, 2023

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Mar 23, 2023

openjdk bot added the integrated Pull request has been integrated label Mar 24, 2023

openjdk bot closed this Mar 24, 2023

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 24, 2023

	#if defined(COMPILER2)
	if (FLAG_IS_DEFAULT(SuperWordMaxVectorSize)) {
	if (FLAG_IS_DEFAULT(UseAVX) && UseAVX > 2 &&
	is_intel_skylake() && _stepping >= 5) {
	// Limit auto vectorization to 256 bit (32 byte) by default on Cascade Lake
	FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MIN2(MaxVectorSize, (intx)32));
	} else {
	FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MaxVectorSize);
	}
	} else {
	if (SuperWordMaxVectorSize > MaxVectorSize) {
	warning("SuperWordMaxVectorSize cannot be greater than MaxVectorSize %i", (int) MaxVectorSize);
	FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MaxVectorSize);
	}
	if (!is_power_of_2(SuperWordMaxVectorSize)) {
	warning("SuperWordMaxVectorSize must be a power of 2, setting to MaxVectorSize: %i", (int) MaxVectorSize);
	FLAG_SET_DEFAULT(SuperWordMaxVectorSize, MaxVectorSize);
	}
	}
	#endif

8304301: Remove the global option SuperWordMaxVectorSize #13112

8304301: Remove the global option SuperWordMaxVectorSize #13112

Uh oh!

Conversation

fg1417 commented Mar 21, 2023 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Reviewing

Uh oh!

bridgekeeper bot commented Mar 21, 2023

Uh oh!

openjdk bot commented Mar 21, 2023

Uh oh!

mlbridge bot commented Mar 21, 2023

Webrevs

Uh oh!

sviswa7 commented Mar 21, 2023

Uh oh!

fg1417 commented Mar 21, 2023

Uh oh!

sviswa7 left a comment

Choose a reason for hiding this comment

Uh oh!

sviswa7 commented Mar 21, 2023

Uh oh!

openjdk bot commented Mar 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openjdk bot commented Mar 21, 2023

Uh oh!

fg1417 commented Mar 22, 2023

Uh oh!

fg1417 commented Mar 22, 2023

Uh oh!

vnkozlov commented Mar 22, 2023

Uh oh!

vnkozlov commented Mar 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vnkozlov commented Mar 22, 2023

Uh oh!

fg1417 commented Mar 23, 2023

Uh oh!

vnkozlov commented Mar 23, 2023

Uh oh!

vnkozlov left a comment

Choose a reason for hiding this comment

Uh oh!

fg1417 commented Mar 24, 2023

Uh oh!

fg1417 commented Mar 24, 2023

Uh oh!

openjdk bot commented Mar 24, 2023

Uh oh!

openjdk bot commented Mar 24, 2023

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

fg1417 commented Mar 21, 2023 •

edited by openjdk bot

Loading

openjdk bot commented Mar 21, 2023 •

edited

Loading

vnkozlov commented Mar 22, 2023 •

edited

Loading