8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases #25793
If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`, but `maskAll` is relatively cheaper than `fromLong`. This patch does the conversion for these cases if `l` is a compile-time constant. The conversion also enables further optimizations that recognize maskAll patterns, see [1]. Some JTReg test cases are added to ensure the optimization is effective. I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement is hoisted out of the loop. If we don't use a loop, the hot spots become other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64. The patch was tested on both aarch64 and x64; all tier1, tier2 and tier3 tests passed.
[1] openjdk#24674
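A minimal plain-Java sketch of the all-true/all-false condition described above. It assumes the documented `fromLong` behavior that, for a species with fewer than 64 lanes, only the low VLENGTH bits of `l` determine the lanes. It deliberately avoids `jdk.incubator.vector` (which needs `--add-modules`), and `FromLongClassify`, `lowMask` and `classify` are hypothetical names; it models the semantics only, not how C2 matches the pattern.

```java
// Plain-Java model of VectorMask.fromLong lane semantics (hypothetical names);
// it does not use jdk.incubator.vector and is not HotSpot code.
public class FromLongClassify {
    enum Kind { ALL_TRUE, ALL_FALSE, MIXED }

    // Bits that map to lanes 0..vlen-1; assumes 1 <= vlen <= 64.
    static long lowMask(int vlen) {
        return vlen == 64 ? -1L : (1L << vlen) - 1;
    }

    // Classifies the mask that fromLong(species with vlen lanes, l) would build.
    static Kind classify(int vlen, long l) {
        long bits = l & lowMask(vlen);                    // higher bits are ignored
        if (bits == lowMask(vlen)) return Kind.ALL_TRUE;  // like maskAll(true)
        if (bits == 0L) return Kind.ALL_FALSE;            // like maskAll(false)
        return Kind.MIXED;                                // a real fromLong is needed
    }

    public static void main(String[] args) {
        System.out.println(classify(16, 0xFFFFL)); // ALL_TRUE
        System.out.println(classify(16, 0L));      // ALL_FALSE
        System.out.println(classify(16, 65534L));  // MIXED (lane 0 unset)
        System.out.println(classify(8, -1L));      // ALL_TRUE
    }
}
```

Only the MIXED case still needs a genuine long-to-mask conversion; the other two collapse to a broadcast of a single boolean.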
👋 Welcome back erifan! A progress list of the required criteria for merging this PR into
@erifan This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 182 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@XiaohongGong, @jatin-bhateja) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
Add support for the following patterns:
toLong(maskAll(true)) => (-1ULL >> (64 - vlen))
toLong(maskAll(false)) => 0
And add more test cases.
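The two folds can be checked exhaustively in a small model. This sketch assumes a mask is just a `boolean[]` of `vlen` lanes packed least-significant-bit first (the documented `toLong` order); `MaskAllToLong`, `maskAll(int, boolean)` and `foldedAllTrue` are hypothetical helpers, not Vector API methods. Note that C++ `-1ULL >> n` on an unsigned value is a logical shift, which Java spells `>>>`.

```java
import java.util.Arrays;

// Hypothetical model: a mask is a boolean[] of vlen lanes, not a VectorMask.
public class MaskAllToLong {
    // Packs lanes least-significant-bit first (vlen <= 64).
    static long toLong(boolean[] lanes) {
        long r = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) r |= 1L << i;
        }
        return r;
    }

    // Every lane set to the same value.
    static boolean[] maskAll(int vlen, boolean bit) {
        boolean[] lanes = new boolean[vlen];
        Arrays.fill(lanes, bit);
        return lanes;
    }

    // Java spelling of the folded constant (-1ULL >> (64 - vlen)).
    static long foldedAllTrue(int vlen) {
        return -1L >>> (64 - vlen);
    }

    public static void main(String[] args) {
        for (int vlen = 1; vlen <= 64; vlen++) {
            if (toLong(maskAll(vlen, true)) != foldedAllTrue(vlen)) {
                throw new AssertionError("maskAll(true) fold fails for vlen=" + vlen);
            }
            if (toLong(maskAll(vlen, false)) != 0L) {
                throw new AssertionError("maskAll(false) fold fails for vlen=" + vlen);
            }
        }
        System.out.println("both folds hold for vlen 1..64");
    }
}
```

Because `64 - vlen` stays in 0..63 for 1 <= vlen <= 64, the shift is well defined for every lane count, including the 64-lane case where the folded value is -1.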
erifan left a comment:
Thanks for your review! Would you mind taking another look? Thanks!
XiaohongGong left a comment:
Looks much better to me. Thanks for the update!
Hi @eme64 @jatin-bhateja, could you help review this PR? Thanks~
Can you kindly include a micro with this patch? Your patch now removes L2M and M2L IR nodes.
@jatin-bhateja Thanks for your review! With this JMH method we cannot see an obvious performance improvement, because the hot spots are other instructions. Adding a loop is better. But for non-floating-point types there will be no obvious performance improvement, because the pattern …
There is no hard and fast rule for the inclusion of a loop in a JMH micro in that case.
You mean adding a loop is not a blocker, right?
Yes. If you see gains without a loop, go for it.
Do the conversion in C2's IGVN phase to cover more cases.
As @jatin-bhateja suggested, I have refactored the implementation and updated the commit message. Please help review this PR, thanks!
Thanks a lot @erifan, I am out for the rest of the week and will re-review early next week.
Rest of the patch looks good to me, apart from minor proposed changes.
test/micro/org/openjdk/bench/jdk/incubator/vector/MaskFromLongToLongBenchmark.java
erifan left a comment:
Thanks for your review, I'll update the code soon.
test/micro/org/openjdk/bench/jdk/incubator/vector/MaskFromLongToLongBenchmark.java
Testing is currently slow - still running but I report what I have so far. There is one test failure on Additional flags: Log``` Compilations (5) of Failed Methods (5) -------------------------------------- 1) Compilation of "public static void compiler.vectorapi.VectorMaskToLongTest.testFromLongToLongByte()": > Phase "PrintIdeal": AFTER: print_ideal 0 Root === 0 203 230 269 270 [[ 0 1 3 225 198 189 23 186 167 28 44 124 104 54 277 295 ]] inner 1 Con === 0 [[ ]] #top 3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address} 5 Parm === 3 [[ 168 ]] Control !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:-1 (line 182) 6 Parm === 3 [[ 168 ]] I_O !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:-1 (line 182) 7 Parm === 3 [[ 168 279 286 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:-1 (line 182) 8 Parm === 3 [[ 270 269 255 233 230 199 203 168 226 ]] FramePtr !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:-1 (line 182) 9 Parm === 3 [[ 270 269 199 226 ]] ReturnAdr !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:-1 (line 182) 23 ConP === 0 [[ 255 168 ]] #jdk/incubator/vector/ByteVector$ByteSpecies (jdk/incubator/vector/VectorSpecies):exact * Oop:jdk/incubator/vector/ByteVector$ByteSpecies (jdk/incubator/vector/VectorSpecies):exact * 28 ConI === 0 [[ 168 ]] #int:1 44 ConI === 0 [[ 168 ]] #int:16 54 ConL === 0 [[ 255 233 168 168 199 ]] #long:65534 104 ConP === 0 [[ 168 ]] #java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact * Oop:java/lang/Class 
(java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact * 124 ConP === 0 [[ 168 ]] #java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact * Oop:java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact * 167 ConP === 0 [[ 168 ]] #jdk/incubator/vector/VectorMask$$Lambda+0x0000060001060748 (jdk/internal/vm/vector/VectorSupport$FromBitsCoercedOperation):exact * Oop:jdk/incubator/vector/VectorMask$$Lambda+0x0000060001060748 (jdk/internal/vm/vector/VectorSupport$FromBitsCoercedOperation):exact * 168 CallStaticJava === 5 6 7 8 1 (104 124 44 54 1 28 23 167 54 1 1 1 1 1 1 1 ) [[ 169 181 182 173 ]] # Static jdk.internal.vm.vector.VectorSupport::fromBitsCoerced jdk/internal/vm/vector/VectorSupport$VectorPayload * ( java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact *, java/lang/Class (java/io/Serializable,java/lang/constant/Constable,java/lang/reflect/AnnotatedElement,java/lang/invoke/TypeDescriptor,java/lang/reflect/GenericDeclaration,java/lang/reflect/Type,java/lang/invoke/TypeDescriptor$OfField):exact *, int, long, half, int, jdk/internal/vm/vector/VectorSupport$VectorSpecies *, java/lang/Object * ) VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) !jvms: VectorMask::fromLong @ bci:39 (line 243) 
VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 169 Proj === 168 [[ 175 ]] #0 !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 173 Proj === 168 [[ 226 190 278 278 222 ]] #5 Oop:jdk/internal/vm/vector/VectorSupport$VectorPayload * !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 175 Catch === 169 181 [[ 176 177 ]] !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 176 CatchProj === 175 [[ 192 ]] #0@bci -1 !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 177 CatchProj === 175 [[ 251 180 ]] #1@bci -1 !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 180 CreateEx === 177 181 [[ 254 ]] #java/lang/Throwable (java/io/Serializable):NotNull * Oop:java/lang/Throwable (java/io/Serializable):NotNull * !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 181 Proj === 168 [[ 233 199 226 252 175 180 ]] #1 !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 182 Proj === 168 [[ 233 226 199 253 ]] #2 Memory: @BotPTR *+bot, idx=Bot; !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 186 ConP === 0 [[ 296 ]] #precise jdk/incubator/vector/VectorMask: 0x0000000148423080:Constant:exact * Klass:precise jdk/incubator/vector/VectorMask: 0x0000000148423080:Constant:exact * 189 ConP === 0 [[ 190 199 ]] #null 190 CmpP === _ 173 189 [[ 191 ]] !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 191 Bool === _ 190 [[ 192 ]] [ne] !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 
(line 183) 192 If === 176 191 [[ 193 194 ]] P=0.999999, C=-1.000000 !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 193 IfFalse === 192 [[ 199 ]] #0 !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 194 IfTrue === 192 [[ 299 279 ]] #1 !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 198 ConI === 0 [[ 199 ]] #int:-12 199 CallStaticJava === 193 181 182 8 9 (198 54 1 1 1 1 1 1 1 189 ) [[ 200 ]] # Static uncommon_trap(reason='null_check' action='make_not_entrant' debug_id='0') void ( int ) C=0.000100 VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 200 Proj === 199 [[ 203 ]] #0 !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 203 Halt === 200 1 1 8 1 [[ 0 ]] !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 222 CheckCastPP === 300 173 [[ 233 ]] #jdk/incubator/vector/VectorMask:NotNull * Oop:jdk/incubator/vector/VectorMask:NotNull * !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 225 ConI === 0 [[ 226 ]] #int:-34 226 CallStaticJava === 301 181 182 8 9 (225 1 1 1 1 1 1 1 1 173 ) [[ 227 ]] # Static uncommon_trap(reason='class_check' action='maybe_recompile' debug_id='0') void ( int ) C=0.000100 VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 227 Proj === 226 [[ 230 ]] #0 !jvms: VectorMask::fromLong @ bci:42 (line 243) 
VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 230 Halt === 227 1 1 8 1 [[ 0 ]] !jvms: VectorMask::fromLong @ bci:42 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 233 CallDynamicJava === 300 181 182 8 1 (222 54 1 1 1 ) [[ 234 246 247 238 ]] # Dynamic jdk.incubator.vector.VectorMask::toLong long/half ( jdk/incubator/vector/VectorMask:NotNull * ) VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 234 Proj === 233 [[ 240 ]] #0 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 238 Proj === 233 [[ 255 ]] #5 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 240 Catch === 234 246 [[ 241 242 ]] !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 241 CatchProj === 240 [[ 255 ]] #0@bci -1 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 242 CatchProj === 240 [[ 251 245 ]] #1@bci -1 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 245 CreateEx === 242 246 [[ 254 ]] #java/lang/Throwable (java/io/Serializable):NotNull * Oop:java/lang/Throwable (java/io/Serializable):NotNull * !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 246 Proj === 233 [[ 255 252 240 245 ]] #1 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 247 Proj === 233 [[ 255 253 ]] #2 Memory: @BotPTR *+bot, idx=Bot; !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 251 Region === 251 177 242 263 [[ 251 252 253 254 270 ]] !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 252 Phi === 251 181 246 257 [[ 270 ]] #abIO !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 253 Phi === 251 182 247 258 [[ 270 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 254 Phi === 251 180 245 266 [[ 270 ]] #java/lang/Throwable 
(java/io/Serializable):NotNull * Oop:java/lang/Throwable (java/io/Serializable):NotNull * !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:25 (line 183) 255 CallStaticJava === 241 246 247 8 1 (23 54 1 238 1 1 1 1 1 ) [[ 256 257 258 ]] # Static compiler.vectorapi.VectorMaskToLongTest::verifyMaskToLong void ( java/lang/Object *, long, half, long, half ) VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) 256 Proj === 255 [[ 261 ]] #0 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) 257 Proj === 255 [[ 269 261 252 266 ]] #1 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) 258 Proj === 255 [[ 269 253 ]] #2 Memory: @BotPTR *+bot, idx=Bot; !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) 261 Catch === 256 257 [[ 262 263 ]] !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) 262 CatchProj === 261 [[ 269 ]] #0@bci -1 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) 263 CatchProj === 261 [[ 251 266 ]] #1@bci -1 !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) 266 CreateEx === 263 257 [[ 254 ]] #java/lang/Throwable (java/io/Serializable):NotNull * Oop:java/lang/Throwable (java/io/Serializable):NotNull * !jvms: VectorMaskToLongTest::testFromLongToLongByte @ bci:34 (line 184) 269 Return === 262 257 258 8 9 [[ 0 ]] 270 Rethrow === 251 252 253 8 9 exception 254 [[ 0 ]] 277 ConL === 0 [[ 278 ]] #long:8 278 AddP === _ 173 173 277 [[ 279 ]] Oop:jdk/internal/vm/vector/VectorSupport$VectorPayload+8 * [narrowklass] !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 279 LoadNKlass === 194 7 278 [[ 280 ]] @java/lang/Object+8 * [narrowklass], idx=5; #narrowklass: jdk/internal/vm/vector/VectorSupport$VectorPayload: 0x000000014800e0e8 * !jvms: VectorMask::fromLong @ bci:39 (line 243) 
VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 280 DecodeNKlass === _ 279 [[ 285 285 ]] #jdk/internal/vm/vector/VectorSupport$VectorPayload: 0x000000014800e0e8 * Klass:jdk/internal/vm/vector/VectorSupport$VectorPayload: 0x000000014800e0e8 * !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 285 AddP === _ 280 280 295 [[ 286 ]] Klass:jdk/internal/vm/vector/VectorSupport$VectorPayload: 0x000000014800e0e8+80 * !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 286 LoadKlass === _ 7 285 [[ 296 ]] @java/lang/Object: 0x000000014800a050+any *, idx=6; # * Klass: * !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 295 ConL === 0 [[ 285 ]] #long:80 296 CmpP === _ 286 186 [[ 298 ]] !orig=[289] !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 298 Bool === _ 296 [[ 299 ]] [ne] !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 299 If === 194 298 [[ 300 301 ]] P=0.170000, C=-1.000000 !orig=[291] !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 300 IfFalse === 299 [[ 222 233 ]] #0 !orig=[292],[273],[220] !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183) 301 IfTrue === 299 [[ 226 ]] #1 !orig=[293],[274],[221] !jvms: VectorMask::fromLong @ bci:39 (line 243) VectorMaskToLongTest::testFromLongToLongByte @ bci:22 (line 183)
[...] One or more @ir rules failed: Failed IR Rules (5) of Methods (5)
Thanks @chhagedorn, and yes, you are right. I can reproduce the failure with
Hi @chhagedorn, I have increased the number of warm-up iterations. Could you help test the PR again? Thanks!
I think there are a few (follow-up?) improvements that can be made:
Nice, that's good to hear!
Thanks for coming back with a fix! I'll resubmit testing and report back again.
Hi @SirYwell, thanks for your suggestions, but I don't quite understand what you mean. Could you elaborate?
Constants are the limiting case of KnownBits where all the bits are known, i.e., KnownBits.ZEROS | KnownBits.ONES = -1. Since the pattern check is specifically over the -1 / 0 constant values, what we have currently looks reasonable, though we may not need to check all the bits of the long for narrower vectors. Here is one example of the KnownBits application for bit compression/expansion.
@erifan For my first point: knowing that the lower n bits are all 0 or all 1 is enough, i.e., whether … For the second one, if … I hacked something together to clarify what I mean: SirYwell@02e13a4. Please let me know if there's still something unclear. (That said, I'm completely fine with the PR as-is, especially as the KnownBits part is hard to test right now.)
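A rough sketch of the two ideas in this exchange: a constant is the case where every bit is known (`zeros | ones == -1`), and for a `vlen`-lane mask it is enough that the low `vlen` bits are known to be all ones or all zeros. The `KnownBits` record here is a hypothetical Java stand-in with `zeros`/`ones` fields; it is not HotSpot's C++ implementation, and the prototype in SirYwell@02e13a4 may look quite different.

```java
// Hypothetical sketch; names are illustrative, not HotSpot's KnownBits.
public class KnownBitsMask {
    // zeros: bits known to be 0; ones: bits known to be 1.
    record KnownBits(long zeros, long ones) {
        static KnownBits ofConstant(long c) {
            return new KnownBits(~c, c);
        }

        // A constant is the limiting case in which every bit is known.
        boolean isConstant() {
            return (zeros | ones) == -1L;
        }
    }

    // Bits that map to lanes 0..vlen-1; assumes 1 <= vlen <= 64.
    static long lowMask(int vlen) {
        return vlen == 64 ? -1L : (1L << vlen) - 1;
    }

    // fromLong would yield an all-true mask whatever the unknown bits are.
    static boolean provablyAllTrue(KnownBits kb, int vlen) {
        long m = lowMask(vlen);
        return (kb.ones() & m) == m;
    }

    // fromLong would yield an all-false mask whatever the unknown bits are.
    static boolean provablyAllFalse(KnownBits kb, int vlen) {
        long m = lowMask(vlen);
        return (kb.zeros() & m) == m;
    }

    public static void main(String[] args) {
        // Not a constant: e.g. x | 0xFFFF, where only the low 16 bits are known (ones).
        KnownBits partial = new KnownBits(0L, 0xFFFFL);
        System.out.println(partial.isConstant());          // false
        System.out.println(provablyAllTrue(partial, 16));  // true
        System.out.println(provablyAllTrue(partial, 32));  // false
        KnownBits zero = KnownBits.ofConstant(0L);
        System.out.println(zero.isConstant());             // true
        System.out.println(provablyAllFalse(zero, 64));    // true
    }
}
```

The non-constant `partial` case is exactly what a plain `-1`/`0` constant check cannot catch.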
@SirYwell, thanks for your explanation; now I get your points. It's a good idea: with your suggestions, this optimization could apply to more cases. As you said, the KnownBits part is hard to test right now, so I'll leave it for now.
Testing looked good!
XiaohongGong left a comment:
Still LGTM!
Thanks~
jatin-bhateja left a comment:
LGTM
Best Regards
/integrate
/sponsor
Going to push as commit f40381e.
Your commit was automatically rebased without conflicts.
@jatin-bhateja @erifan Pushed as commit f40381e. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relatively smaller than that of `fromLong`. So this patch does the conversion for these cases.

The conversion is done in C2's IGVN phase. And on platforms (like Arm NEON) that don't support `VectorLongToMask`, the conversion is done during the intrinsification process if `MaskAll` or `Replicate` is supported.

Since this optimization requires the input long value of `VectorMask.fromLong` to be a specific compile-time constant, and such expressions are usually hoisted out of loops, we can't see a noticeable performance change.

This conversion also enables further optimizations that recognize maskAll patterns, see [1]. With those, we can observe a performance improvement of about 7% on both aarch64 and x64.
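The IGVN rewrite can be pictured as a local pattern match on a node whose input is a constant. The toy node types below (`ConL`, `LongToMask`, `MaskAll`) are hypothetical Java records loosely named after the C2 nodes mentioned in the description; real C2 `Ideal()` methods are C++ and also consult platform support, and the NEON intrinsification fallback is not modeled here.

```java
// Toy node model; these types are hypothetical and are not C2's classes.
public class ToyIdeal {
    sealed interface Node permits ConL, LongToMask, MaskAll {}

    record ConL(long value) implements Node {}                  // long constant
    record LongToMask(Node input, int vlen) implements Node {}  // fromLong-like node
    record MaskAll(boolean value, int vlen) implements Node {}  // broadcast mask

    // Bits that map to lanes 0..vlen-1; assumes 1 <= vlen <= 64.
    static long lowMask(int vlen) {
        return vlen == 64 ? -1L : (1L << vlen) - 1;
    }

    // One idealization step: LongToMask(ConL c) becomes MaskAll when the
    // low vlen bits of c are all ones or all zeros; otherwise it is kept.
    static Node ideal(Node n) {
        if (n instanceof LongToMask ltm && ltm.input() instanceof ConL c) {
            long bits = c.value() & lowMask(ltm.vlen());
            if (bits == lowMask(ltm.vlen())) return new MaskAll(true, ltm.vlen());
            if (bits == 0L) return new MaskAll(false, ltm.vlen());
        }
        return n;
    }

    public static void main(String[] args) {
        Node folded = ideal(new LongToMask(new ConL(0xFFL), 8)); // all 8 lanes set
        Node kept = ideal(new LongToMask(new ConL(0x0FL), 8));   // mixed lanes
        System.out.println(folded);
        System.out.println(kept);
    }
}
```

Returning the original node when nothing matches mirrors the usual "no progress" convention of an idealization step, so the rewrite is safe to apply repeatedly.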
As `VectorLongToMask` is converted to `MaskAll` or `Replicate`, some existing optimizations that recognize `VectorLongToMask` will be affected, like …

Hence, this patch also adds the following optimizations: …

And we can see a noticeable performance improvement with the above optimizations for floating-point types.
Benchmarks on an Nvidia Grace machine with option `-XX:UseSVE=2`: …

Benchmarks on an AMD EPYC 9124 16-Core Processor with option `-XX:UseAVX=3`: …

There's no obvious performance change for integer types, because the optimization `VectorMaskToLong (VectorLongToMask x) => x` already supported integer types before.

Some JTReg test cases are added for the above changes. The patch was tested on both aarch64 and x64; all tier1, tier2 and tier3 tests passed.
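For intuition about the `VectorMaskToLong (VectorLongToMask x) => x` identity, a round trip through a lane model returns the input exactly when it has no bits above lane `vlen - 1`, and only its low `vlen` bits otherwise. This sketch assumes the same `boolean[]` lane model and LSB-first packing as the Vector API documentation describes; whether the `x` that C2 sees is already free of high bits is an assumption not checked here, so the second case compares against `x & lowMask(vlen)`. `MaskRoundTrip` and its helpers are hypothetical.

```java
// Hypothetical boolean[] model of a mask; not jdk.incubator.vector.
public class MaskRoundTrip {
    // Lane i is set iff bit i of bits is set (vlen <= 64, high bits ignored).
    static boolean[] fromLong(int vlen, long bits) {
        boolean[] lanes = new boolean[vlen];
        for (int i = 0; i < vlen; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0;
        }
        return lanes;
    }

    // Packs lanes least-significant-bit first.
    static long toLong(boolean[] lanes) {
        long r = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) r |= 1L << i;
        }
        return r;
    }

    // Bits that map to lanes 0..vlen-1; assumes 1 <= vlen <= 64.
    static long lowMask(int vlen) {
        return vlen == 64 ? -1L : (1L << vlen) - 1;
    }

    public static void main(String[] args) {
        long x = 0b1011_0110L;  // fits in 8 lanes: exact round trip
        System.out.println(toLong(fromLong(8, x)) == x);
        long y = 0x1_00FFL;     // bit 16 lies above lane 15 and is dropped
        System.out.println(toLong(fromLong(16, y)) == (y & lowMask(16)));
    }
}
```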
[1] #24674
Reviewing
Using `git`
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25793/head:pull/25793
$ git checkout pull/25793
Update a local copy of the PR:
$ git checkout pull/25793
$ git pull https://git.openjdk.org/jdk.git pull/25793/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25793
View PR using the GUI difftool:
$ git pr show -t 25793
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25793.diff
Using Webrev
Link to Webrev Comment