-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8354242: VectorAPI: combine vector not operation with compare #24674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch optimizes the following patterns:
For integer types:
```
(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
=> (VectorMaskCmp src1 src2 ncond)
(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
=> (VectorMaskCmp src1 src2 ncond)
```
cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
negative comparison of cond.
For float and double types:
```
(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
=> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
```
cond can be eq or ne.
Benchmarks on Nvidia Grace machine with 128-bit SVE2:
With option `-XX:UseSVE=2`:
```
Benchmark Unit Before Score Error After Score Error Uplift
testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29
testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33
testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33
testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31
testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31
testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41
testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29
testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39
testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37
testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41
testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 1024506.58 9774.75138 1.29
testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4
testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38
testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41
testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29
testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4
testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38
testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41
testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29
testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4
testCompareLTMaskNotLong ops/s 856502.2695 12276.82851 1177671.815 496.723302 1.37
testCompareLTMaskNotShort ops/s 3325798.025 2412.702501 4711554.181 1779.302112 1.41
testCompareNEMaskNotByte ops/s 7910002.518 2771.82477 10245315.33 16321.93935 1.29
testCompareNEMaskNotDouble ops/s 863754.6022 523.140788 1179133.982 476.572178 1.36
testCompareNEMaskNotFloat ops/s 1723321.883 2598.484803 2358492.186 877.1401 1.36
testCompareNEMaskNotInt ops/s 1670288.841 751.774826 2354158.125 835.720163 1.4
testCompareNEMaskNotLong ops/s 836327.6835 410.525466 1178178.825 308.757932 1.4
testCompareNEMaskNotShort ops/s 3327815.841 1511.978763 4711379.136 2336.505531 1.41
testCompareUGEMaskNotByte ops/s 7906699.024 3200.936474 10253843.74 15067.59401 1.29
testCompareUGEMaskNotInt ops/s 1674003.923 3287.191727 2353340.666 951.381021 1.4
testCompareUGEMaskNotLong ops/s 852424.5562 8920.408939 1177943.609 389.6621 1.38
testCompareUGEMaskNotShort ops/s 3327255.858 1584.885143 4711622.355 1247.215277 1.41
testCompareUGTMaskNotByte ops/s 7909249.189 4435.283667 10245541.34 10993.34739 1.29
testCompareUGTMaskNotInt ops/s 1693713.433 20650.00213 2353153.787 1055.343846 1.38
testCompareUGTMaskNotLong ops/s 851022.3395 7079.065268 1177910.677 538.604598 1.38
testCompareUGTMaskNotShort ops/s 3327236.988 1616.886789 4711209.865 3098.494145 1.41
testCompareULEMaskNotByte ops/s 7909350.825 3251.262342 10261449.03 7273.831341 1.29
testCompareULEMaskNotInt ops/s 1672350.925 1545.304304 2353231.755 914.231193 1.4
testCompareULEMaskNotLong ops/s 853349.4765 9804.906913 1177967.254 435.044367 1.38
testCompareULEMaskNotShort ops/s 3325757.891 1555.062257 4712873.187 1650.986905 1.41
testCompareULTMaskNotByte ops/s 7912218.621 2633.477744 10242095.98 21921.39902 1.29
testCompareULTMaskNotInt ops/s 1673994.849 2672.507666 2353449.22 946.105757 1.4
testCompareULTMaskNotLong ops/s 849032.5868 10406.06689 1177586.047 506.541456 1.38
testCompareULTMaskNotShort ops/s 3328062.026 1892.991844 4713247.216 1855.983724 1.41
```
With option `-XX:UseSVE=0`:
```
Benchmark Unit Before Score Error After Score Error Uplift
testCompareEQMaskNotByte ops/s 7895961.919 72712.90804 7746493.731 71481.92938 0.98
testCompareEQMaskNotDouble ops/s 789811.0455 384.493088 766473.7994 2216.581793 0.97
testCompareEQMaskNotFloat ops/s 1806305.818 638.010451 1819616.613 3295.38958 1
testCompareEQMaskNotInt ops/s 1815820.144 1225.336135 1849538.401 766.29902 1.01
testCompareEQMaskNotLong ops/s 807336.492 335.451807 792732.9483 277.954432 0.98
testCompareEQMaskNotShort ops/s 4818266.38 1927.862665 4668903.001 1922.782715 0.96
testCompareGEMaskNotByte ops/s 7818439.678 75374.97739 16498003.98 41440.49653 2.11
testCompareGEMaskNotInt ops/s 1815159.05 1090.912209 2372095.779 1664.397112 1.3
testCompareGEMaskNotLong ops/s 804324.5575 2301.686878 927919.8507 371.766719 1.15
testCompareGEMaskNotShort ops/s 4818966.563 2443.643652 5385561.038 29558.37423 1.11
testCompareGTMaskNotByte ops/s 7893406.157 82687.74264 16470663.2 22165.55812 2.08
testCompareGTMaskNotInt ops/s 1815316.812 915.894106 2370447.198 655.016338 1.3
testCompareGTMaskNotLong ops/s 807019.456 526.525482 928079.0541 330.582693 1.15
testCompareGTMaskNotShort ops/s 4820552.881 1684.247747 5355902.93 5893.2915 1.11
testCompareLEMaskNotByte ops/s 7816263.323 79560.0015 16473621.19 56688.99585 2.1
testCompareLEMaskNotInt ops/s 1814915.724 926.998625 2368790.306 932.594778 1.3
testCompareLEMaskNotLong ops/s 806483.9 935.718082 928110.9074 407.096695 1.15
testCompareLEMaskNotShort ops/s 4813660.241 6817.870509 5357107.852 10061.47975 1.11
testCompareLTMaskNotByte ops/s 7838948.962 69136.4504 16424405.96 24464.75469 2.09
testCompareLTMaskNotInt ops/s 1815056.833 1187.6453 2369892.187 1103.819634 1.3
testCompareLTMaskNotLong ops/s 806602.1804 287.923365 928346.4118 617.682824 1.15
testCompareLTMaskNotShort ops/s 4817940.643 2767.1509 5372537.84 15397.47169 1.11
testCompareNEMaskNotByte ops/s 9078493.798 4630.339307 16484348.42 18925.88346 1.81
testCompareNEMaskNotDouble ops/s 661769.6272 398.712981 926763.5839 1808.843788 1.4
testCompareNEMaskNotFloat ops/s 1570527.252 563.642144 2312425.678 1815.844846 1.47
testCompareNEMaskNotInt ops/s 1619146.58 626.793854 2369711.543 942.330478 1.46
testCompareNEMaskNotLong ops/s 680201.5381 2252.836482 927808.6147 414.917863 1.36
testCompareNEMaskNotShort ops/s 3763508.054 3622.560798 5367808.015 8591.466599 1.42
testCompareUGEMaskNotByte ops/s 7886373.129 75917.74675 16480928.93 27524.31005 2.08
testCompareUGEMaskNotInt ops/s 1815636.832 750.036241 2369683.015 901.609404 1.3
testCompareUGEMaskNotLong ops/s 806862.5826 287.819616 928001.4394 361.063837 1.15
testCompareUGEMaskNotShort ops/s 4820581.361 2098.537435 5375854.248 25619.40165 1.11
testCompareUGTMaskNotByte ops/s 7891591.465 96614.93542 16410405.93 15012.37096 2.07
testCompareUGTMaskNotInt ops/s 1814871.179 662.825588 2371325.903 1170.491164 1.3
testCompareUGTMaskNotLong ops/s 804013.7658 2240.534209 928062.2169 531.306897 1.15
testCompareUGTMaskNotShort ops/s 4818150.337 3051.717685 5381449.337 21212.34187 1.11
testCompareULEMaskNotByte ops/s 7831540.628 81306.67253 16495250.78 38682.19675 2.1
testCompareULEMaskNotInt ops/s 1814484.14 687.860656 2369265.075 940.609586 1.3
testCompareULEMaskNotLong ops/s 807780.5749 769.876816 927538.0732 1278.267724 1.14
testCompareULEMaskNotShort ops/s 4817437.42 5141.336541 5356183.359 7015.608124 1.11
testCompareULTMaskNotByte ops/s 7849078.225 56753.59764 16395975.27 34043.67295 2.08
testCompareULTMaskNotInt ops/s 1814328.226 2697.219111 2370700.47 1991.841988 1.3
testCompareULTMaskNotLong ops/s 807166.8197 253.061506 927926.2803 252.933462 1.14
testCompareULTMaskNotShort ops/s 4821098.216 1625.959044 5348980.243 4100.768121 1.1
```
Benchmarks on AMD EPYC 9124 16-Core Processor:
With option `-XX:UseAVX=3`:
```
Benchmark Unit Before Score Error After Score Error Uplift
testCompareEQMaskNotByte ops/s 16607323.35 1233692.631 18381557.66 1163201.522 1.1
testCompareEQMaskNotDouble ops/s 2114285.245 58782.2534 2959946.353 43016.0445 1.39
testCompareEQMaskNotFloat ops/s 4480874.437 89975.29074 6960151.436 64799.143 1.55
testCompareEQMaskNotInt ops/s 4370906.91 51784.80889 6856955.043 313858.5504 1.56
testCompareEQMaskNotLong ops/s 2080065.895 26762.06732 2939142.143 67179.05314 1.41
testCompareEQMaskNotShort ops/s 7968282.563 210437.2781 12701214.56 473152.6407 1.59
testCompareGEMaskNotByte ops/s 18419141.89 473408.9451 19880059.68 321638.0397 1.07
testCompareGEMaskNotInt ops/s 4419015.62 77352.98633 7037639.227 151066.0383 1.59
testCompareGEMaskNotLong ops/s 2147982.48 49227.42782 3000275.928 39298.75344 1.39
testCompareGEMaskNotShort ops/s 8469039.613 17833.19707 12288229.49 244317.8812 1.45
testCompareGTMaskNotByte ops/s 18728997.5 468328.8358 20544730.05 392264.6466 1.09
testCompareGTMaskNotInt ops/s 4510009.705 78812.57357 7364629.942 70970.78473 1.63
testCompareGTMaskNotLong ops/s 2124104.969 40917.89257 2953536.279 35199.19687 1.39
testCompareGTMaskNotShort ops/s 8690557.621 311534.1159 12344017.51 457931.8741 1.42
testCompareLEMaskNotByte ops/s 17758400.53 478383.4945 19209183.26 1143297.241 1.08
testCompareLEMaskNotInt ops/s 4363664.862 43443.18063 7054093.064 78141.11476 1.61
testCompareLEMaskNotLong ops/s 2068632.213 29844.78023 2954766.412 50667.22502 1.42
testCompareLEMaskNotShort ops/s 8637608.548 183538.5511 12719010.27 473568.8825 1.47
testCompareLTMaskNotByte ops/s 14406138.95 423105.0163 17292417.96 371386.9689 1.2
testCompareLTMaskNotInt ops/s 4546707.266 131977.3144 7040483.394 213590.4657 1.54
testCompareLTMaskNotLong ops/s 2123277.356 47243.21499 2848720.442 58896.97045 1.34
testCompareLTMaskNotShort ops/s 7570169.363 649873.6295 11945383.75 988276.5955 1.57
testCompareNEMaskNotByte ops/s 18274529.55 683396.7384 19081938.8 1118739.778 1.04
testCompareNEMaskNotDouble ops/s 2112533.61 43295.50012 2912115.441 78189.51083 1.37
testCompareNEMaskNotFloat ops/s 4628683.814 93817.07362 6967208.729 145135.8544 1.5
testCompareNEMaskNotInt ops/s 4470900.214 75974.50842 7286913.662 116328.5277 1.62
testCompareNEMaskNotLong ops/s 2134091.061 46377.94061 2934667.477 81675.46021 1.37
testCompareNEMaskNotShort ops/s 8790384.287 396161.8599 1307685.35 286272.1155 1.48
testCompareUGEMaskNotByte ops/s 18009150.9 660803.8886 17551258.33 1667014.843 0.97
testCompareUGEMaskNotInt ops/s 4442928.74 83190.81019 6854088.277 329008.8901 1.54
testCompareUGEMaskNotLong ops/s 2088357.736 71696.24791 2973202.26 63278.78974 1.42
testCompareUGEMaskNotShort ops/s 8348624.02 116562.7876 12832250.78 546869.3006 1.53
testCompareUGTMaskNotByte ops/s 17871101.25 800199.6321 19902619.81 214003.3262 1.11
testCompareUGTMaskNotInt ops/s 4088304.421 137797.9723 7135454.33 124553.651 1.74
testCompareUGTMaskNotLong ops/s 2070610.42 19881.82182 2991536.365 36260.60767 1.44
testCompareUGTMaskNotShort ops/s 8637099.341 155822.1608 12756579.77 186068.199 1.47
testCompareULEMaskNotByte ops/s 17940901.36 1258029.364 18932484.94 694554.6305 1.05
testCompareULEMaskNotInt ops/s 4369177.511 74982.31936 6392773.082 550171.2266 1.46
testCompareULEMaskNotLong ops/s 2135905.761 43693.63178 2877579.631 41651.56289 1.34
testCompareULEMaskNotShort ops/s 8607710.544 132655.1676 12446370.04 441718.3035 1.44
testCompareULTMaskNotByte ops/s 17409912.23 1033204.537 20607479.99 362000.5056 1.18
testCompareULTMaskNotInt ops/s 4386455.9 119192.1635 6920123.264 186158.2845 1.57
testCompareULTMaskNotLong ops/s 2064995.149 38622.2734 2988343.589 39037.90006 1.44
testCompareULTMaskNotShort ops/s 8642182.752 230919.2442 13029582.09 437101.4923 1.5
```
The small amount of performance degradation is due to test fluctuations.
|
👋 Welcome back erifan! A progress list of the required criteria for merging this PR into |
|
@erifan This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 101 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@eme64, @XiaohongGong, @jatin-bhateja) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Webrevs
|
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java
Outdated
Show resolved
Hide resolved
erifan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jatin-bhateja Thanks for your review!
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java
Outdated
Show resolved
Hide resolved
1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments.
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a drive-by comment for now, I may review this later more fully.
I would also prefer if you added the IR restrictions rather than the JTREG requires.
The benefit is that we can still run the tests on all platforms, at least for result verification.Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or wrong result. With the requires, you test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug.
Just copy pasting the IR applyIf everywhere is not that much work, and adding in a new platform later is not really hard either.
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java
Outdated
Show resolved
Hide resolved
Thanks! The problem is that when a new platform is added, people may not even know there is a test. |
@erifan That is true. But we have that problem either way. If you use |
This test will run on new platforms when we use @requires. I explained the meaning of the @requires in the previous comment, it only excludes one case: when -XX:UseAVX=0 is specified on x86 platforms. |
I see. You should probably add a comment there, to say that you are only excluding |
@requires is a special comment itself. I feel like it's a bit weird to add a comment to a comment, and I don't think the @requires is hard to understand. If we want to verify the correctness of AVX=0, we have to use ApplyIf. This is back to the beginning of the question, should we use @requires or ApplyIf? Personally I tend to use the former. By the way, I have tested the correctness of AVX=0 locally. |
|
Yes, this discussion is down to
In my understanding, Actually, I filed this RFE a while ago: https://bugs.openjdk.org/browse/JDK-8310891 |
I see, I'll update the code. Thanks~ |
|
@eme64 @jatin-bhateja I have updated the test, thanks for your suggestion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erifan thanks for updating the tests!
Now I had a quick look at the VM code.
My biggest observation is this:
Wrapping VectorNode::Ideal somewhere in the middle of your new optimization is going to make future optimizations here much harder.
How would they check their conditions next to yours? That would be quite a mess.
I suggest you do this:
XorVNode::Idealdoes- checks
in1 == in2case - calls a method called
XorVNode::Ideal_XorV_VectorMaskCmp. Check if it succeeded, i.e. returnsnullptr. - ... future optimizations could go here ...
- Finally, i.e. none of the optimizations above worked: call
VectorNode::Ideal
- checks
Then you pack all your new logic here into XorVNode::Ideal_XorV_VectorMaskCmp. You can also find a better name, it is just what I came up with just now.
This gives us a much more modular design, and it is easier to add another new optimization to XorVNode::Ideal. It is easy to change the precedence of the optimizations by just changing the order, etc.
Examples of this "modular" design:
CMoveNode::Ideal-> callsTypeNode::IdealandIdeal_minmax.StoreBNode::Ideal-> callsStoreNode::Ideal_masked_inputandStoreNode::Ideal_sign_extended_input
These are really nice, because you can quickly see what optimizations we already have, and in which order they are checked.
Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular.
|
/keepalive |
|
@erifan The pull request is being re-evaluated and the inactivity timeout has been reset. |
|
Hi, can anyone review this PR? |
|
Hi @eme64 @theRealAph @XiaohongGong @fg1417 @shqking , could you help take a look at this PR, thanks |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better, thanks for the updates!
I have another small list of suggestions :)
| @Test | ||
| @IR(counts = { IRNode.XOR_V_MASK, "= 0", | ||
| IRNode.XOR_V, "= 0", | ||
| IRNode.VECTOR_MASK_CMP, "= 2" }, | ||
| applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" }) | ||
| public static void testCompareEQMaskNotFloat() { | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fb, (m) -> { return m.not(); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fb); | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fb, (m) -> { return F_SPECIES.maskAll(true).xor(m); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fb); | ||
| } | ||
|
|
||
| @Test | ||
| @IR(counts = { IRNode.XOR_V_MASK, "= 0", | ||
| IRNode.XOR_V, "= 0", | ||
| IRNode.VECTOR_MASK_CMP, "= 2" }, | ||
| applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" }) | ||
| public static void testCompareNEMaskNotFloat() { | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fb, (m) -> { return m.not(); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fb); | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fb, (m) -> { return F_SPECIES.maskAll(true).xor(m); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fb); | ||
| } | ||
|
|
||
| @Test | ||
| @IR(counts = { IRNode.XOR_V_MASK, "= 0", | ||
| IRNode.XOR_V, "= 0", | ||
| IRNode.VECTOR_MASK_CMP, "= 2" }, | ||
| applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" }) | ||
| public static void testCompareEQMaskNotFloatNaN() { | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fnan, (m) -> { return m.not(); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fnan); | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fnan, (m) -> { return F_SPECIES.maskAll(true).xor(m); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fnan); | ||
| } | ||
|
|
||
| @Test | ||
| @IR(counts = { IRNode.XOR_V_MASK, "= 0", | ||
| IRNode.XOR_V, "= 0", | ||
| IRNode.VECTOR_MASK_CMP, "= 2" }, | ||
| applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" }) | ||
| public static void testCompareNEMaskNotFloatNaN() { | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fnan, (m) -> { return m.not(); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fnan); | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fnan, (m) -> { return F_SPECIES.maskAll(true).xor(m); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fnan); | ||
| } | ||
|
|
||
| @Test | ||
| @IR(counts = { IRNode.XOR_V_MASK, "= 0", | ||
| IRNode.XOR_V, "= 0", | ||
| IRNode.VECTOR_MASK_CMP, "= 2" }, | ||
| applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" }) | ||
| public static void testCompareEQMaskNotFloatPositiveInfinity() { | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fpinf, (m) -> { return m.not(); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fpinf); | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fpinf, (m) -> { return F_SPECIES.maskAll(true).xor(m); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fpinf); | ||
| } | ||
|
|
||
| @Test | ||
| @IR(counts = { IRNode.XOR_V_MASK, "= 0", | ||
| IRNode.XOR_V, "= 0", | ||
| IRNode.VECTOR_MASK_CMP, "= 2" }, | ||
| applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" }) | ||
| public static void testCompareNEMaskNotFloatPositiveInfinity() { | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fpinf, (m) -> { return m.not(); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fpinf); | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fpinf, (m) -> { return F_SPECIES.maskAll(true).xor(m); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fpinf); | ||
| } | ||
|
|
||
| @Test | ||
| @IR(counts = { IRNode.XOR_V_MASK, "= 0", | ||
| IRNode.XOR_V, "= 0", | ||
| IRNode.VECTOR_MASK_CMP, "= 2" }, | ||
| applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" }) | ||
| public static void testCompareEQMaskNotFloatNegativeInfinity() { | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fninf, (m) -> { return m.not(); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fninf); | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fninf, (m) -> { return F_SPECIES.maskAll(true).xor(m); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fninf); | ||
| } | ||
|
|
||
| @Test | ||
| @IR(counts = { IRNode.XOR_V_MASK, "= 0", | ||
| IRNode.XOR_V, "= 0", | ||
| IRNode.VECTOR_MASK_CMP, "= 2" }, | ||
| applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" }) | ||
| public static void testCompareNEMaskNotFloatNegativeInfinity() { | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fninf, (m) -> { return m.not(); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fninf); | ||
| testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fninf, (m) -> { return F_SPECIES.maskAll(true).xor(m); }); | ||
| verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fninf); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have test cases for the cases other than EQ and NE? After all, we don't that someone accidentally messes with the logic you implemented later and we don't notice the bug ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For float and double, only EQ and NE are supported. So the positive test only includes these two OPs. And we have one negative test for other unsupported OPs, see testCompareMaskNotFloatNegative.
test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java
Outdated
Show resolved
Hide resolved
test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java
Outdated
Show resolved
Hide resolved
erifan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eme64 Thank you for your patience in reviewing this PR. I'm doing some internal testing and expect to push a new commit next week. I'll be on vacation for the next two days. Thank you!
test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java
Outdated
Show resolved
Hide resolved
test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java
Outdated
Show resolved
Hide resolved
|
@erifan Sounds good. No rush, it takes as long as it takes. I'll soon be on vacation too and may not respond until mid of October. |
|
Hi @eme64 I have dealt with all of your suggestions except one that I think it has already been covered. Could you please have a look at this PR when you have a chance? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your benchmark and code changes look good to me. Thanks for addressing my comments.
|
Thanks @jatin-bhateja . And the updated benchmarks test results are as follow, no much changes. On Nvidia Grace machine with 128-bit SVE2: With option On AMD EPYC 9124 16-Core Processor: |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erifan Nice work on the benchmark refactor! And thanks for the other updates.
I'll run some testing now, should take about 24h.
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erifan Thanks for the work! All tests pass on my side, patch looks good to me too :)
|
Thanks all for your help, I'll integrate the PR. |
|
/integrate |
|
/sponsor |
|
Going to push as commit 45cc515.
Your commit was automatically rebased without conflicts. |
|
@XiaohongGong @erifan Pushed as commit 45cc515. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This patch optimizes the following patterns:
For integer types:
cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond.
For float and double types:
cond can be eq or ne.
Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option
-XX:UseSVE=2:With option
-XX:UseSVE=0:Benchmarks on AMD EPYC 9124 16-Core Processor:
With option
-XX:UseAVX=3:The small amount of performance degradation is due to test fluctuations.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674$ git checkout pull/24674Update a local copy of the PR:
$ git checkout pull/24674$ git pull https://git.openjdk.org/jdk.git pull/24674/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24674View PR using the GUI difftool:
$ git pr show -t 24674Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24674.diff
Using Webrev
Link to Webrev Comment