8354242: VectorAPI: combine vector not operation with compare #24674

erifan · 2025-04-16T06:39:33Z

This patch optimizes the following patterns:
For integer types:

(XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
    => (VectorMaskCmp src1 src2 ncond)
(XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
    => (VectorMaskCmp src1 src2 ncond)

cond can be eq, ne, le, ge, lt, gt, ule, uge, ult or ugt, and ncond is the negated comparison of cond (for example, lt becomes ge and ule becomes ugt).
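One way to check that replacing the mask NOT with ncond is sound for integer lanes is an exhaustive scalar test. The sketch below models a lane mask as a long bitmask (lane i is bit i) and uses an all-ones long as a stand-in for (Replicate -1) / (MaskAll m1). It then checks that mask(cond) ^ allOnes equals mask(ncond) for every pair of byte values. The Cond enum and the helper names are hypothetical and are not HotSpot's BoolTest API.

```java
public class NegatedCompareSketch {
    // Hypothetical condition enum mirroring the PR's list; not HotSpot's BoolTest.
    enum Cond { EQ, NE, LT, GE, GT, LE, ULT, UGE, UGT, ULE }

    // ncond: the condition whose result is the logical negation of cond.
    static Cond negate(Cond c) {
        switch (c) {
            case EQ:  return Cond.NE;
            case NE:  return Cond.EQ;
            case LT:  return Cond.GE;
            case GE:  return Cond.LT;
            case GT:  return Cond.LE;
            case LE:  return Cond.GT;
            case ULT: return Cond.UGE;
            case UGE: return Cond.ULT;
            case UGT: return Cond.ULE;
            case ULE: return Cond.UGT;
            default:  throw new IllegalArgumentException("unknown condition: " + c);
        }
    }

    // One lane of the comparison on bytes; unsigned conditions compare zero-extended values.
    static boolean lane(Cond c, byte a, byte b) {
        int ua = Byte.toUnsignedInt(a), ub = Byte.toUnsignedInt(b);
        switch (c) {
            case EQ:  return a == b;
            case NE:  return a != b;
            case LT:  return a < b;
            case GE:  return a >= b;
            case GT:  return a > b;
            case LE:  return a <= b;
            case ULT: return ua < ub;
            case UGE: return ua >= ub;
            case UGT: return ua > ub;
            case ULE: return ua <= ub;
            default:  throw new IllegalArgumentException("unknown condition: " + c);
        }
    }

    // Lane mask as a long bitmask: bit i is set when lane i compares true (up to 64 lanes).
    static long mask(Cond c, byte[] x, byte[] y) {
        long m = 0;
        for (int i = 0; i < x.length; i++) {
            if (lane(c, x[i], y[i])) {
                m |= 1L << i;
            }
        }
        return m;
    }

    // Exhaustive check over all 256 x 256 byte pairs, packed 64 lanes at a time.
    static boolean checkAll() {
        byte[] x = new byte[64];
        byte[] y = new byte[64];
        long allOnes = -1L; // all 64 lanes set: stand-in for (Replicate -1) / (MaskAll m1)
        for (int a = -128; a < 128; a++) {
            for (int base = -128; base < 128; base += 64) {
                for (int i = 0; i < 64; i++) {
                    x[i] = (byte) a;
                    y[i] = (byte) (base + i);
                }
                for (Cond c : Cond.values()) {
                    // (XorV (VectorMaskCmp x y c) allOnes) must equal (VectorMaskCmp x y ncond)
                    if ((mask(c, x, y) ^ allOnes) != mask(negate(c), x, y)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkAll()); // prints true
    }
}
```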

For float and double types:

(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
(XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))

cond can be eq or ne.
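For float and double lanes the patch only handles eq and ne. One plausible reason, which is our inference and not stated in the PR, is IEEE 754 NaN semantics. Every ordered comparison involving NaN is false, so !(a < b) and (a >= b) disagree when either operand is NaN, while !(a == b) and (a != b) always agree. The scalar sketch below checks this over a few sample values, including NaN. The class and method names are hypothetical.

```java
public class FloatNegationSketch {
    // True if the rewrite !(a < b) => (a >= b) gives the same answer for this pair.
    static boolean ltNegationHolds(double a, double b) {
        return !(a < b) == (a >= b);
    }

    // True if the rewrite !(a == b) => (a != b) gives the same answer for this pair.
    static boolean eqNegationHolds(double a, double b) {
        return !(a == b) == (a != b);
    }

    public static void main(String[] args) {
        double[] vals = {0.0, -0.0, 1.0, -1.0, Double.NaN, Double.POSITIVE_INFINITY};
        boolean eqAlways = true;
        boolean ltAlways = true;
        for (double a : vals) {
            for (double b : vals) {
                eqAlways &= eqNegationHolds(a, b);
                ltAlways &= ltNegationHolds(a, b);
            }
        }
        System.out.println("eq/ne rewrite always valid: " + eqAlways); // true
        System.out.println("lt/ge rewrite always valid: " + ltAlways); // false, because of NaN
    }
}
```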

Benchmarks on an Nvidia Grace machine with 128-bit SVE2, with option -XX:UseSVE=2:

Benchmark			Unit	Before		Before Error	After		After Error	Uplift (After/Before)
testCompareEQMaskNotByte	ops/s	7912127.225	2677.289518	10266136.26	8955.008548	1.29
testCompareEQMaskNotDouble	ops/s	884737.6799	446.963779	1179760.772	448.031844	1.33
testCompareEQMaskNotFloat	ops/s	1765045.787	682.332214	2359520.803	896.305743	1.33
testCompareEQMaskNotInt		ops/s	1787221.411	977.743935	2353952.519	960.069976	1.31
testCompareEQMaskNotLong	ops/s	895297.1974	673.44808	1178449.02	323.804205	1.31
testCompareEQMaskNotShort	ops/s	3339987.002	3415.2226	4712761.965	2110.862053	1.41
testCompareGEMaskNotByte	ops/s	7907615.16	4094.243652	10251646.9	9486.699831	1.29
testCompareGEMaskNotInt		ops/s	1683738.958	4233.813092	2352855.205	1251.952546	1.39
testCompareGEMaskNotLong	ops/s	854496.1561	8594.598885	1177811.493	521.1229	1.37
testCompareGEMaskNotShort	ops/s	3341860.309	1578.975338	4714008.434	1681.10365	1.41
testCompareGTMaskNotByte	ops/s	7910823.674	2993.367032	10245063.58	9774.75138	1.29
testCompareGTMaskNotInt		ops/s	1673393.928	3153.099431	2353654.521	1190.848583	1.4
testCompareGTMaskNotLong	ops/s	849405.9159	2432.858159	1177952.041	359.96413	1.38
testCompareGTMaskNotShort	ops/s	3339509.141	3339.976585	4711442.496	2673.364893	1.41
testCompareLEMaskNotByte	ops/s	7911340.004	3114.69191	10231626.5	27134.20035	1.29
testCompareLEMaskNotInt		ops/s	1675812.113	1340.969885	2353255.341	1452.4522	1.4
testCompareLEMaskNotLong	ops/s	848862.8036	6564.841731	1177763.623	539.290106	1.38
testCompareLEMaskNotShort	ops/s	3324951.54	2380.29473	4712116.251	1544.559684	1.41
testCompareLTMaskNotByte	ops/s	7910390.844	2630.861436	10239567.69	6487.441672	1.29
testCompareLTMaskNotInt		ops/s	1672180.09	995.238142	2353757.863	853.774734	1.4
testCompareLTMaskNotLong	ops/s	856502.2695	12276.82851	1177671.815	496.723302	1.37
testCompareLTMaskNotShort	ops/s	3325798.025	2412.702501	4711554.181	1779.302112	1.41
testCompareNEMaskNotByte	ops/s	7910002.518	2771.82477	10245315.33	16321.93935	1.29
testCompareNEMaskNotDouble	ops/s	863754.6022	523.140788	1179133.982	476.572178	1.36
testCompareNEMaskNotFloat	ops/s	1723321.883	2598.484803	2358492.186	877.1401	1.36
testCompareNEMaskNotInt		ops/s	1670288.841	751.774826	2354158.125	835.720163	1.4
testCompareNEMaskNotLong	ops/s	836327.6835	410.525466	1178178.825	308.757932	1.4
testCompareNEMaskNotShort	ops/s	3327815.841	1511.978763	4711379.136	2336.505531	1.41
testCompareUGEMaskNotByte	ops/s	7906699.024	3200.936474	10253843.74	15067.59401	1.29
testCompareUGEMaskNotInt	ops/s	1674003.923	3287.191727	2353340.666	951.381021	1.4
testCompareUGEMaskNotLong	ops/s	852424.5562	8920.408939	1177943.609	389.6621	1.38
testCompareUGEMaskNotShort	ops/s	3327255.858	1584.885143	4711622.355	1247.215277	1.41
testCompareUGTMaskNotByte	ops/s	7909249.189	4435.283667	10245541.34	10993.34739	1.29
testCompareUGTMaskNotInt	ops/s	1693713.433	20650.00213	2353153.787	1055.343846	1.38
testCompareUGTMaskNotLong	ops/s	851022.3395	7079.065268	1177910.677	538.604598	1.38
testCompareUGTMaskNotShort	ops/s	3327236.988	1616.886789	4711209.865	3098.494145	1.41
testCompareULEMaskNotByte	ops/s	7909350.825	3251.262342	10261449.03	7273.831341	1.29
testCompareULEMaskNotInt	ops/s	1672350.925	1545.304304	2353231.755	914.231193	1.4
testCompareULEMaskNotLong	ops/s	853349.4765	9804.906913	1177967.254	435.044367	1.38
testCompareULEMaskNotShort	ops/s	3325757.891	1555.062257	4712873.187	1650.986905	1.41
testCompareULTMaskNotByte	ops/s	7912218.621	2633.477744	10242095.98	21921.39902	1.29
testCompareULTMaskNotInt	ops/s	1673994.849	2672.507666	2353449.22	946.105757	1.4
testCompareULTMaskNotLong	ops/s	849032.5868	10406.06689	1177586.047	506.541456	1.38
testCompareULTMaskNotShort	ops/s	3328062.026	1892.991844	4713247.216	1855.983724	1.41

With option -XX:UseSVE=0:

Benchmark			Unit	Before		Before Error	After		After Error	Uplift (After/Before)
testCompareEQMaskNotByte	ops/s	7895961.919	72712.90804	7746493.731	71481.92938	0.98
testCompareEQMaskNotDouble	ops/s	789811.0455	384.493088	766473.7994	2216.581793	0.97
testCompareEQMaskNotFloat	ops/s	1806305.818	638.010451	1819616.613	3295.38958	1
testCompareEQMaskNotInt		ops/s	1815820.144	1225.336135	1849538.401	766.29902	1.01
testCompareEQMaskNotLong	ops/s	807336.492	335.451807	792732.9483	277.954432	0.98
testCompareEQMaskNotShort	ops/s	4818266.38	1927.862665	4668903.001	1922.782715	0.96
testCompareGEMaskNotByte	ops/s	7818439.678	75374.97739	16498003.98	41440.49653	2.11
testCompareGEMaskNotInt		ops/s	1815159.05	1090.912209	2372095.779	1664.397112	1.3
testCompareGEMaskNotLong	ops/s	804324.5575	2301.686878	927919.8507	371.766719	1.15
testCompareGEMaskNotShort	ops/s	4818966.563	2443.643652	5385561.038	29558.37423	1.11
testCompareGTMaskNotByte	ops/s	7893406.157	82687.74264	16470663.2	22165.55812	2.08
testCompareGTMaskNotInt		ops/s	1815316.812	915.894106	2370447.198	655.016338	1.3
testCompareGTMaskNotLong	ops/s	807019.456	526.525482	928079.0541	330.582693	1.15
testCompareGTMaskNotShort	ops/s	4820552.881	1684.247747	5355902.93	5893.2915	1.11
testCompareLEMaskNotByte	ops/s	7816263.323	79560.0015	16473621.19	56688.99585	2.1
testCompareLEMaskNotInt		ops/s	1814915.724	926.998625	2368790.306	932.594778	1.3
testCompareLEMaskNotLong	ops/s	806483.9	935.718082	928110.9074	407.096695	1.15
testCompareLEMaskNotShort	ops/s	4813660.241	6817.870509	5357107.852	10061.47975	1.11
testCompareLTMaskNotByte	ops/s	7838948.962	69136.4504	16424405.96	24464.75469	2.09
testCompareLTMaskNotInt		ops/s	1815056.833	1187.6453	2369892.187	1103.819634	1.3
testCompareLTMaskNotLong	ops/s	806602.1804	287.923365	928346.4118	617.682824	1.15
testCompareLTMaskNotShort	ops/s	4817940.643	2767.1509	5372537.84	15397.47169	1.11
testCompareNEMaskNotByte	ops/s	9078493.798	4630.339307	16484348.42	18925.88346	1.81
testCompareNEMaskNotDouble	ops/s	661769.6272	398.712981	926763.5839	1808.843788	1.4
testCompareNEMaskNotFloat	ops/s	1570527.252	563.642144	2312425.678	1815.844846	1.47
testCompareNEMaskNotInt		ops/s	1619146.58	626.793854	2369711.543	942.330478	1.46
testCompareNEMaskNotLong	ops/s	680201.5381	2252.836482	927808.6147	414.917863	1.36
testCompareNEMaskNotShort	ops/s	3763508.054	3622.560798	5367808.015	8591.466599	1.42
testCompareUGEMaskNotByte	ops/s	7886373.129	75917.74675	16480928.93	27524.31005	2.08
testCompareUGEMaskNotInt	ops/s	1815636.832	750.036241	2369683.015	901.609404	1.3
testCompareUGEMaskNotLong	ops/s	806862.5826	287.819616	928001.4394	361.063837	1.15
testCompareUGEMaskNotShort	ops/s	4820581.361	2098.537435	5375854.248	25619.40165	1.11
testCompareUGTMaskNotByte	ops/s	7891591.465	96614.93542	16410405.93	15012.37096	2.07
testCompareUGTMaskNotInt	ops/s	1814871.179	662.825588	2371325.903	1170.491164	1.3
testCompareUGTMaskNotLong	ops/s	804013.7658	2240.534209	928062.2169	531.306897	1.15
testCompareUGTMaskNotShort	ops/s	4818150.337	3051.717685	5381449.337	21212.34187	1.11
testCompareULEMaskNotByte	ops/s	7831540.628	81306.67253	16495250.78	38682.19675	2.1
testCompareULEMaskNotInt	ops/s	1814484.14	687.860656	2369265.075	940.609586	1.3
testCompareULEMaskNotLong	ops/s	807780.5749	769.876816	927538.0732	1278.267724	1.14
testCompareULEMaskNotShort	ops/s	4817437.42	5141.336541	5356183.359	7015.608124	1.11
testCompareULTMaskNotByte	ops/s	7849078.225	56753.59764	16395975.27	34043.67295	2.08
testCompareULTMaskNotInt	ops/s	1814328.226	2697.219111	2370700.47	1991.841988	1.3
testCompareULTMaskNotLong	ops/s	807166.8197	253.061506	927926.2803	252.933462	1.14
testCompareULTMaskNotShort	ops/s	4821098.216	1625.959044	5348980.243	4100.768121	1.1

Benchmarks on an AMD EPYC 9124 16-Core Processor:
With option -XX:UseAVX=3:

Benchmark			Unit	Before		Before Error	After		After Error	Uplift (After/Before)
testCompareEQMaskNotByte	ops/s	16607323.35	1233692.631	18381557.66	1163201.522	1.1
testCompareEQMaskNotDouble	ops/s	2114285.245	58782.2534	2959946.353	43016.0445	1.39
testCompareEQMaskNotFloat	ops/s	4480874.437	89975.29074	6960151.436	64799.143	1.55
testCompareEQMaskNotInt		ops/s	4370906.91	51784.80889	6856955.043	313858.5504	1.56
testCompareEQMaskNotLong	ops/s	2080065.895	26762.06732	2939142.143	67179.05314	1.41
testCompareEQMaskNotShort	ops/s	7968282.563	210437.2781	12701214.56	473152.6407	1.59
testCompareGEMaskNotByte	ops/s	18419141.89	473408.9451	19880059.68	321638.0397	1.07
testCompareGEMaskNotInt		ops/s	4419015.62	77352.98633	7037639.227	151066.0383	1.59
testCompareGEMaskNotLong	ops/s	2147982.48	49227.42782	3000275.928	39298.75344	1.39
testCompareGEMaskNotShort	ops/s	8469039.613	17833.19707	12288229.49	244317.8812	1.45
testCompareGTMaskNotByte	ops/s	18728997.5	468328.8358	20544730.05	392264.6466	1.09
testCompareGTMaskNotInt		ops/s	4510009.705	78812.57357	7364629.942	70970.78473	1.63
testCompareGTMaskNotLong	ops/s	2124104.969	40917.89257	2953536.279	35199.19687	1.39
testCompareGTMaskNotShort	ops/s	8690557.621	311534.1159	12344017.51	457931.8741	1.42
testCompareLEMaskNotByte	ops/s	17758400.53	478383.4945	19209183.26	1143297.241	1.08
testCompareLEMaskNotInt		ops/s	4363664.862	43443.18063	7054093.064	78141.11476	1.61
testCompareLEMaskNotLong	ops/s	2068632.213	29844.78023	2954766.412	50667.22502	1.42
testCompareLEMaskNotShort	ops/s	8637608.548	183538.5511	12719010.27	473568.8825	1.47
testCompareLTMaskNotByte	ops/s	14406138.95	423105.0163	17292417.96	371386.9689	1.2
testCompareLTMaskNotInt		ops/s	4546707.266	131977.3144	7040483.394	213590.4657	1.54
testCompareLTMaskNotLong	ops/s	2123277.356	47243.21499	2848720.442	58896.97045	1.34
testCompareLTMaskNotShort	ops/s	7570169.363	649873.6295	11945383.75	988276.5955	1.57
testCompareNEMaskNotByte	ops/s	18274529.55	683396.7384	19081938.8	1118739.778	1.04
testCompareNEMaskNotDouble	ops/s	2112533.61	43295.50012	2912115.441	78189.51083	1.37
testCompareNEMaskNotFloat	ops/s	4628683.814	93817.07362	6967208.729	145135.8544	1.5
testCompareNEMaskNotInt		ops/s	4470900.214	75974.50842	7286913.662	116328.5277	1.62
testCompareNEMaskNotLong	ops/s	2134091.061	46377.94061	2934667.477	81675.46021	1.37
testCompareNEMaskNotShort	ops/s	8790384.287	396161.8599	13076858.35	286272.1155	1.48
testCompareUGEMaskNotByte	ops/s	18009150.9	660803.8886	17551258.33	1667014.843	0.97
testCompareUGEMaskNotInt	ops/s	4442928.74	83190.81019	6854088.277	329008.8901	1.54
testCompareUGEMaskNotLong	ops/s	2088357.736	71696.24791	2973202.26	63278.78974	1.42
testCompareUGEMaskNotShort	ops/s	8348624.02	116562.7876	12832250.78	546869.3006	1.53
testCompareUGTMaskNotByte	ops/s	17871101.25	800199.6321	19902619.81	214003.3262	1.11
testCompareUGTMaskNotInt	ops/s	4088304.421	137797.9723	7135454.33	124553.651	1.74
testCompareUGTMaskNotLong	ops/s	2070610.42	19881.82182	2991536.365	36260.60767	1.44
testCompareUGTMaskNotShort	ops/s	8637099.341	155822.1608	12756579.77	186068.199	1.47
testCompareULEMaskNotByte	ops/s	17940901.36	1258029.364	18932484.94	694554.6305	1.05
testCompareULEMaskNotInt	ops/s	4369177.511	74982.31936	6392773.082	550171.2266	1.46
testCompareULEMaskNotLong	ops/s	2135905.761	43693.63178	2877579.631	41651.56289	1.34
testCompareULEMaskNotShort	ops/s	8607710.544	132655.1676	12446370.04	441718.3035	1.44
testCompareULTMaskNotByte	ops/s	17409912.23	1033204.537	20607479.99	362000.5056	1.18
testCompareULTMaskNotInt	ops/s	4386455.9	119192.1635	6920123.264	186158.2845	1.57
testCompareULTMaskNotLong	ops/s	2064995.149	38622.2734	2988343.589	39037.90006	1.44
testCompareULTMaskNotShort	ops/s	8642182.752	230919.2442	13029582.09	437101.4923	1.5

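The few small regressions (Uplift slightly below 1.0, such as the EQ cases with -XX:UseSVE=0 and testCompareUGEMaskNotByte with -XX:UseAVX=3) are attributed to test fluctuations.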
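The Uplift column appears to be After divided by Before. Judging from the numbers themselves (an observation, not something the PR states), the ratio seems to be truncated rather than rounded to two decimals. For example, testCompareGEMaskNotInt with -XX:UseSVE=2 has a ratio of about 1.397 and is listed as 1.39. A minimal sketch of that arithmetic, with a hypothetical class name:

```java
public class UpliftSketch {
    // Uplift = After / Before, truncated (not rounded) to two decimals.
    // The truncation is inferred from the published table, not documented in the PR.
    static double uplift(double before, double after) {
        return Math.floor(after / before * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // testCompareEQMaskNotByte, -XX:UseSVE=2: ratio ~1.2975
        System.out.println(uplift(7912127.225, 10266136.26)); // 1.29
        // testCompareGEMaskNotInt, -XX:UseSVE=2: ratio ~1.3974 (rounding would give 1.4)
        System.out.println(uplift(1683738.958, 2352855.205)); // 1.39
        // testCompareEQMaskNotShort, -XX:UseSVE=0: ratio ~0.9690, a small regression
        System.out.println(uplift(4818266.38, 4668903.001)); // 0.96
    }
}
```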

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8354242: VectorAPI: combine vector not operation with compare (Enhancement - P4)

Reviewers

Emanuel Peter (@eme64 - Reviewer)
Xiaohong Gong (@XiaohongGong - Committer) Review applies to 5ebdc572
Jatin Bhateja (@jatin-bhateja - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674
$ git checkout pull/24674

Update a local copy of the PR:
$ git checkout pull/24674
$ git pull https://git.openjdk.org/jdk.git pull/24674/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24674

View PR using the GUI difftool:
$ git pr show -t 24674

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24674.diff

bridgekeeper · 2025-04-16T06:40:12Z

👋 Welcome back erifan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-04-16T06:40:56Z

@erifan This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8354242: VectorAPI: combine vector not operation with compare

Reviewed-by: epeter, jbhateja, xgong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 101 new commits pushed to the master branch:

c2c44a0: 8367724: Remove Trailing Return Types from undecided list
e107179: 8367017: Remove legacy checks from WrappedToolkitTest and convert from bash
b75e35c: 8365858: FilteredJavaFieldStream is unnecessary
... and 98 more: https://git.openjdk.org/jdk/compare/53b3e0567d2801ddf62c5849b219324ddfcb264a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@eme64, @XiaohongGong, @jatin-bhateja) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk · 2025-04-16T06:42:05Z

@erifan The following labels will be automatically applied to this pull request:

core-libs
hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-04-16T06:45:17Z

Webrevs

src/hotspot/share/opto/vectornode.cpp

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java

src/hotspot/share/opto/vectornode.cpp

erifan

@jatin-bhateja Thanks for your review!

src/hotspot/share/opto/vectornode.cpp

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java

src/hotspot/share/opto/node.cpp

1. Call VectorNode::Ideal() only once in XorVNode::Ideal().
2. Improve code comments.
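The XorVNode::Ideal transformation this PR adds can be modelled as a small peephole rewrite. The sketch below is a toy under stated assumptions. The Node class, the NEGATE map and the ideal() and rewrite() methods are hypothetical stand-ins, not HotSpot's C2 classes. Only the integer (XorV ... (Replicate -1)) form is modelled, without XorVMask, MaskAll or VectorMaskCast.

```java
import java.util.Map;

public class XorNotCompareRewriteSketch {
    // Toy IR node. The opcode names echo C2's, but this is NOT HotSpot's Node class.
    static final class Node {
        final String op;   // opcode, or a leaf name such as "src1"
        final String cond; // comparison condition for VectorMaskCmp, else null
        final long con;    // constant payload for Replicate, else 0
        final Node[] in;   // inputs

        Node(String op, String cond, long con, Node... in) {
            this.op = op;
            this.cond = cond;
            this.con = con;
            this.in = in;
        }

        @Override
        public String toString() {
            if (op.equals("Replicate")) return "(Replicate " + con + ")";
            if (in.length == 0) return op;
            StringBuilder sb = new StringBuilder("(").append(op);
            for (Node n : in) sb.append(' ').append(n);
            if (cond != null) sb.append(' ').append(cond);
            return sb.append(')').toString();
        }
    }

    // ncond for each integer condition named in the PR.
    static final Map<String, String> NEGATE = Map.of(
        "eq", "ne", "ne", "eq", "lt", "ge", "ge", "lt", "gt", "le",
        "le", "gt", "ult", "uge", "uge", "ult", "ugt", "ule", "ule", "ugt");

    static boolean isAllOnes(Node n) {
        return n.op.equals("Replicate") && n.con == -1;
    }

    // (XorV (VectorMaskCmp a b cond) (Replicate -1)) => (VectorMaskCmp a b ncond).
    // XorV is commutative, so the all-ones operand may be on either side.
    // A production compiler likely needs further checks (e.g. other users of the
    // compare node); this toy omits them.
    static Node ideal(Node n) {
        if (!n.op.equals("XorV") || n.in.length != 2) return n;
        Node l = n.in[0];
        Node r = n.in[1];
        Node cmp = isAllOnes(r) ? l : (isAllOnes(l) ? r : null);
        if (cmp == null || !cmp.op.equals("VectorMaskCmp")) return n;
        return new Node("VectorMaskCmp", NEGATE.get(cmp.cond), 0, cmp.in[0], cmp.in[1]);
    }

    // Builds (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) and rewrites it.
    static String rewrite(String cond) {
        Node src1 = new Node("src1", null, 0);
        Node src2 = new Node("src2", null, 0);
        Node cmp = new Node("VectorMaskCmp", cond, 0, src1, src2);
        Node xor = new Node("XorV", null, 0, cmp, new Node("Replicate", null, -1));
        return ideal(xor).toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("lt")); // (VectorMaskCmp src1 src2 ge)
    }
}
```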

eme64

Just a drive-by comment for now, I may review this later more fully.

I would also prefer if you added the IR restrictions rather than the jtreg @requires.
The benefit is that we can still run the tests on all platforms, at least for result verification.

Imagine someone adds optimizations to a new platform, but does not know about this test here. They make a mistake, and there is a bug, leading either to a crash or a wrong result. With @requires, your test would never even run, and we would not catch it. With the IR applyIf, we would catch the bug.

Just copy-pasting the IR applyIf everywhere is not that much work, and adding a new platform later is not really hard either.

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java

erifan · 2025-04-28T07:48:58Z

Thanks! The problem is that when a new platform is added, people may not even know there is a test.

eme64 · 2025-04-28T09:17:58Z

@erifan That is true. But we have that problem either way. If you use @requires, then the person does not realize there is a test AND the test is not run. If you use applyIf, the person does not realize there is a test, but it is run at least for result verification, and then the person MIGHT realize it if the test catches a wrong result or crash.

erifan · 2025-04-28T09:51:10Z

This test will still run on new platforms when we use @requires. As I explained in a previous comment, the @requires only excludes one case: when -XX:UseAVX=0 is specified on x86 platforms.

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java

eme64 · 2025-04-28T14:13:43Z

I see. You should probably add a comment there saying that you are only excluding UseAVX=0.
But even UseAVX=0 would benefit from result verification.

erifan · 2025-04-29T02:43:17Z

@requires is a special comment itself. I feel like it's a bit weird to add a comment to a comment, and I don't think the @requires is hard to understand.

If we want to verify correctness with UseAVX=0, we have to use applyIf. That brings us back to the original question: should we use @requires or applyIf? Personally, I prefer the former. By the way, I have tested correctness with UseAVX=0 locally.

eme64 · 2025-04-29T10:22:22Z

Yes, this discussion comes down to requires vs applyIf. This is my argument for applyIf, quoted from above; I have not yet seen an argument against it:

If you use @requires, then the person does not realize there is a test AND the test is not run. If you use applyIf, the person does not realize there is a test, but it is run at least for result verification - and then the person MIGHT realize if the test catches a wrong result / crash.

In my understanding, requires should only be used if the test really requires a certain platform or feature. That can be because some flags are only available under certain platforms for example. But for IR tests, we should try to always use applyIf, because it allows testing on other platforms.

Actually, I filed this RFE a while ago: https://bugs.openjdk.org/browse/JDK-8310891
We should try to move as many tests as possible from requires to applyIf, to increase test coverage.

erifan · 2025-04-30T01:24:17Z

I see, I'll update the code. Thanks~

erifan · 2025-05-01T07:32:22Z

@eme64 @jatin-bhateja I have updated the test, thanks for your suggestion.

eme64

@erifan thanks for updating the tests!

Now I had a quick look at the VM code.

My biggest observation is this:

Wrapping VectorNode::Ideal somewhere in the middle of your new optimization is going to make future optimizations here much harder.
How would they check their conditions next to yours? That would be quite a mess.

I suggest you do this:

XorVNode::Ideal does the following:
- checks the in1 == in2 case
- calls a method called XorVNode::Ideal_XorV_VectorMaskCmp, and returns its result if it succeeded, i.e. if it did not return nullptr.
- ... future optimizations could go here ...
- finally, if none of the optimizations above worked, calls VectorNode::Ideal

Then you pack all your new logic into XorVNode::Ideal_XorV_VectorMaskCmp. Feel free to find a better name; it is just what I came up with on the spot.

This gives us a much more modular design, and it is easier to add another new optimization to XorVNode::Ideal. It is easy to change the precedence of the optimizations by just changing the order, etc.

Examples of this "modular" design:

CMoveNode::Ideal -> calls TypeNode::Ideal and Ideal_minmax.
StoreBNode::Ideal -> calls StoreNode::Ideal_masked_input and StoreNode::Ideal_sign_extended_input
These are really nice, because you can quickly see what optimizations we already have, and in which order they are checked.
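To illustrate the suggested structure, here is a minimal sketch in Java. HotSpot's C2 is C++, and every type and method name below (`Node`, `ZeroV`, `ReplicateMinusOne`, `idealSameInputs`, `idealXorVVectorMaskCmp`, `vectorNodeIdeal`, `xorVIdeal`) is a hypothetical stand-in, not the real HotSpot API. Each specific optimization returns a replacement node, or `null` when it does not apply, and the dispatcher tries them in order before falling back to the generic transform:

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class ModularIdealSketch {
    // Hypothetical toy IR node: an opcode, an optional comparison condition,
    // and its inputs. This is NOT HotSpot's Node class.
    record Node(String op, String cond, List<Node> in) {
        static Node of(String op, Node... in) {
            return new Node(op, null, List.of(in));
        }
        static Node cmp(String cond, Node a, Node b) {
            return new Node("VectorMaskCmp", cond, List.of(a, b));
        }
    }

    // Optimization 1: (XorV x x) => zero vector (inputs compared by identity).
    static Node idealSameInputs(Node n) {
        if (n.in().get(0) == n.in().get(1)) {
            return Node.of("ZeroV");
        }
        return null; // not applicable
    }

    // Optimization 2: (XorV (VectorMaskCmp a b cond) (Replicate -1))
    //              => (VectorMaskCmp a b ncond). Signed integer conditions only here.
    static Node idealXorVVectorMaskCmp(Node n) {
        Node lhs = n.in().get(0);
        Node rhs = n.in().get(1);
        if (lhs.op().equals("VectorMaskCmp") && rhs.op().equals("ReplicateMinusOne")) {
            return Node.cmp(negate(lhs.cond()), lhs.in().get(0), lhs.in().get(1));
        }
        return null; // not applicable
    }

    static String negate(String cond) {
        return switch (cond) {
            case "eq" -> "ne";
            case "ne" -> "eq";
            case "lt" -> "ge";
            case "ge" -> "lt";
            case "le" -> "gt";
            case "gt" -> "le";
            default -> throw new IllegalArgumentException(cond);
        };
    }

    // Stand-in for the generic VectorNode::Ideal fallback; a no-op in this toy.
    static Node vectorNodeIdeal(Node n) {
        return null;
    }

    // Specific optimizations, tried in order: precedence is the list order.
    static final List<UnaryOperator<Node>> XOR_V_OPTS = List.of(
        ModularIdealSketch::idealSameInputs,
        ModularIdealSketch::idealXorVVectorMaskCmp);

    static Node xorVIdeal(Node n) {
        for (UnaryOperator<Node> opt : XOR_V_OPTS) {
            Node progress = opt.apply(n);
            if (progress != null) {
                return progress; // the first optimization that applies wins
            }
        }
        return vectorNodeIdeal(n); // none applied: generic fallback
    }

    public static void main(String[] args) {
        Node a = Node.of("LoadVector");
        Node b = Node.of("LoadVector");
        Node notCmp = Node.of("XorV", Node.cmp("lt", a, b), Node.of("ReplicateMinusOne"));
        System.out.println(xorVIdeal(notCmp).cond()); // ge
    }
}
```

With this shape, precedence is simply the order of the list, and a future optimization is one more entry rather than another branch threaded through an existing one.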

src/hotspot/share/opto/vectornode.cpp

Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular.

erifan · 2025-08-21T06:12:10Z

/keepalive

openjdk · 2025-08-21T06:13:06Z

@erifan The pull request is being re-evaluated and the inactivity timeout has been reset.

erifan · 2025-08-21T06:15:34Z

Hi, can anyone review this PR?

erifan · 2025-09-03T10:09:58Z

Hi @eme64 @theRealAph @XiaohongGong @fg1417 @shqking , could you help take a look at this PR, thanks

eme64

Looks much better, thanks for the updates!

I have another small list of suggestions :)

src/hotspot/share/opto/vectornode.cpp

eme64 · 2025-09-09T13:03:03Z

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java

+    @Test
+    @IR(counts = { IRNode.XOR_V_MASK, "= 0",
+                   IRNode.XOR_V, "= 0",
+                   IRNode.VECTOR_MASK_CMP, "= 2" },
+        applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" })
+    public static void testCompareEQMaskNotFloat() {
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fb, (m) -> { return m.not(); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fb);
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fb, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fb);
+    }
+
+    @Test
+    @IR(counts = { IRNode.XOR_V_MASK, "= 0",
+                   IRNode.XOR_V, "= 0",
+                   IRNode.VECTOR_MASK_CMP, "= 2" },
+        applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" })
+    public static void testCompareNEMaskNotFloat() {
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fb, (m) -> { return m.not(); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fb);
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fb, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fb);
+    }
+
+    @Test
+    @IR(counts = { IRNode.XOR_V_MASK, "= 0",
+                   IRNode.XOR_V, "= 0",
+                   IRNode.VECTOR_MASK_CMP, "= 2" },
+        applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" })
+    public static void testCompareEQMaskNotFloatNaN() {
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fnan, (m) -> { return m.not(); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fnan);
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fnan, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fnan);
+    }
+
+    @Test
+    @IR(counts = { IRNode.XOR_V_MASK, "= 0",
+                   IRNode.XOR_V, "= 0",
+                   IRNode.VECTOR_MASK_CMP, "= 2" },
+        applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" })
+    public static void testCompareNEMaskNotFloatNaN() {
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fnan, (m) -> { return m.not(); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fnan);
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fnan, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fnan);
+    }
+
+    @Test
+    @IR(counts = { IRNode.XOR_V_MASK, "= 0",
+                   IRNode.XOR_V, "= 0",
+                   IRNode.VECTOR_MASK_CMP, "= 2" },
+        applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" })
+    public static void testCompareEQMaskNotFloatPositiveInfinity() {
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fpinf, (m) -> { return m.not(); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fpinf);
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fpinf, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fpinf);
+    }
+
+    @Test
+    @IR(counts = { IRNode.XOR_V_MASK, "= 0",
+                   IRNode.XOR_V, "= 0",
+                   IRNode.VECTOR_MASK_CMP, "= 2" },
+        applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" })
+    public static void testCompareNEMaskNotFloatPositiveInfinity() {
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fpinf, (m) -> { return m.not(); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fpinf);
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fpinf, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fpinf);
+    }
+
+    @Test
+    @IR(counts = { IRNode.XOR_V_MASK, "= 0",
+                   IRNode.XOR_V, "= 0",
+                   IRNode.VECTOR_MASK_CMP, "= 2" },
+        applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" })
+    public static void testCompareEQMaskNotFloatNegativeInfinity() {
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fninf, (m) -> { return m.not(); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fninf);
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.EQ, fa, fninf, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.EQ, fa, fninf);
+    }
+
+    @Test
+    @IR(counts = { IRNode.XOR_V_MASK, "= 0",
+                   IRNode.XOR_V, "= 0",
+                   IRNode.VECTOR_MASK_CMP, "= 2" },
+        applyIfCPUFeatureOr = { "asimd", "true", "avx", "true", "rvv", "true" })
+    public static void testCompareNEMaskNotFloatNegativeInfinity() {
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fninf, (m) -> { return m.not(); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fninf);
+        testCompareMaskNotFloat(F_SPECIES, VectorOperators.NE, fa, fninf, (m) -> { return F_SPECIES.maskAll(true).xor(m); });
+        verifyResultsFloat(F_SPECIES, VectorOperators.NE, fa, fninf);
+    }


Do you have test cases for the cases other than EQ and NE? After all, we don't want someone to accidentally mess with the logic you implemented later without us noticing the bug ;)

For float and double, only EQ and NE are supported, so the positive tests only cover these two ops. There is one negative test for the other, unsupported ops: see testCompareMaskNotFloatNegative.
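Why only EQ and NE for floating point? The scalar sketch below (plain Java, not the Vector API; the class and method names are hypothetical) checks whether `!(a cond b)` always equals `(a ncond b)` over a set of sample values. For `int`, every signed condition has an exact negation. For `float`, any ordered comparison involving NaN is false, so `!(a < b)` and `a >= b` disagree when NaN is present, while `==` and `!=` remain exact negations of each other:

```java
public class NegatedCompareSketch {
    enum Cond { EQ, NE, LT, LE, GT, GE }

    // The condition that would replace `c` in (not (cmp c)) => (cmp ncond).
    static Cond negate(Cond c) {
        return switch (c) {
            case EQ -> Cond.NE;
            case NE -> Cond.EQ;
            case LT -> Cond.GE;
            case GE -> Cond.LT;
            case LE -> Cond.GT;
            case GT -> Cond.LE;
        };
    }

    static boolean cmp(Cond c, int a, int b) {
        return switch (c) {
            case EQ -> a == b;
            case NE -> a != b;
            case LT -> a < b;
            case LE -> a <= b;
            case GT -> a > b;
            case GE -> a >= b;
        };
    }

    static boolean cmp(Cond c, float a, float b) {
        return switch (c) {
            case EQ -> a == b;
            case NE -> a != b;
            case LT -> a < b;
            case LE -> a <= b;
            case GT -> a > b;
            case GE -> a >= b;
        };
    }

    // True if !(a c b) == (a negate(c) b) for every pair of samples.
    static boolean negationHolds(Cond c, int[] samples) {
        for (int a : samples) {
            for (int b : samples) {
                if (!cmp(c, a, b) != cmp(negate(c), a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    static boolean negationHolds(Cond c, float[] samples) {
        for (float a : samples) {
            for (float b : samples) {
                if (!cmp(c, a, b) != cmp(negate(c), a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] ints = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        float[] floats = {Float.NEGATIVE_INFINITY, -1f, -0f, 0f, 1f, Float.POSITIVE_INFINITY, Float.NaN};
        for (Cond c : Cond.values()) {
            System.out.println(c + ": int=" + negationHolds(c, ints) + ", float=" + negationHolds(c, floats));
        }
    }
}
```

Running it prints `int=true` for every condition, `float=true` only for EQ and NE, and `float=false` for LT, LE, GT and GE, which is consistent with restricting the float/double rewrite to EQ and NE.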

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java

erifan

@eme64 Thank you for your patience in reviewing this PR. I'm doing some internal testing and expect to push a new commit next week. I'll be on vacation for the next two days. Thank you!

src/hotspot/share/opto/vectornode.cpp

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java

eme64 · 2025-09-10T07:43:20Z

@erifan Sounds good. No rush, it takes as long as it takes. I'll soon be on vacation too and may not respond until mid-October.

erifan · 2025-09-15T05:43:36Z

Hi @eme64, I have addressed all of your suggestions except one that I think is already covered. Could you please have a look at this PR when you have a chance? Thanks!

jatin-bhateja

Your benchmark and code changes look good to me. Thanks for addressing my comments.

erifan · 2025-09-15T09:45:40Z

Thanks @jatin-bhateja. The updated benchmark results are as follows; not much has changed.

On an Nvidia Grace machine with 128-bit SVE2:
With option -XX:UseSVE=2:

Benchmark		COMPARISON_OP	Unit	Before score	Before error	After score	After error	Uplift
testCompareMaskNotDouble	EQ	ops/s	908008.7644	827.699314	1175289.515	240.548861	1.294359
testCompareMaskNotDouble	NE	ops/s	872199.2489	131.090115	1175667.777	129.741515	1.347934
testCompareMaskNotDouble	LT	ops/s	880166.7559	1570.41653	882160.6889	4723.507639	1.002265
testCompareMaskNotDouble	LE	ops/s	878115.3293	2919.637497	879033.7895	5404.617017	1.001045
testCompareMaskNotDouble	GT	ops/s	877068.5325	9595.275981	865832.864	5054.26002	0.987189
testCompareMaskNotDouble	GE	ops/s	895695.0228	3276.687933	871153.7117	7714.572967	0.9726
testCompareMaskNotFloat	    EQ	ops/s	1811841.295	278.140948	2350971.83	606.667654	1.297559
testCompareMaskNotFloat	    NE	ops/s	1727124.634	1755.717051	2351789.019	269.531198	1.361678
testCompareMaskNotFloat	    LT	ops/s	1735243.319	4912.343726	1726257.01	823.746765	0.994821
testCompareMaskNotFloat	    LE	ops/s	1726151.367	1071.383328	1727029.339	960.336314	1.000508
testCompareMaskNotFloat	    GT	ops/s	1729704.897	1646.026351	1726069.02	440.981281	0.997897
testCompareMaskNotFloat	    GE	ops/s	1726515.227	2171.61643	1728365.682	1404.298156	1.001071
testCompareMaskNotByte	    EQ	ops/s	8480574.694	1254.415788	10200329.86	8560.199493	1.202787
testCompareMaskNotByte	    NE	ops/s	8480141.263	1437.762594	10207424.91	3664.106923	1.203685
testCompareMaskNotByte	    LT	ops/s	8471471.384	7699.585554	10203300.19	4675.047416	1.20443
testCompareMaskNotByte	    LE	ops/s	8476165.519	6045.944392	10204956.23	2174.866199	1.203959
testCompareMaskNotByte	    GT	ops/s	8479397.377	1290.560961	10207032.3	5414.789178	1.203745
testCompareMaskNotByte	    GE	ops/s	8479979.908	1094.823175	10203115.77	2909.433184	1.2032
testCompareMaskNotByte	    ULT	ops/s	8480915.515	1420.30856	10213140.54	19628.56888	1.204249
testCompareMaskNotByte	    ULE	ops/s	8481768.961	1806.086454	10191601.05	9537.089409	1.201589
testCompareMaskNotByte	    UGT	ops/s	8477948.807	3652.437106	10208439.79	8335.226416	1.204116
testCompareMaskNotByte	    UGE	ops/s	8477320.065	2191.753237	10198589.9	5748.761942	1.203044
testCompareMaskNotInt	    EQ	ops/s	1906386.393	208.045573	2346741.129	383.461819	1.230989
testCompareMaskNotInt	    NE	ops/s	1674206.146	169.967081	2346609.602	652.964692	1.401625
testCompareMaskNotInt	    LT	ops/s	1684755.085	4939.806653	2345939.728	738.842445	1.392451
testCompareMaskNotInt	    LE	ops/s	1659985.83	2408.542766	2346929.8	192.550397	1.413825
testCompareMaskNotInt	    GT	ops/s	1674460.437	447.120589	2347037.155	342.433085	1.401667
testCompareMaskNotInt	    GE	ops/s	1658699.073	884.268891	2347411.827	281.885914	1.415212
testCompareMaskNotInt	    ULT	ops/s	1677043.66	6215.834359	2347155.384	425.141786	1.399579
testCompareMaskNotInt	    ULE	ops/s	1667049.76	9521.094204	2346815.213	316.03901	1.407765
testCompareMaskNotInt	    UGT	ops/s	1661045.828	3669.548525	2346711.365	2808.608132	1.412791
testCompareMaskNotInt	    UGE	ops/s	1663715.691	4570.73053	2347096.847	191.804359	1.410755
testCompareMaskNotLong	    EQ	ops/s	885668.5947	203.053456	1174274.006	113.51354	1.325861
testCompareMaskNotLong	    NE	ops/s	837449.9353	198.611966	1174330.269	106.514374	1.402269
testCompareMaskNotLong	    LT	ops/s	846790.2128	7005.585657	1174290.879	93.56413	1.386755
testCompareMaskNotLong	    LE	ops/s	851253.2346	7624.045467	1174162.355	179.854316	1.379333
testCompareMaskNotLong	    GT	ops/s	837715.7563	4272.558281	1173797.819	289.311518	1.401188
testCompareMaskNotLong	    GE	ops/s	883137.593	14804.63746	1174216.909	86.404559	1.329596
testCompareMaskNotLong	    ULT	ops/s	872478.9017	4955.722542	1174341.995	124.656933	1.345983
testCompareMaskNotLong	    ULE	ops/s	866570.738	12541.58528	1174185.197	594.850706	1.354979
testCompareMaskNotLong	    UGT	ops/s	866389.0927	3971.492766	1174210.803	153.960084	1.355292
testCompareMaskNotLong	    UGE	ops/s	848339.3876	4555.514721	1174060.638	240.326562	1.383951
testCompareMaskNotShort	    EQ	ops/s	3336170.783	2286.717236	4684904.156	2134.72575	1.404275
testCompareMaskNotShort	    NE	ops/s	3334775.472	717.588615	4690264.12	3017.756867	1.40647
testCompareMaskNotShort	    LT	ops/s	3334619.058	1138.901707	4685883.864	3808.321694	1.405223
testCompareMaskNotShort	    LE	ops/s	3335538.353	538.676789	4688238.934	1029.406266	1.405541
testCompareMaskNotShort	    GT	ops/s	3301425.217	694.060525	4689167.049	2845.363801	1.420346
testCompareMaskNotShort	    GE	ops/s	3301580.972	317.042851	4688970.211	1292.83929	1.420219
testCompareMaskNotShort	    ULT	ops/s	3336318.051	892.515034	4687549.384	1403.281648	1.405006
testCompareMaskNotShort	    ULE	ops/s	3335188.292	972.230191	4684723.63	3937.599084	1.404635
testCompareMaskNotShort	    UGT	ops/s	3334490.656	930.409628	4688058.378	1166.776081	1.405929
testCompareMaskNotShort	    UGE	ops/s	3333050.033	3146.019596	4689197.9	456.439188	1.406878
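The Uplift column appears to be the After score divided by the Before score. A quick check against two rows of the table above (values copied from it; the class name is only for illustration):

```java
public class UpliftCheck {
    // Uplift as After / Before throughput (ops/s); > 1 means faster with the patch.
    static double uplift(double beforeOpsPerSec, double afterOpsPerSec) {
        return afterOpsPerSec / beforeOpsPerSec;
    }

    public static void main(String[] args) {
        // testCompareMaskNotDouble, EQ, -XX:UseSVE=2
        System.out.printf("%.6f%n", uplift(908008.7644, 1175289.515));
        // testCompareMaskNotInt, NE, -XX:UseSVE=2
        System.out.printf("%.6f%n", uplift(1674206.146, 2346609.602));
    }
}
```

Both printed values (1.294359 and 1.401625) match the Uplift column for those rows.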

With option -XX:UseSVE=0:

Benchmark		COMPARISON_OP	Unit	Before score	Before error	After score	After error	Uplift
testCompareMaskNotDouble	EQ	ops/s	788505.9464	579.254839	769969.5798	138.792325	0.976491
testCompareMaskNotDouble	NE	ops/s	655499.7935	471.970429	915086.3257	183.495964	1.396013
testCompareMaskNotDouble	LT	ops/s	788418.7889	574.263314	789271.7448	51.838991	1.001081
testCompareMaskNotDouble	LE	ops/s	789144.8431	45.334181	789326.1963	84.148011	1.000229
testCompareMaskNotDouble	GT	ops/s	788690.8485	662.950083	789246.9812	99.060588	1.000705
testCompareMaskNotDouble	GE	ops/s	789421.2387	94.012868	789166.4717	111.772533	0.999677
testCompareMaskNotFloat	    EQ	ops/s	1816132.864	1298.2187	1816461.601	311.706275	1.000181
testCompareMaskNotFloat	    NE	ops/s	1550767.697	1142.987761	2301429.148	159.71525	1.484057
testCompareMaskNotFloat	    LT	ops/s	1815531.685	1370.868745	1817187.121	761.68401	1.000911
testCompareMaskNotFloat	    LE	ops/s	1817937.722	484.638134	1817703.209	625.275639	0.999871
testCompareMaskNotFloat	    GT	ops/s	1818618.89	724.324392	1817977.851	481.152488	0.999647
testCompareMaskNotFloat	    GE	ops/s	1815118.411	1327.945736	1817476.414	510.712942	1.001299
testCompareMaskNotByte	    EQ	ops/s	6489599.571	5127.815254	6535895.286	17029.15534	1.007133
testCompareMaskNotByte	    NE	ops/s	9089974.523	4069.346579	15945662.17	22867.48282	1.754203
testCompareMaskNotByte	    LT	ops/s	6499040.898	1250.085336	15939338.57	17451.05939	2.452567
testCompareMaskNotByte	    LE	ops/s	6493612.339	4928.466061	15926355.01	27249.57103	2.452618
testCompareMaskNotByte	    GT	ops/s	6494486.565	5229.4598	15957497.14	6893.237334	2.457083
testCompareMaskNotByte	    GE	ops/s	6499295.661	1030.044749	15903755.01	46454.70992	2.446996
testCompareMaskNotByte	    ULT	ops/s	6494212.684	5194.712704	15944816.71	3467.818892	2.455234
testCompareMaskNotByte	    ULE	ops/s	6493882.576	5092.839387	15936419.25	22755.34523	2.454066
testCompareMaskNotByte	    UGT	ops/s	6493479.899	4678.096391	15958133.18	3483.353667	2.457562
testCompareMaskNotByte	    UGE	ops/s	6500338.419	709.344957	15968155.27	14020.47085	2.456511
testCompareMaskNotInt	    EQ	ops/s	1830787.273	237.597163	1878452.588	142.728192	1.026035
testCompareMaskNotInt	    NE	ops/s	1615081.395	1219.871461	2360913.712	199.556675	1.461792
testCompareMaskNotInt	    LT	ops/s	1827819.867	1360.728526	2360561.422	248.025925	1.291462
testCompareMaskNotInt	    LE	ops/s	1830975.648	416.987529	2360703.924	194.958346	1.289314
testCompareMaskNotInt	    GT	ops/s	1830633.964	301.849017	2360552.203	224.908655	1.289472
testCompareMaskNotInt	    GE	ops/s	1829476.495	1348.361278	2360673.736	137.538696	1.290354
testCompareMaskNotInt	    ULT	ops/s	1829137.773	1285.55232	2360615.95	162.876291	1.290562
testCompareMaskNotInt	    ULE	ops/s	1828107.468	1360.867847	2360790.337	297.267481	1.291384
testCompareMaskNotInt	    UGT	ops/s	1829659.222	1459.098806	2361025.107	266.158075	1.290417
testCompareMaskNotInt	    UGE	ops/s	1829548.187	1427.266787	2360941.943	242.380469	1.29045
testCompareMaskNotLong	    EQ	ops/s	810439.9121	82.577412	802287.4993	73.462086	0.98994
testCompareMaskNotLong	    NE	ops/s	681643.6089	485.657471	932324.6973	158.28799	1.367759
testCompareMaskNotLong	    LT	ops/s	809850.546	680.71673	931404.3219	685.591444	1.150094
testCompareMaskNotLong	    LE	ops/s	810584.5191	115.234753	932234.2412	105.451172	1.150076
testCompareMaskNotLong	    GT	ops/s	810593.5376	117.947863	931879.1829	553.397713	1.149625
testCompareMaskNotLong	    GE	ops/s	810435.8405	81.88737	931833.0348	177.765694	1.149792
testCompareMaskNotLong	    ULT	ops/s	810429.8459	90.005329	932127.5278	74.443387	1.150164
testCompareMaskNotLong	    ULE	ops/s	809740.842	411.655134	932231.6607	76.044104	1.151271
testCompareMaskNotLong	    UGT	ops/s	810493.4369	52.024062	932239.1709	143.915229	1.150211
testCompareMaskNotLong	    UGE	ops/s	810442.0661	64.064396	932361.567	119.570287	1.150435
testCompareMaskNotShort	    EQ	ops/s	4786426.182	299.050738	4694123.013	482.608634	0.980715
testCompareMaskNotShort	    NE	ops/s	3808932.807	2993.590606	5672255.469	6262.526335	1.489198
testCompareMaskNotShort	    LT	ops/s	4782535.485	3699.104322	5668474.071	11101.86452	1.185244
testCompareMaskNotShort	    LE	ops/s	4782896.891	3338.57484	5669188.434	6309.723399	1.185304
testCompareMaskNotShort	    GT	ops/s	4778532.318	3571.547653	5680482.703	10427.66734	1.18875
testCompareMaskNotShort	    GE	ops/s	4786150.851	794.769881	5664644.919	6542.434538	1.183549
testCompareMaskNotShort	    ULT	ops/s	4783623.78	3582.962421	5668267.123	17841.44773	1.184931
testCompareMaskNotShort	    ULE	ops/s	4782752.125	3610.296618	5666231.302	6964.505363	1.184721
testCompareMaskNotShort	    UGT	ops/s	4782469.332	2913.37576	5655837.96	6494.608864	1.182618
testCompareMaskNotShort	    UGE	ops/s	4782606.35	3491.774067	5667295.182	14176.96543	1.18498

On an AMD EPYC 9124 16-Core Processor:
With option -XX:UseAVX=3:

Benchmark		COMPARISON_OP	Unit	Before score	Before error	After score	After error	Uplift
testCompareMaskNotDouble	EQ	ops/s	2166357.886	27577.51358	2920183.192	38491.49083	1.347968
testCompareMaskNotDouble	NE	ops/s	2177325.341	32771.27023	2965747.932	39271.62615	1.362106
testCompareMaskNotDouble	LT	ops/s	2123834.711	22890.39919	2197099.169	29107.41329	1.034496
testCompareMaskNotDouble	LE	ops/s	2172931.681	32912.05647	2121686.057	34927.37781	0.976416
testCompareMaskNotDouble	GT	ops/s	2164924.662	30925.91899	2124062.892	37135.0458	0.981125
testCompareMaskNotDouble	GE	ops/s	2150619.038	35515.09022	2192636.533	38672.85716	1.019537
testCompareMaskNotFloat	    EQ	ops/s	4518378.764	74733.72389	6724589.409	50424.63568	1.488274
testCompareMaskNotFloat	    NE	ops/s	4522823.224	78138.66727	6907565.257	203953.3299	1.527268
testCompareMaskNotFloat	    LT	ops/s	4587473.545	62621.25938	4431658.918	52760.23989	0.966034
testCompareMaskNotFloat	    LE	ops/s	4472078.986	79338.23304	4472390.043	66247.285	1.000069
testCompareMaskNotFloat	    GT	ops/s	4451744.39	220787.9755	4440866.486	58674.19154	0.997556
testCompareMaskNotFloat	    GE	ops/s	4459601.349	57873.05167	4481398.426	76819.69285	1.004887
testCompareMaskNotByte	    EQ	ops/s	19415317.92	356367.4937	20649319.86	240515.9459	1.063558
testCompareMaskNotByte	    NE	ops/s	19401162.58	362571.8103	21010358.2	71221.35255	1.082943
testCompareMaskNotByte	    LT	ops/s	19175612.37	273080.6175	20235838.72	396190.6101	1.05529
testCompareMaskNotByte	    LE	ops/s	19036831.33	121135.0491	20674528.84	248839.9471	1.086027
testCompareMaskNotByte	    GT	ops/s	19008302.3	124633.9182	20671390.89	271644.5576	1.087492
testCompareMaskNotByte	    GE	ops/s	19590753.42	429156.452	20491615.07	332912.82	1.045984
testCompareMaskNotByte	    ULT	ops/s	19431604.06	421396.5487	20575805.9	248466.2368	1.058883
testCompareMaskNotByte	    ULE	ops/s	19060425.47	98309.75469	20774930.43	206596.0422	1.089951
testCompareMaskNotByte	    UGT	ops/s	19266788.04	362893.3051	20861521.87	106977.3707	1.082771
testCompareMaskNotByte	    UGE	ops/s	19127964.33	447774.3747	20791221.56	254458.0132	1.086954
testCompareMaskNotInt	    EQ	ops/s	4473402.48	84902.77154	7191777.028	94315.13878	1.607674
testCompareMaskNotInt	    NE	ops/s	4583165.363	73491.79073	7249884.988	80028.31191	1.581851
testCompareMaskNotInt	    LT	ops/s	4618634.192	81869.82512	7242567.732	71211.3697	1.568118
testCompareMaskNotInt	    LE	ops/s	4650524.195	72302.56692	7154948.491	83057.90635	1.538525
testCompareMaskNotInt	    GT	ops/s	4534752.486	94449.20198	7004428.251	38365.18576	1.54461
testCompareMaskNotInt	    GE	ops/s	4540777.389	86331.11847	7129527.341	74343.06996	1.570111
testCompareMaskNotInt	    ULT	ops/s	4528175.644	114213.6504	7220013.98	82850.22587	1.594464
testCompareMaskNotInt	    ULE	ops/s	4619335.448	74203.98889	7118543.128	54457.43284	1.541031
testCompareMaskNotInt	    UGT	ops/s	4572521.254	122912.75	7154797.741	98858.3477	1.564737
testCompareMaskNotInt	    UGE	ops/s	4579627.842	80558.04554	7179020.593	99239.23499	1.567599
testCompareMaskNotLong	    EQ	ops/s	2103965.347	17059.28178	2997338.009	32388.42725	1.424613
testCompareMaskNotLong	    NE	ops/s	2174434.633	36011.24708	2984460.593	29074.42994	1.372522
testCompareMaskNotLong	    LT	ops/s	2110937.378	56642.0052	3020690.893	31167.62537	1.430971
testCompareMaskNotLong	    LE	ops/s	2153414.166	31280.20562	2971696.162	31176.24605	1.379992
testCompareMaskNotLong	    GT	ops/s	2166028.207	49432.18925	3008018.282	26534.78551	1.388725
testCompareMaskNotLong	    GE	ops/s	2178206.136	35757.6799	2933186.687	19824.26727	1.346606
testCompareMaskNotLong	    ULT	ops/s	2104344.728	31405.7728	2964354.007	26871.18289	1.408682
testCompareMaskNotLong	    ULE	ops/s	2210232.578	21993.95777	3032635.261	25545.43656	1.372088
testCompareMaskNotLong	    UGT	ops/s	2167177.931	44896.90807	2996245.236	34153.68941	1.382556
testCompareMaskNotLong	    UGE	ops/s	2117175.328	26131.1893	2977492.164	23227.65519	1.406351
testCompareMaskNotShort	    EQ	ops/s	8131234.179	185997.1777	12414378.38	122648.1579	1.526752
testCompareMaskNotShort	    NE	ops/s	8506016.656	236481.383	12720442.64	322747.8776	1.495464
testCompareMaskNotShort	    LT	ops/s	8487868.819	244943.6097	12150479.62	244300.5456	1.431511
testCompareMaskNotShort	    LE	ops/s	8549184.557	286833.466	12358019.06	136683.2112	1.44552
testCompareMaskNotShort	    GT	ops/s	8375447.45	221237.073	12602058.97	385690.3318	1.504643
testCompareMaskNotShort	    GE	ops/s	8123474.548	127727.1461	12799747.64	197940.1001	1.575649
testCompareMaskNotShort	    ULT	ops/s	8491650.422	313124.2425	12751186.59	255845.1653	1.501614
testCompareMaskNotShort	    ULE	ops/s	8363009.676	203670.1995	12675908.7	279496.9925	1.515711
testCompareMaskNotShort	    UGT	ops/s	8332268.933	279787.2503	12279451.4	436971.6582	1.473722
testCompareMaskNotShort	    UGE	ops/s	8931588.505	203962.9257	12324437.67	330723.3066	1.37987

eme64

@erifan Nice work on the benchmark refactor! And thanks for the other updates.

I'll run some testing now, should take about 24h.

eme64

@erifan Thanks for the work! All tests pass on my side, patch looks good to me too :)

erifan · 2025-09-17T07:22:25Z

Thanks all for your help, I'll integrate the PR.

erifan · 2025-09-17T07:22:35Z

/integrate

openjdk · 2025-09-17T07:24:17Z

@erifan
Your change (at version 56bb34f) is now ready to be sponsored by a Committer.

XiaohongGong · 2025-09-17T07:30:44Z

/sponsor

openjdk · 2025-09-17T07:32:20Z

Going to push as commit 45cc515.
Since your change was applied there have been 101 commits pushed to the master branch:

c2c44a0: 8367724: Remove Trailing Return Types from undecided list
e107179: 8367017: Remove legacy checks from WrappedToolkitTest and convert from bash
b75e35c: 8365858: FilteredJavaFieldStream is unnecessary
... and 98 more: https://git.openjdk.org/jdk/compare/53b3e0567d2801ddf62c5849b219324ddfcb264a...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-09-17T07:32:33Z

@XiaohongGong @erifan Pushed as commit 45cc515.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk bot added the rfr Pull request is ready for review label Apr 16, 2025

openjdk bot added hotspot-compiler and core-libs labels Apr 16, 2025

Merge branch 'master' into JDK-8354242

1b9c3b3

jatin-bhateja reviewed Apr 23, 2025

src/hotspot/share/opto/vectornode.cpp Outdated

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java Outdated

jatin-bhateja reviewed Apr 23, 2025

src/hotspot/share/opto/vectornode.cpp Outdated

erifan commented Apr 24, 2025

theRealAph reviewed Apr 24, 2025

src/hotspot/share/opto/node.cpp

erifan added 2 commits April 25, 2025 07:12

Merge branch 'master' into JDK-8354242

7049b92

Addressed some review comments

34eae98

1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments.

eme64 suggested changes Apr 28, 2025

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java Outdated

eme64 reviewed Apr 28, 2025

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java

erifan added 2 commits May 1, 2025 07:26

Merge branch 'master' into JDK-8354242

bf0031a

Update the jtreg test

4fbf84e

eme64 reviewed May 2, 2025

src/hotspot/share/opto/vectornode.cpp Outdated

src/hotspot/share/opto/vectornode.cpp Outdated

src/hotspot/share/opto/vectornode.cpp Outdated

erifan added 2 commits May 7, 2025 01:55

Merge branch 'master' into JDK-8354242

4e23ce4

Refactor code

001fac0

Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular.
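The heart of this optimization is mapping each comparison `cond` to its complement `ncond`, so that NOT(compare(a, b, cond)) can be rewritten as compare(a, b, ncond). The Java sketch below is a scalar, one-lane model of that rewrite, written under stated assumptions: the enum `Cond`, the class `CondNegation`, and its methods are hypothetical names for illustration, not the C2 code added in this PR. It also shows why a float `lt` cannot simply become `ge`: with a NaN operand both comparisons are false, which is a likely reason the float and double patterns are restricted to eq and ne.

```java
// Hypothetical scalar model of the "not(compare) => compare with negated
// condition" fold. Cond, CondNegation and its methods are illustrative names,
// not the HotSpot C2 code from this PR.
final class CondNegation {
    enum Cond { EQ, NE, LT, GE, GT, LE, ULT, UGE, UGT, ULE }

    // Return ncond, the logical complement of cond.
    static Cond negate(Cond c) {
        switch (c) {
            case EQ:  return Cond.NE;
            case NE:  return Cond.EQ;
            case LT:  return Cond.GE;
            case GE:  return Cond.LT;
            case GT:  return Cond.LE;
            case LE:  return Cond.GT;
            case ULT: return Cond.UGE;
            case UGE: return Cond.ULT;
            case UGT: return Cond.ULE;
            case ULE: return Cond.UGT;
            default:  throw new IllegalArgumentException(c.toString());
        }
    }

    // One lane of an integer compare (signed and unsigned conditions).
    static boolean testInt(Cond c, int a, int b) {
        switch (c) {
            case EQ:  return a == b;
            case NE:  return a != b;
            case LT:  return a < b;
            case GE:  return a >= b;
            case GT:  return a > b;
            case LE:  return a <= b;
            case ULT: return Integer.compareUnsigned(a, b) < 0;
            case UGE: return Integer.compareUnsigned(a, b) >= 0;
            case UGT: return Integer.compareUnsigned(a, b) > 0;
            case ULE: return Integer.compareUnsigned(a, b) <= 0;
            default:  throw new IllegalArgumentException(c.toString());
        }
    }

    // One lane of a float compare (only the conditions used below).
    static boolean testFloat(Cond c, float a, float b) {
        switch (c) {
            case EQ: return a == b;
            case NE: return a != b;
            case LT: return a < b;
            case GE: return a >= b;
            default: throw new IllegalArgumentException(c.toString());
        }
    }

    // True if NOT(a cond b) == (a ncond b) for every condition and sample pair.
    static boolean foldHoldsForInts() {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        for (Cond c : Cond.values()) {
            for (int a : samples) {
                for (int b : samples) {
                    if ((!testInt(c, a, b)) != testInt(negate(c), a, b)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    // True if NOT(a cond b) == (a ncond b) for one float pair.
    static boolean foldHoldsForFloats(Cond c, float a, float b) {
        return (!testFloat(c, a, b)) == testFloat(negate(c), a, b);
    }

    public static void main(String[] args) {
        System.out.println(foldHoldsForInts());                            // true
        System.out.println(foldHoldsForFloats(Cond.EQ, Float.NaN, 1.0f)); // true
        System.out.println(foldHoldsForFloats(Cond.LT, Float.NaN, 1.0f)); // false
    }
}
```

Modelling a single lane is enough for this check because the vector transform applies the same condition rewrite to every lane independently.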

eme64 suggested changes Sep 9, 2025

erifan commented Sep 10, 2025

erifan added 3 commits September 10, 2025 08:37

Merge branch 'master' into JDK-8354242

6a9a5eb

Simplify JMH testing

52bbd3c

Add an IR rule for vector mask cast operation

56bb34f

erifan requested a review from eme64 September 15, 2025 05:39

jatin-bhateja approved these changes Sep 15, 2025

openjdk bot added the ready (Pull request is ready to be integrated) label Sep 15, 2025

eme64 reviewed Sep 16, 2025

eme64 approved these changes Sep 17, 2025

openjdk bot added the sponsor (Pull request is ready to be sponsored) label Sep 17, 2025

openjdk bot added the integrated (Pull request has been integrated) label Sep 17, 2025

openjdk bot closed this Sep 17, 2025

openjdk bot removed the ready, rfr, and sponsor labels Sep 17, 2025

erifan deleted the JDK-8354242 branch September 22, 2025 09:12

8354242: VectorAPI: combine vector not operation with compare #24674

erifan commented Apr 16, 2025 (edited by openjdk bot)

bridgekeeper bot commented Apr 16, 2025

openjdk bot commented Apr 16, 2025 (edited)

openjdk bot commented Apr 16, 2025

mlbridge bot commented Apr 16, 2025 (edited)

erifan left a comment

eme64 left a comment

erifan commented Apr 28, 2025

eme64 commented Apr 28, 2025

erifan commented Apr 28, 2025 (edited)

eme64 commented Apr 28, 2025

erifan commented Apr 29, 2025

eme64 commented Apr 29, 2025

erifan commented Apr 30, 2025

erifan commented May 1, 2025

eme64 left a comment (edited)

erifan commented Aug 21, 2025

openjdk bot commented Aug 21, 2025

erifan commented Aug 21, 2025

erifan commented Sep 3, 2025

eme64 left a comment

eme64 Sep 9, 2025

erifan Sep 15, 2025

jatin-bhateja left a comment (edited)