@wenshao wenshao commented Jun 16, 2024

JDK-8318446 introduced the MergeStore optimization. We need a JMH benchmark to measure the performance of various batch read/write operations and to show where MergeStore takes effect.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8334342: Add MergeStore JMH benchmarks (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19734/head:pull/19734
$ git checkout pull/19734

Update a local copy of the PR:
$ git checkout pull/19734
$ git pull https://git.openjdk.org/jdk.git pull/19734/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19734

View PR using the GUI difftool:
$ git pr show -t 19734

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19734.diff

Webrev

Link to Webrev Comment

bridgekeeper bot commented Jun 16, 2024

👋 Welcome back wenshao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Jun 16, 2024

@wenshao This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8334342: Add MergeStore JMH benchmarks

Reviewed-by: epeter, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 13 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@eme64, @TobiHartmann) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk bot commented Jun 16, 2024

@wenshao The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

wenshao commented Jun 16, 2024

1. Cases where MergeStore does not work

According to the benchmark results, MergeStore does not take effect for the following methods:

getIntB    
getIntBU   
getIntL    
getIntLU   
getIntRB   
getIntRBU  
getIntRL   
getIntRLU  
getLongB   
getLongBU  
getLongL   
getLongLU  
getLongRB  
getLongRBU 
getLongRL  
getLongRLU 
putChars4UC
setIntB    
setIntBU 
setIntRB    
setIntRBU   
setLongB
setLongBU  
setLongRB  
setLongRBU 

@eme64 Please help me find out what the reason is and whether it can be improved.

2. Performance numbers

The benchmark names use the following B/L/V/U/R suffixes:

B BigEndian
L LittleEndian
V VarHandle
U Unsafe
R reverseBytes
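
As a rough illustration of the suffix scheme, the sketch below shows what a B, an L, and the corresponding V variants could look like for a 4-byte store. The method bodies are assumptions modeled on the setIntB snippet quoted later in this thread, not copies of the benchmark source, and the class name SuffixSketch is hypothetical. The U (Unsafe) variants are left out.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch class; names mirror the benchmark suffixes, bodies are assumed.
class SuffixSketch {
    // "V": a VarHandle that views a byte[] as ints in a fixed byte order.
    static final VarHandle INT_L =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle INT_B =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    // "L": little-endian, least significant byte at the lowest index.
    static void setIntL(byte[] array, int offset, int value) {
        array[offset    ] = (byte) (value      );
        array[offset + 1] = (byte) (value >>  8);
        array[offset + 2] = (byte) (value >> 16);
        array[offset + 3] = (byte) (value >> 24);
    }

    // "B": big-endian, most significant byte at the lowest index.
    static void setIntB(byte[] array, int offset, int value) {
        array[offset    ] = (byte) (value >> 24);
        array[offset + 1] = (byte) (value >> 16);
        array[offset + 2] = (byte) (value >>  8);
        array[offset + 3] = (byte) (value      );
    }

    // "LV" / "BV": the same stores expressed as one VarHandle access.
    static void setIntLV(byte[] array, int offset, int value) {
        INT_L.set(array, offset, value);
    }

    static void setIntBV(byte[] array, int offset, int value) {
        INT_B.set(array, offset, value);
    }

    public static void main(String[] args) {
        byte[] l = new byte[4];
        byte[] b = new byte[4];
        setIntL(l, 0, 0x01020304);
        setIntB(b, 0, 0x01020304);
        System.out.println(Arrays.toString(l)); // [4, 3, 2, 1]
        System.out.println(Arrays.toString(b)); // [1, 2, 3, 4]
    }
}
```

The byte-wise and VarHandle variants produce identical array contents; the benchmark measures how close the JIT brings the byte-wise form to the single-access form.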

2.1 MacBook M1 Pro (aarch64)

Benchmark                    Mode  Cnt      Score    Error  Units
MergeStoreBench.getIntB      avgt   15   6286.579 ± 20.457  ns/op
MergeStoreBench.getIntBU     avgt   15   5225.216 ±  8.330  ns/op
MergeStoreBench.getIntBV     avgt   15   1210.682 ±  9.729  ns/op
MergeStoreBench.getIntL      avgt   15   6164.693 ± 10.310  ns/op
MergeStoreBench.getIntLU     avgt   15   5143.012 ± 14.522  ns/op
MergeStoreBench.getIntLV     avgt   15   2559.030 ±  3.875  ns/op
MergeStoreBench.getIntRB     avgt   15   6878.932 ± 33.494  ns/op
MergeStoreBench.getIntRBU    avgt   15   5767.165 ±  5.969  ns/op
MergeStoreBench.getIntRL     avgt   15   6627.529 ± 16.028  ns/op
MergeStoreBench.getIntRLU    avgt   15   5751.723 ± 23.192  ns/op
MergeStoreBench.getIntRU     avgt   15   2545.811 ±  3.665  ns/op
MergeStoreBench.getIntU      avgt   15   2540.611 ±  1.315  ns/op
MergeStoreBench.getLongB     avgt   15  12089.536 ± 14.140  ns/op
MergeStoreBench.getLongBU    avgt   15   9781.314 ± 71.234  ns/op
MergeStoreBench.getLongBV    avgt   15   2592.388 ±  4.432  ns/op
MergeStoreBench.getLongL     avgt   15  12024.902 ± 12.263  ns/op
MergeStoreBench.getLongLU    avgt   15   9678.164 ± 66.240  ns/op
MergeStoreBench.getLongLV    avgt   15   2558.131 ±  4.451  ns/op
MergeStoreBench.getLongRB    avgt   15  12085.246 ± 13.510  ns/op
MergeStoreBench.getLongRBU   avgt   15   9764.272 ± 12.714  ns/op
MergeStoreBench.getLongRL    avgt   15  12030.738 ± 22.437  ns/op
MergeStoreBench.getLongRLU   avgt   15   9653.951 ± 29.618  ns/op
MergeStoreBench.getLongRU    avgt   15   2546.557 ±  2.935  ns/op
MergeStoreBench.getLongU     avgt   15   2540.195 ±  2.042  ns/op
MergeStoreBench.putChars4    avgt   15   8489.149 ± 12.100  ns/op
MergeStoreBench.putChars4UB  avgt   15   3829.348 ±  7.844  ns/op
MergeStoreBench.putChars4UC  avgt   15   4483.231 ±  2.922  ns/op
MergeStoreBench.setIntB      avgt   15   5098.299 ±  5.857  ns/op
MergeStoreBench.setIntBU     avgt   15   5100.068 ±  7.315  ns/op
MergeStoreBench.setIntBV     avgt   15   1225.125 ±  1.650  ns/op
MergeStoreBench.setIntL      avgt   15   2765.106 ±  4.291  ns/op
MergeStoreBench.setIntLU     avgt   15   2574.478 ±  6.680  ns/op
MergeStoreBench.setIntLV     avgt   15   5106.786 ±  1.659  ns/op
MergeStoreBench.setIntRB     avgt   15   5372.028 ± 38.223  ns/op
MergeStoreBench.setIntRBU    avgt   15   5413.775 ± 10.059  ns/op
MergeStoreBench.setIntRL     avgt   15   5289.971 ±  4.359  ns/op
MergeStoreBench.setIntRLU    avgt   15   5125.193 ±  1.667  ns/op
MergeStoreBench.setIntRU     avgt   15   5102.132 ± 10.858  ns/op
MergeStoreBench.setIntU      avgt   15   5104.280 ± 53.560  ns/op
MergeStoreBench.setLongB     avgt   15  10249.911 ± 12.840  ns/op
MergeStoreBench.setLongBU    avgt   15  10231.282 ±  6.696  ns/op
MergeStoreBench.setLongBV    avgt   15   2665.162 ±  5.059  ns/op
MergeStoreBench.setLongL     avgt   15   6306.266 ±  7.843  ns/op
MergeStoreBench.setLongLU    avgt   15   2878.446 ± 62.543  ns/op
MergeStoreBench.setLongLV    avgt   15   2663.849 ±  1.446  ns/op
MergeStoreBench.setLongRB    avgt   15  10250.651 ± 16.368  ns/op
MergeStoreBench.setLongRBU   avgt   15  10237.918 ± 14.213  ns/op
MergeStoreBench.setLongRL    avgt   15   6645.274 ±  9.166  ns/op
MergeStoreBench.setLongRLU   avgt   15   3227.096 ±  2.098  ns/op
MergeStoreBench.setLongRU    avgt   15   2609.076 ±  3.404  ns/op
MergeStoreBench.setLongU     avgt   15   2593.581 ±  1.021  ns/op

2.2 MacBook 2018 i9 (x64)

  • CPU Intel i9
Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getIntB      avgt   15  11342.301 ± 176.256  ns/op
MergeStoreBench.getIntBU     avgt   15   7151.310 ±  75.508  ns/op
MergeStoreBench.getIntBV     avgt   15    280.465 ±   2.483  ns/op
MergeStoreBench.getIntL      avgt   15  11124.116 ± 132.253  ns/op
MergeStoreBench.getIntLU     avgt   15   7126.255 ±  33.276  ns/op
MergeStoreBench.getIntLV     avgt   15   1840.656 ±  25.828  ns/op
MergeStoreBench.getIntRB     avgt   15  12084.764 ± 126.922  ns/op
MergeStoreBench.getIntRBU    avgt   15   7822.741 ± 136.473  ns/op
MergeStoreBench.getIntRL     avgt   15  11370.996 ± 150.874  ns/op
MergeStoreBench.getIntRLU    avgt   15   7638.077 ±  86.311  ns/op
MergeStoreBench.getIntRU     avgt   15   2278.221 ±  19.787  ns/op
MergeStoreBench.getIntU      avgt   15   2063.943 ±  10.099  ns/op
MergeStoreBench.getLongB     avgt   15  22398.302 ± 479.694  ns/op
MergeStoreBench.getLongBU    avgt   15  13656.548 ± 212.759  ns/op
MergeStoreBench.getLongBV    avgt   15    757.250 ±  13.629  ns/op
MergeStoreBench.getLongL     avgt   15  20721.523 ± 186.996  ns/op
MergeStoreBench.getLongLU    avgt   15  13647.936 ± 147.855  ns/op
MergeStoreBench.getLongLV    avgt   15   1855.380 ±  30.576  ns/op
MergeStoreBench.getLongRB    avgt   15  22258.859 ± 363.429  ns/op
MergeStoreBench.getLongRBU   avgt   15  13688.325 ± 111.394  ns/op
MergeStoreBench.getLongRL    avgt   15  20736.818 ± 134.670  ns/op
MergeStoreBench.getLongRLU   avgt   15  13648.559 ± 218.167  ns/op
MergeStoreBench.getLongRU    avgt   15   2962.730 ±  61.445  ns/op
MergeStoreBench.getLongU     avgt   15   2881.851 ±  31.687  ns/op
MergeStoreBench.putChars4    avgt   15   5842.259 ± 166.988  ns/op
MergeStoreBench.putChars4UB  avgt   15   3621.801 ±  36.636  ns/op
MergeStoreBench.putChars4UC  avgt   15   7728.219 ± 599.829  ns/op
MergeStoreBench.setIntB      avgt   15   9754.119 ± 100.943  ns/op
MergeStoreBench.setIntBU     avgt   15  12094.327 ±  88.931  ns/op
MergeStoreBench.setIntBV     avgt   15    546.581 ±  11.151  ns/op
MergeStoreBench.setIntL      avgt   15   2241.645 ±  21.620  ns/op
MergeStoreBench.setIntLU     avgt   15   5032.690 ±  39.638  ns/op
MergeStoreBench.setIntLV     avgt   15    727.206 ±   9.519  ns/op
MergeStoreBench.setIntRB     avgt   15  10787.160 ± 187.849  ns/op
MergeStoreBench.setIntRBU    avgt   15  12464.270 ± 121.011  ns/op
MergeStoreBench.setIntRL     avgt   15   5250.418 ±  85.523  ns/op
MergeStoreBench.setIntRLU    avgt   15   7677.631 ±  80.561  ns/op
MergeStoreBench.setIntRU     avgt   15   1011.738 ±   8.791  ns/op
MergeStoreBench.setIntU      avgt   15    791.924 ±  14.517  ns/op
MergeStoreBench.setLongB     avgt   15  17833.690 ± 127.313  ns/op
MergeStoreBench.setLongBU    avgt   15  26447.098 ± 168.301  ns/op
MergeStoreBench.setLongBV    avgt   15   1071.447 ±   8.947  ns/op
MergeStoreBench.setLongL     avgt   15   3724.440 ±  35.119  ns/op
MergeStoreBench.setLongLU    avgt   15   5339.593 ±  45.358  ns/op
MergeStoreBench.setLongLV    avgt   15   1069.890 ±  16.179  ns/op
MergeStoreBench.setLongRB    avgt   15  18908.125 ± 262.767  ns/op
MergeStoreBench.setLongRBU   avgt   15  27622.437 ± 689.809  ns/op
MergeStoreBench.setLongRL    avgt   15   4338.138 ± 115.879  ns/op
MergeStoreBench.setLongRLU   avgt   15   4585.764 ± 102.305  ns/op
MergeStoreBench.setLongRU    avgt   15   1121.779 ±  36.325  ns/op
MergeStoreBench.setLongU     avgt   15   1075.340 ±  17.020  ns/op

2.3 Aliyun ecs.c8a (x64)

  • CPU AMD EPYC™ Genoa
Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getIntB      avgt   15  11976.614 ±  19.245  ns/op
MergeStoreBench.getIntBU     avgt   15   9054.386 ±  12.848  ns/op
MergeStoreBench.getIntBV     avgt   15    304.320 ±   0.412  ns/op
MergeStoreBench.getIntL      avgt   15  10755.574 ± 360.835  ns/op
MergeStoreBench.getIntLU     avgt   15   8889.977 ±  17.342  ns/op
MergeStoreBench.getIntLV     avgt   15   2229.743 ±   3.334  ns/op
MergeStoreBench.getIntRB     avgt   15  12356.363 ±  17.140  ns/op
MergeStoreBench.getIntRBU    avgt   15  11132.557 ±  21.023  ns/op
MergeStoreBench.getIntRL     avgt   15  11218.259 ±  15.377  ns/op
MergeStoreBench.getIntRLU    avgt   15   9356.533 ±  16.075  ns/op
MergeStoreBench.getIntRU     avgt   15   2511.578 ±   4.710  ns/op
MergeStoreBench.getIntU      avgt   15   2497.917 ±   3.230  ns/op
MergeStoreBench.getLongB     avgt   15  26910.266 ±  54.383  ns/op
MergeStoreBench.getLongBU    avgt   15  14217.696 ±  20.862  ns/op
MergeStoreBench.getLongBV    avgt   15    602.235 ±   0.678  ns/op
MergeStoreBench.getLongL     avgt   15  26889.931 ±  43.526  ns/op
MergeStoreBench.getLongLU    avgt   15  14547.062 ±  39.383  ns/op
MergeStoreBench.getLongLV    avgt   15   2228.017 ±   3.593  ns/op
MergeStoreBench.getLongRB    avgt   15  26901.754 ±  29.490  ns/op
MergeStoreBench.getLongRBU   avgt   15  14212.233 ±  17.917  ns/op
MergeStoreBench.getLongRL    avgt   15  26904.774 ±  53.650  ns/op
MergeStoreBench.getLongRLU   avgt   15  14531.530 ±  26.863  ns/op
MergeStoreBench.getLongRU    avgt   15   3066.434 ±   5.223  ns/op
MergeStoreBench.getLongU     avgt   15   3056.801 ±   4.346  ns/op
MergeStoreBench.putChars4    avgt   15  13433.247 ±  19.357  ns/op
MergeStoreBench.putChars4UB  avgt   15   4209.355 ±  10.661  ns/op
MergeStoreBench.putChars4UC  avgt   15   3388.720 ±   7.222  ns/op
MergeStoreBench.setIntB      avgt   15   8044.968 ±  10.066  ns/op
MergeStoreBench.setIntBU     avgt   15  10359.992 ±  41.852  ns/op
MergeStoreBench.setIntBV     avgt   15    598.579 ±   2.360  ns/op
MergeStoreBench.setIntL      avgt   15   2548.295 ±   5.228  ns/op
MergeStoreBench.setIntLU     avgt   15   6179.865 ±  70.419  ns/op
MergeStoreBench.setIntLV     avgt   15    603.562 ±   1.408  ns/op
MergeStoreBench.setIntRB     avgt   15   9743.462 ±  22.873  ns/op
MergeStoreBench.setIntRBU    avgt   15  10673.845 ±  21.662  ns/op
MergeStoreBench.setIntRL     avgt   15   6216.996 ±   7.323  ns/op
MergeStoreBench.setIntRLU    avgt   15   8407.392 ± 108.065  ns/op
MergeStoreBench.setIntRU     avgt   15    635.986 ±   1.449  ns/op
MergeStoreBench.setIntU      avgt   15    610.444 ±   1.139  ns/op
MergeStoreBench.setLongB     avgt   15  17226.045 ±  32.847  ns/op
MergeStoreBench.setLongBU    avgt   15  21476.791 ±  90.608  ns/op
MergeStoreBench.setLongBV    avgt   15   1184.335 ±   1.624  ns/op
MergeStoreBench.setLongL     avgt   15   3352.579 ±   4.849  ns/op
MergeStoreBench.setLongLU    avgt   15   6227.171 ±   9.784  ns/op
MergeStoreBench.setLongLV    avgt   15   1194.549 ±   2.399  ns/op
MergeStoreBench.setLongRB    avgt   15  17967.391 ±  41.726  ns/op
MergeStoreBench.setLongRBU   avgt   15  21428.757 ±  25.568  ns/op
MergeStoreBench.setLongRL    avgt   15   4035.273 ±   6.881  ns/op
MergeStoreBench.setLongRLU   avgt   15   4858.090 ±  16.189  ns/op
MergeStoreBench.setLongRU    avgt   15   1169.711 ±   2.061  ns/op
MergeStoreBench.setLongU     avgt   15   1196.299 ±   1.978  ns/op

2.4 Aliyun ecs.c8i (x64)

  • CPU Intel® Xeon® Emerald

Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getIntB      avgt   15  10515.469 ±  21.610  ns/op
MergeStoreBench.getIntBU     avgt   15   9269.025 ±  16.736  ns/op
MergeStoreBench.getIntBV     avgt   15    255.475 ±   0.853  ns/op
MergeStoreBench.getIntL      avgt   15   9699.152 ±  57.865  ns/op
MergeStoreBench.getIntLU     avgt   15   8984.031 ±  13.596  ns/op
MergeStoreBench.getIntLV     avgt   15   2570.119 ±   1.924  ns/op
MergeStoreBench.getIntRB     avgt   15  11281.847 ±   3.579  ns/op
MergeStoreBench.getIntRBU    avgt   15  10323.475 ±  14.128  ns/op
MergeStoreBench.getIntRL     avgt   15  10566.386 ±   2.524  ns/op
MergeStoreBench.getIntRLU    avgt   15   9432.976 ±   2.876  ns/op
MergeStoreBench.getIntRU     avgt   15   2327.557 ±   0.471  ns/op
MergeStoreBench.getIntU      avgt   15   2311.914 ±   1.782  ns/op
MergeStoreBench.getLongB     avgt   15  21682.355 ±  30.503  ns/op
MergeStoreBench.getLongBU    avgt   15  14674.931 ±   3.452  ns/op
MergeStoreBench.getLongBV    avgt   15    652.253 ±   1.555  ns/op
MergeStoreBench.getLongL     avgt   15  21583.633 ±  28.439  ns/op
MergeStoreBench.getLongLU    avgt   15  14350.307 ±  31.842  ns/op
MergeStoreBench.getLongLV    avgt   15   2575.151 ±   0.376  ns/op
MergeStoreBench.getLongRB    avgt   15  21678.521 ±   5.962  ns/op
MergeStoreBench.getLongRBU   avgt   15  14678.208 ±  23.997  ns/op
MergeStoreBench.getLongRL    avgt   15  21576.705 ±   2.667  ns/op
MergeStoreBench.getLongRLU   avgt   15  14341.769 ±  22.908  ns/op
MergeStoreBench.getLongRU    avgt   15   2986.505 ±   0.574  ns/op
MergeStoreBench.getLongU     avgt   15   2940.918 ±   0.348  ns/op
MergeStoreBench.putChars4    avgt   15  10438.053 ±  15.578  ns/op
MergeStoreBench.putChars4UB  avgt   15   3015.499 ±   7.300  ns/op
MergeStoreBench.putChars4UC  avgt   15   5317.663 ±   3.992  ns/op
MergeStoreBench.setIntB      avgt   15   6885.979 ±  11.822  ns/op
MergeStoreBench.setIntBU     avgt   15  10131.264 ±  30.389  ns/op
MergeStoreBench.setIntBV     avgt   15    898.844 ±   3.806  ns/op
MergeStoreBench.setIntL      avgt   15   2885.903 ±   3.228  ns/op
MergeStoreBench.setIntLU     avgt   15   5282.482 ±  59.298  ns/op
MergeStoreBench.setIntLV     avgt   15    949.442 ±   2.543  ns/op
MergeStoreBench.setIntRB     avgt   15   8152.273 ±  11.990  ns/op
MergeStoreBench.setIntRBU    avgt   15  10604.720 ±  19.430  ns/op
MergeStoreBench.setIntRL     avgt   15   5989.979 ±   4.767  ns/op
MergeStoreBench.setIntRLU    avgt   15   7261.499 ± 120.307  ns/op
MergeStoreBench.setIntRU     avgt   15    960.782 ±   2.784  ns/op
MergeStoreBench.setIntU      avgt   15    989.716 ±   1.030  ns/op
MergeStoreBench.setLongB     avgt   15  15865.777 ±  32.450  ns/op
MergeStoreBench.setLongBU    avgt   15  22843.580 ±  48.434  ns/op
MergeStoreBench.setLongBV    avgt   15   1814.973 ±   7.079  ns/op
MergeStoreBench.setLongL     avgt   15   4346.312 ±   1.318  ns/op
MergeStoreBench.setLongLU    avgt   15   5399.475 ±  35.513  ns/op
MergeStoreBench.setLongLV    avgt   15   1903.106 ±  22.995  ns/op
MergeStoreBench.setLongRB    avgt   15  16980.234 ±  34.819  ns/op
MergeStoreBench.setLongRBU   avgt   15  24924.078 ±  49.285  ns/op
MergeStoreBench.setLongRL    avgt   15   4483.791 ±   6.976  ns/op
MergeStoreBench.setLongRLU   avgt   15   5004.085 ±   3.843  ns/op
MergeStoreBench.setLongRU    avgt   15   1818.725 ±  21.406  ns/op
MergeStoreBench.setLongU     avgt   15   1940.593 ±  21.824  ns/op

2.5 Aliyun ecs.c8y (aarch64)

  • CPU Aliyun Yitian 710
Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getIntB      avgt   15   7693.088 ±   3.311  ns/op
MergeStoreBench.getIntBU     avgt   15   6569.492 ±   4.363  ns/op
MergeStoreBench.getIntBV     avgt   15   1360.788 ±   0.185  ns/op
MergeStoreBench.getIntL      avgt   15   6869.948 ±   0.459  ns/op
MergeStoreBench.getIntLU     avgt   15   6059.390 ±  10.758  ns/op
MergeStoreBench.getIntLV     avgt   15   2753.969 ±   0.147  ns/op
MergeStoreBench.getIntRB     avgt   15   8176.169 ± 108.856  ns/op
MergeStoreBench.getIntRBU    avgt   15   7262.778 ±   2.157  ns/op
MergeStoreBench.getIntRL     avgt   15   7691.955 ±   2.307  ns/op
MergeStoreBench.getIntRLU    avgt   15   6687.164 ±  11.958  ns/op
MergeStoreBench.getIntRU     avgt   15   2816.706 ±   1.032  ns/op
MergeStoreBench.getIntU      avgt   15   2855.242 ±   0.395  ns/op
MergeStoreBench.getLongB     avgt   15  13808.276 ±   5.076  ns/op
MergeStoreBench.getLongBU    avgt   15  11786.525 ±   6.141  ns/op
MergeStoreBench.getLongBV    avgt   15   2792.010 ±   0.671  ns/op
MergeStoreBench.getLongL     avgt   15  13296.684 ±  17.836  ns/op
MergeStoreBench.getLongLU    avgt   15  11210.969 ±   5.916  ns/op
MergeStoreBench.getLongLV    avgt   15   2759.405 ±   0.240  ns/op
MergeStoreBench.getLongRB    avgt   15  13812.198 ±   2.658  ns/op
MergeStoreBench.getLongRBU   avgt   15  11786.747 ±   5.149  ns/op
MergeStoreBench.getLongRL    avgt   15  13300.198 ±  16.842  ns/op
MergeStoreBench.getLongRLU   avgt   15  11208.050 ±   9.084  ns/op
MergeStoreBench.getLongRU    avgt   15   2835.510 ±   0.462  ns/op
MergeStoreBench.getLongU     avgt   15   2864.473 ±   0.705  ns/op
MergeStoreBench.putChars4    avgt   15   8895.844 ±   6.508  ns/op
MergeStoreBench.putChars4UB  avgt   15   5495.596 ±   1.519  ns/op
MergeStoreBench.putChars4UC  avgt   15   5110.665 ±   6.025  ns/op
MergeStoreBench.setIntB      avgt   15   6062.162 ±   4.247  ns/op
MergeStoreBench.setIntBU     avgt   15   6665.214 ±  13.035  ns/op
MergeStoreBench.setIntBV     avgt   15   1362.756 ±   0.087  ns/op
MergeStoreBench.setIntL      avgt   15   2823.779 ±   0.791  ns/op
MergeStoreBench.setIntLU     avgt   15   2766.163 ±   0.179  ns/op
MergeStoreBench.setIntLV     avgt   15   5508.486 ±   1.444  ns/op
MergeStoreBench.setIntRB     avgt   15   7591.497 ±   6.000  ns/op
MergeStoreBench.setIntRBU    avgt   15   7748.780 ±   4.463  ns/op
MergeStoreBench.setIntRL     avgt   15   5517.300 ±   7.851  ns/op
MergeStoreBench.setIntRLU    avgt   15   5622.521 ±   1.091  ns/op
MergeStoreBench.setIntRU     avgt   15   5581.834 ±   1.102  ns/op
MergeStoreBench.setIntU      avgt   15   5463.442 ±   0.682  ns/op
MergeStoreBench.setLongB     avgt   15  13516.164 ±   5.466  ns/op
MergeStoreBench.setLongBU    avgt   15  13614.626 ±  20.629  ns/op
MergeStoreBench.setLongBV    avgt   15   2796.317 ±   0.953  ns/op
MergeStoreBench.setLongL     avgt   15   5549.128 ±  28.272  ns/op
MergeStoreBench.setLongLU    avgt   15   4130.981 ±   1.344  ns/op
MergeStoreBench.setLongLV    avgt   15   2785.515 ±   0.318  ns/op
MergeStoreBench.setLongRB    avgt   15  14287.192 ±  10.265  ns/op
MergeStoreBench.setLongRBU   avgt   15  14499.620 ±  10.026  ns/op
MergeStoreBench.setLongRL    avgt   15   6671.064 ±  21.088  ns/op
MergeStoreBench.setLongRLU   avgt   15   4831.917 ±  11.497  ns/op
MergeStoreBench.setLongRU    avgt   15   3197.165 ±   0.768  ns/op
MergeStoreBench.setLongU     avgt   15   2799.934 ±   1.945  ns/op

2.6 Orange Pi 5 Plus (aarch64)

Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getIntB      avgt   15  14310.070 ±  80.699  ns/op
MergeStoreBench.getIntBU     avgt   15  12523.180 ±  30.688  ns/op
MergeStoreBench.getIntBV     avgt   15   1846.502 ±   5.312  ns/op
MergeStoreBench.getIntL      avgt   15  12956.229 ±  24.863  ns/op
MergeStoreBench.getIntLU     avgt   15  11325.391 ±  38.825  ns/op
MergeStoreBench.getIntLV     avgt   15   3768.575 ±  20.082  ns/op
MergeStoreBench.getIntRB     avgt   15  15539.092 ±  40.245  ns/op
MergeStoreBench.getIntRBU    avgt   15  13862.743 ±  54.321  ns/op
MergeStoreBench.getIntRL     avgt   15  14186.647 ±  35.623  ns/op
MergeStoreBench.getIntRLU    avgt   15  12573.647 ±  23.523  ns/op
MergeStoreBench.getIntRU     avgt   15   4368.234 ±  23.460  ns/op
MergeStoreBench.getIntU      avgt   15   3700.248 ±  17.718  ns/op
MergeStoreBench.getLongB     avgt   15  26513.593 ±  84.012  ns/op
MergeStoreBench.getLongBU    avgt   15  22001.241 ±  47.761  ns/op
MergeStoreBench.getLongBV    avgt   15   4148.252 ±  15.058  ns/op
MergeStoreBench.getLongL     avgt   15  25282.815 ± 130.802  ns/op
MergeStoreBench.getLongLU    avgt   15  21206.815 ±  85.923  ns/op
MergeStoreBench.getLongLV    avgt   15   3777.233 ±   7.117  ns/op
MergeStoreBench.getLongRB    avgt   15  26485.481 ±  40.748  ns/op
MergeStoreBench.getLongRBU   avgt   15  21973.762 ±  32.260  ns/op
MergeStoreBench.getLongRL    avgt   15  25299.502 ±  59.614  ns/op
MergeStoreBench.getLongRLU   avgt   15  21178.624 ±  65.400  ns/op
MergeStoreBench.getLongRU    avgt   15   4388.503 ±  15.022  ns/op
MergeStoreBench.getLongU     avgt   15   3721.682 ±   3.202  ns/op
MergeStoreBench.putChars4    avgt   15  18617.889 ± 194.877  ns/op
MergeStoreBench.putChars4UB  avgt   15  11140.563 ±  35.977  ns/op
MergeStoreBench.putChars4UC  avgt   15  10913.407 ±  30.633  ns/op
MergeStoreBench.setIntB      avgt   15  11670.307 ±  30.119  ns/op
MergeStoreBench.setIntBU     avgt   15  13614.156 ±  79.641  ns/op
MergeStoreBench.setIntBV     avgt   15   1856.985 ±   3.735  ns/op
MergeStoreBench.setIntL      avgt   15   5094.994 ± 143.111  ns/op
MergeStoreBench.setIntLU     avgt   15   4653.661 ±  13.918  ns/op
MergeStoreBench.setIntLV     avgt   15   7364.007 ±  23.713  ns/op
MergeStoreBench.setIntRL     avgt   15   7745.408 ±  19.505  ns/op
MergeStoreBench.setIntRLU    avgt   15   8262.371 ±  17.381  ns/op
MergeStoreBench.setIntRU     avgt   15   7361.715 ±  15.911  ns/op
MergeStoreBench.setIntU      avgt   15   7360.358 ±  18.326  ns/op
MergeStoreBench.setLongB     avgt   15  29724.536 ± 111.106  ns/op
MergeStoreBench.setLongBU    avgt   15  28497.008 ± 141.615  ns/op
MergeStoreBench.setLongBV    avgt   15   5722.789 ±  32.736  ns/op
MergeStoreBench.setLongL     avgt   15  10547.782 ±  31.164  ns/op
MergeStoreBench.setLongLU    avgt   15   8291.086 ±  39.928  ns/op
MergeStoreBench.setLongLV    avgt   15   4614.304 ±  24.229  ns/op
MergeStoreBench.setLongRB    avgt   15  33607.418 ± 293.440  ns/op
MergeStoreBench.setLongRBU   avgt   15  30414.981 ±  74.164  ns/op
MergeStoreBench.setLongRL    avgt   15  13901.427 ±  79.116  ns/op
MergeStoreBench.setLongRLU   avgt   15   9751.634 ± 337.882  ns/op
MergeStoreBench.setLongRU    avgt   15   6305.701 ±  13.433  ns/op
MergeStoreBench.setLongU     avgt   15   5174.620 ±  27.848  ns/op

2.7 AWS c5.xlarge (x64)

Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getIntB      avgt   15  13079.509 ±  31.297  ns/op
MergeStoreBench.getIntBU     avgt   15   9462.475 ±  21.237  ns/op
MergeStoreBench.getIntBV     avgt   15    427.081 ±  11.493  ns/op
MergeStoreBench.getIntL      avgt   15  12265.667 ± 155.913  ns/op
MergeStoreBench.getIntLU     avgt   15   9444.551 ±   6.117  ns/op
MergeStoreBench.getIntLV     avgt   15   2424.707 ±   3.046  ns/op
MergeStoreBench.getIntRB     avgt   15  13399.629 ±  21.861  ns/op
MergeStoreBench.getIntRBU    avgt   15  10146.871 ±  29.528  ns/op
MergeStoreBench.getIntRL     avgt   15  13079.355 ±   4.712  ns/op
MergeStoreBench.getIntRLU    avgt   15  10074.582 ±  21.369  ns/op
MergeStoreBench.getIntRU     avgt   15   2965.584 ±   8.777  ns/op
MergeStoreBench.getIntU      avgt   15   2725.438 ±   3.373  ns/op
MergeStoreBench.getLongB     avgt   15  26115.043 ±  45.099  ns/op
MergeStoreBench.getLongBU    avgt   15  17887.028 ±  31.958  ns/op
MergeStoreBench.getLongBV    avgt   15   1003.857 ±   3.478  ns/op
MergeStoreBench.getLongL     avgt   15  26121.420 ±  56.763  ns/op
MergeStoreBench.getLongLU    avgt   15  17838.494 ±  33.942  ns/op
MergeStoreBench.getLongLV    avgt   15   2422.744 ±   2.451  ns/op
MergeStoreBench.getLongRB    avgt   15  26099.347 ±   7.375  ns/op
MergeStoreBench.getLongRBU   avgt   15  17892.774 ±  36.187  ns/op
MergeStoreBench.getLongRL    avgt   15  26113.364 ±  42.866  ns/op
MergeStoreBench.getLongRLU   avgt   15  17828.021 ±   5.329  ns/op
MergeStoreBench.getLongRU    avgt   15   3848.432 ±   0.722  ns/op
MergeStoreBench.getLongU     avgt   15   3784.374 ±   6.472  ns/op
MergeStoreBench.putChars4    avgt   15   9816.198 ±  26.355  ns/op
MergeStoreBench.putChars4UB  avgt   15   4706.135 ±   4.254  ns/op
MergeStoreBench.putChars4UC  avgt   15  10203.688 ±  56.176  ns/op
MergeStoreBench.setIntB      avgt   15  12799.623 ±  28.007  ns/op
MergeStoreBench.setIntBU     avgt   15  15187.730 ±  40.084  ns/op
MergeStoreBench.setIntBV     avgt   15    889.277 ±  11.389  ns/op
MergeStoreBench.setIntL      avgt   15   2970.163 ±   5.265  ns/op
MergeStoreBench.setIntLU     avgt   15   6769.524 ±  18.502  ns/op
MergeStoreBench.setIntLV     avgt   15    945.737 ±   2.444  ns/op
MergeStoreBench.setIntRB     avgt   15  14046.973 ±  32.796  ns/op
MergeStoreBench.setIntRBU    avgt   15  16527.209 ±  82.575  ns/op
MergeStoreBench.setIntRL     avgt   15   7020.496 ± 179.054  ns/op
MergeStoreBench.setIntRLU    avgt   15  10178.358 ±  19.539  ns/op
MergeStoreBench.setIntRU     avgt   15   1343.150 ±   6.356  ns/op
MergeStoreBench.setIntU      avgt   15   1037.694 ±   2.379  ns/op
MergeStoreBench.setLongB     avgt   15  22693.643 ±  42.961  ns/op
MergeStoreBench.setLongBU    avgt   15  35717.196 ± 104.769  ns/op
MergeStoreBench.setLongBV    avgt   15   1753.671 ±  26.216  ns/op
MergeStoreBench.setLongL     avgt   15   4150.631 ±   1.421  ns/op
MergeStoreBench.setLongLU    avgt   15   7141.568 ±  27.922  ns/op
MergeStoreBench.setLongLV    avgt   15   1678.193 ±   5.627  ns/op
MergeStoreBench.setLongRB    avgt   15  24545.267 ±   6.587  ns/op
MergeStoreBench.setLongRBU   avgt   15  36050.753 ±  65.961  ns/op
MergeStoreBench.setLongRL    avgt   15   5557.854 ±   8.769  ns/op
MergeStoreBench.setLongRLU   avgt   15   6155.434 ±  50.697  ns/op
MergeStoreBench.setLongRU    avgt   15   1751.805 ±  26.192  ns/op
MergeStoreBench.setLongU     avgt   15   1701.889 ±   9.799  ns/op

2.8 AWS c7g.xlarge (aarch64)

Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getIntB      avgt   15   8043.459 ±   4.801  ns/op
MergeStoreBench.getIntBU     avgt   15   7056.463 ±   3.285  ns/op
MergeStoreBench.getIntBV     avgt   15    780.570 ±   0.033  ns/op
MergeStoreBench.getIntL      avgt   15   7666.043 ±  10.795  ns/op
MergeStoreBench.getIntLU     avgt   15   6379.222 ±  13.369  ns/op
MergeStoreBench.getIntLV     avgt   15   3179.702 ±   0.235  ns/op
MergeStoreBench.getIntRB     avgt   15   8892.801 ±   7.461  ns/op
MergeStoreBench.getIntRBU    avgt   15   7822.666 ±   0.944  ns/op
MergeStoreBench.getIntRL     avgt   15   8385.029 ± 255.203  ns/op
MergeStoreBench.getIntRLU    avgt   15   6995.686 ±   1.382  ns/op
MergeStoreBench.getIntRU     avgt   15   2361.462 ±   0.125  ns/op
MergeStoreBench.getIntU      avgt   15   2232.425 ±   0.121  ns/op
MergeStoreBench.getLongB     avgt   15  15212.747 ±  72.866  ns/op
MergeStoreBench.getLongBU    avgt   15  12521.436 ±  10.604  ns/op
MergeStoreBench.getLongBV    avgt   15   1570.044 ±   0.614  ns/op
MergeStoreBench.getLongL     avgt   15  14723.614 ±   2.207  ns/op
MergeStoreBench.getLongLU    avgt   15  12038.660 ±  17.515  ns/op
MergeStoreBench.getLongLV    avgt   15   3180.912 ±   0.213  ns/op
MergeStoreBench.getLongRB    avgt   15  15168.386 ±   3.612  ns/op
MergeStoreBench.getLongRBU   avgt   15  12513.044 ±   6.066  ns/op
MergeStoreBench.getLongRL    avgt   15  14725.218 ±   2.830  ns/op
MergeStoreBench.getLongRLU   avgt   15  12030.288 ±  17.871  ns/op
MergeStoreBench.getLongRU    avgt   15   3198.287 ±   0.195  ns/op
MergeStoreBench.getLongU     avgt   15   3187.308 ±   0.211  ns/op
MergeStoreBench.putChars4    avgt   15  10401.393 ±  12.031  ns/op
MergeStoreBench.putChars4UB  avgt   15   3677.934 ±   2.670  ns/op
MergeStoreBench.putChars4UC  avgt   15   5744.121 ±  82.141  ns/op
MergeStoreBench.setIntB      avgt   15   7043.439 ±   2.165  ns/op
MergeStoreBench.setIntBU     avgt   15   7981.350 ±   0.716  ns/op
MergeStoreBench.setIntBV     avgt   15    784.619 ±   0.029  ns/op
MergeStoreBench.setIntL      avgt   15   3189.311 ±   0.710  ns/op
MergeStoreBench.setIntLU     avgt   15   3272.439 ±   3.194  ns/op
MergeStoreBench.setIntLV     avgt   15   1565.925 ±   0.146  ns/op
MergeStoreBench.setIntRB     avgt   15   8625.077 ±   1.519  ns/op
MergeStoreBench.setIntRBU    avgt   15   8105.421 ±  27.598  ns/op
MergeStoreBench.setIntRL     avgt   15   6300.870 ±   0.364  ns/op
MergeStoreBench.setIntRLU    avgt   15   6336.852 ±   1.972  ns/op
MergeStoreBench.setIntRU     avgt   15   1716.681 ±  28.187  ns/op
MergeStoreBench.setIntU      avgt   15   1567.628 ±   0.316  ns/op
MergeStoreBench.setLongB     avgt   15  14894.383 ±  12.716  ns/op
MergeStoreBench.setLongBU    avgt   15  16058.865 ±  17.127  ns/op
MergeStoreBench.setLongBV    avgt   15   1575.499 ±   0.678  ns/op
MergeStoreBench.setLongL     avgt   15   6856.932 ±  25.268  ns/op
MergeStoreBench.setLongLU    avgt   15   3385.389 ±   3.971  ns/op
MergeStoreBench.setLongLV    avgt   15   1569.862 ±   0.616  ns/op
MergeStoreBench.setLongRB    avgt   15  15452.243 ±   6.443  ns/op
MergeStoreBench.setLongRBU   avgt   15  15453.597 ±  10.201  ns/op
MergeStoreBench.setLongRL    avgt   15   7176.426 ±   4.363  ns/op
MergeStoreBench.setLongRLU   avgt   15   3211.004 ±   1.041  ns/op
MergeStoreBench.setLongRU    avgt   15   1575.034 ±   0.267  ns/op
MergeStoreBench.setLongU     avgt   15   1569.837 ±   0.461  ns/op

@wenshao wenshao changed the title Add MergeStore JMH benchmarks 8334342: Add MergeStore JMH benchmarks Jun 16, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 16, 2024
mlbridge bot commented Jun 16, 2024

eme64 commented Jun 17, 2024

A few extra comments:
There is already a MergeStore benchmark. I would prefer if you put yours next to it, unless you have a good reason not to.

About your list:

getIntB    
getIntBU   
getIntL    
getIntLU   
getIntRB   
getIntRBU  
getIntRL   
getIntRLU  
getLongB   
getLongBU  
getLongL   
getLongLU  
getLongRB  
getLongRBU 
getLongRL  
getLongRLU
-> Obviously a "MergeStore" optimization does not work for loads. But if it is important, then maybe we could generalize the optimizations from stores to loads.

putChars4UC
-> Does the putChars4UC get inlined? Because here you are storing 4 variables, this pattern is not handled by MergeStore.

setIntB
-> You say that here MergeStore does not work. That is because the indices are increasing, but the shifts decreasing. So that does not work on little-endian machines (most architectures), but I would expect it to work on big-endian machines with https://github.com/openjdk/jdk/pull/19218.
   
setIntBU
-> order seems messed up

setIntRB    
setIntRBU   
setLongB
setLongBU  
setLongRB  
setLongRBU
-> I leave the rest for you to investigate.
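
The "indices increasing, shifts decreasing" point above can be illustrated with a small sketch. The setIntB body matches the snippet quoted in this review; the setIntL body is an assumption (the mirrored pattern, not copied from the benchmark source), and the class name StoreOrderSketch is hypothetical.

```java
import java.util.Arrays;

final class StoreOrderSketch {
    // Assumed little-endian pattern: index and shift both increase.
    // On a little-endian machine the four byte stores line up with one int store.
    static void setIntL(byte[] array, int offset, int value) {
        array[offset    ] = (byte) (value      );
        array[offset + 1] = (byte) (value >>  8);
        array[offset + 2] = (byte) (value >> 16);
        array[offset + 3] = (byte) (value >> 24);
    }

    // Big-endian pattern from the review: index increases while the shift decreases.
    static void setIntB(byte[] array, int offset, int value) {
        array[offset    ] = (byte) (value >> 24);
        array[offset + 1] = (byte) (value >> 16);
        array[offset + 2] = (byte) (value >>  8);
        array[offset + 3] = (byte) (value      );
    }

    public static void main(String[] args) {
        byte[] le = new byte[4];
        byte[] be = new byte[4];
        setIntL(le, 0, 0x11223344);
        setIntB(be, 0, 0x11223344);
        System.out.println(Arrays.toString(le)); // [68, 51, 34, 17]
        System.out.println(Arrays.toString(be)); // [17, 34, 51, 68]
    }
}
```

The two methods write the same four bytes in opposite memory order, which is why a merge that targets the machine's native byte order can cover only one of them without an additional byte swap.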

Comment on lines 651 to 656
static void setIntB(byte[] array, int offset, int value) {
    array[offset    ] = (byte) (value >> 24);
    array[offset + 1] = (byte) (value >> 16);
    array[offset + 2] = (byte) (value >>  8);
    array[offset + 3] = (byte) (value      );
}
Contributor

You say that here MergeStore does not work. That is because the indices are increasing, but the shifts decreasing. So that does not work on little-endian machines (most architectures), but I would expect it to work on big-endian machines with #19218.

Contributor Author

@wenshao wenshao Jun 17, 2024

Big endian is often used in network data transmission scenarios, and it is common to process big endian data on a little endian machine. In this case, can it be optimized to Integer.reverseBytes & putIntLittleEndian on a LittleEndian machine? setIntB -> setIntRL
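
A minimal sketch of the equivalence behind this suggestion, assuming setIntRL means "reverse the bytes, then store in little-endian order" (inferred from the R and L suffixes, not copied from the benchmark source). The class name ReverseBytesSketch is hypothetical.

```java
import java.util.Arrays;

final class ReverseBytesSketch {
    // Big-endian byte stores, as quoted in the review comment.
    static void setIntB(byte[] array, int offset, int value) {
        array[offset    ] = (byte) (value >> 24);
        array[offset + 1] = (byte) (value >> 16);
        array[offset + 2] = (byte) (value >>  8);
        array[offset + 3] = (byte) (value      );
    }

    // Assumed little-endian byte stores (mirror image of setIntB).
    static void setIntL(byte[] array, int offset, int value) {
        array[offset    ] = (byte) (value      );
        array[offset + 1] = (byte) (value >>  8);
        array[offset + 2] = (byte) (value >> 16);
        array[offset + 3] = (byte) (value >> 24);
    }

    // Assumed shape of setIntRL: swap the bytes first, then store little-endian.
    static void setIntRL(byte[] array, int offset, int value) {
        setIntL(array, offset, Integer.reverseBytes(value));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x11223344, 0xCAFEBABE, Integer.MIN_VALUE};
        for (int v : samples) {
            byte[] b = new byte[4];
            byte[] rl = new byte[4];
            setIntB(b, 0, v);
            setIntRL(rl, 0, v);
            System.out.println(Integer.toHexString(v) + " same bytes: " + Arrays.equals(b, rl));
        }
    }
}
```

Because both forms produce identical bytes, a compiler could in principle rewrite the big-endian pattern as one byte swap plus one merged little-endian store; whether C2 does so is exactly what the RFE discussed here would decide.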

Contributor

I had already suggested that here:
#19218 (review)
Feel free to file an RFE. Maybe someone wants to work on it. I think it would not be that hard to make it work given all the code that is already there now. And it would be helpful for both big- and little-endian machines to be able to handle both orders.

@wenshao
Contributor Author

wenshao commented Jun 18, 2024

I re-ran the performance test based on Webrevs 04: Full - Incremental (4c9b9418).

1. Cases MergeStore does not work

@eme64
I found two cases, putChars4BV and putChars4LV, where MergeStore does not work. If support for them can be added, it would be useful for people using VarHandle.

putChars4BV
putChars4LV

I also found that the cases using VarHandle perform particularly well. Why? For example:

setIntBV
setIntLV
setLongBV
setLongLV
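
For context, a sketch of what a VarHandle-based store looks like, assuming the *BV and *LV cases use byte-array view VarHandles created with MethodHandles.byteArrayViewVarHandle (the benchmark source is not shown in this thread). With this API the int is written in a single access at the source level, so nothing depends on merging four byte stores; whether that explains the measured numbers is not confirmed here. The class and field names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

final class VarHandleStoreSketch {
    // View a byte[] as int elements in a fixed byte order.
    static final VarHandle INT_BE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Assumed shape of setIntBV: one big-endian int write at a byte offset.
    static void setIntBV(byte[] array, int offset, int value) {
        INT_BE.set(array, offset, value);
    }

    // Assumed shape of setIntLV: one little-endian int write at a byte offset.
    static void setIntLV(byte[] array, int offset, int value) {
        INT_LE.set(array, offset, value);
    }

    public static void main(String[] args) {
        byte[] be = new byte[4];
        byte[] le = new byte[4];
        setIntBV(be, 0, 0x11223344);
        setIntLV(le, 0, 0x11223344);
        System.out.println(Arrays.toString(be)); // [17, 34, 51, 68]
        System.out.println(Arrays.toString(le)); // [68, 51, 34, 17]
    }
}
```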

2. Performance numbers

The benchmark names use the following suffixes:

B BigEndian
L LittleEndian
V VarHandle
U Unsafe
R reverseBytes
C Unsafe.getChar & putChar
S Unsafe.getShort & putShort

2.1 MacBook M1 Pro (aarch64)

Benchmark                    Mode  Cnt      Score    Error  Units
MergeStoreBench.getCharB     avgt   15   5340.200 ±  7.038  ns/op
MergeStoreBench.getCharBU    avgt   15   5482.163 ±  7.922  ns/op
MergeStoreBench.getCharBV    avgt   15   5074.165 ±  6.759  ns/op
MergeStoreBench.getCharC     avgt   15   5051.763 ±  6.552  ns/op
MergeStoreBench.getCharL     avgt   15   5374.464 ±  9.783  ns/op
MergeStoreBench.getCharLU    avgt   15   5487.532 ±  6.368  ns/op
MergeStoreBench.getCharLV    avgt   15   5071.263 ±  9.717  ns/op
MergeStoreBench.getIntB      avgt   15   6277.984 ±  6.284  ns/op
MergeStoreBench.getIntBU     avgt   15   5232.984 ± 10.384  ns/op
MergeStoreBench.getIntBV     avgt   15   1206.264 ±  1.193  ns/op
MergeStoreBench.getIntL      avgt   15   6172.779 ±  1.962  ns/op
MergeStoreBench.getIntLU     avgt   15   5157.317 ± 16.077  ns/op
MergeStoreBench.getIntLV     avgt   15   2558.110 ±  3.402  ns/op
MergeStoreBench.getIntRB     avgt   15   6889.916 ± 36.955  ns/op
MergeStoreBench.getIntRBU    avgt   15   5769.950 ± 11.499  ns/op
MergeStoreBench.getIntRL     avgt   15   6625.605 ± 10.662  ns/op
MergeStoreBench.getIntRLU    avgt   15   5746.742 ± 11.945  ns/op
MergeStoreBench.getIntRU     avgt   15   2544.586 ±  2.769  ns/op
MergeStoreBench.getIntU      avgt   15   2541.119 ±  3.252  ns/op
MergeStoreBench.getLongB     avgt   15  12098.129 ± 31.451  ns/op
MergeStoreBench.getLongBU    avgt   15   9760.621 ± 16.427  ns/op
MergeStoreBench.getLongBV    avgt   15   2593.635 ±  4.698  ns/op
MergeStoreBench.getLongL     avgt   15  12031.065 ± 19.820  ns/op
MergeStoreBench.getLongLU    avgt   15   9653.938 ± 18.372  ns/op
MergeStoreBench.getLongLV    avgt   15   2557.521 ±  3.338  ns/op
MergeStoreBench.getLongRB    avgt   15  12092.061 ± 18.026  ns/op
MergeStoreBench.getLongRBU   avgt   15   9763.489 ± 17.347  ns/op
MergeStoreBench.getLongRL    avgt   15  12027.686 ± 17.472  ns/op
MergeStoreBench.getLongRLU   avgt   15   9649.433 ±  8.384  ns/op
MergeStoreBench.getLongRU    avgt   15   2546.239 ±  2.088  ns/op
MergeStoreBench.getLongU     avgt   15   2539.762 ±  1.439  ns/op
MergeStoreBench.putChars4B   avgt   15   8487.381 ± 23.170  ns/op
MergeStoreBench.putChars4BU  avgt   15   3830.198 ±  7.083  ns/op
MergeStoreBench.putChars4BV  avgt   15   5154.819 ± 10.348  ns/op
MergeStoreBench.putChars4C   avgt   15   5162.766 ± 15.041  ns/op
MergeStoreBench.putChars4L   avgt   15   8381.231 ± 20.135  ns/op
MergeStoreBench.putChars4LU  avgt   15   3827.784 ±  3.163  ns/op
MergeStoreBench.putChars4LV  avgt   15   5151.508 ±  4.907  ns/op
MergeStoreBench.putChars4S   avgt   15   5152.123 ±  7.407  ns/op
MergeStoreBench.setCharBS    avgt   15   5317.319 ± 28.445  ns/op
MergeStoreBench.setCharBV    avgt   15   5175.400 ±  7.110  ns/op
MergeStoreBench.setCharC     avgt   15   5085.752 ±  6.222  ns/op
MergeStoreBench.setCharLS    avgt   15   5294.766 ±  9.742  ns/op
MergeStoreBench.setCharLV    avgt   15   5108.269 ±  6.692  ns/op
MergeStoreBench.setIntB      avgt   15   5095.236 ±  2.838  ns/op
MergeStoreBench.setIntBU     avgt   15   5097.007 ±  4.249  ns/op
MergeStoreBench.setIntBV     avgt   15   1224.506 ±  0.976  ns/op
MergeStoreBench.setIntL      avgt   15   2764.388 ±  2.400  ns/op
MergeStoreBench.setIntLU     avgt   15   2573.624 ±  6.677  ns/op
MergeStoreBench.setIntLV     avgt   15   5105.804 ± 11.551  ns/op
MergeStoreBench.setIntRB     avgt   15   5348.785 ±  4.974  ns/op
MergeStoreBench.setIntRBU    avgt   15   5422.049 ± 31.009  ns/op
MergeStoreBench.setIntRL     avgt   15   5293.414 ±  8.204  ns/op
MergeStoreBench.setIntRLU    avgt   15   5126.889 ±  7.435  ns/op
MergeStoreBench.setIntRU     avgt   15   5097.927 ±  3.588  ns/op
MergeStoreBench.setIntU      avgt   15   5087.192 ± 11.806  ns/op
MergeStoreBench.setLongB     avgt   15  10249.037 ± 19.538  ns/op
MergeStoreBench.setLongBU    avgt   15  10238.910 ± 11.998  ns/op
MergeStoreBench.setLongBV    avgt   15   2663.647 ±  4.147  ns/op
MergeStoreBench.setLongL     avgt   15   6304.458 ±  4.588  ns/op
MergeStoreBench.setLongLU    avgt   15   2921.575 ± 10.649  ns/op
MergeStoreBench.setLongLV    avgt   15   2663.323 ±  1.188  ns/op
MergeStoreBench.setLongRB    avgt   15  10255.875 ± 19.754  ns/op
MergeStoreBench.setLongRBU   avgt   15  10227.856 ±  9.970  ns/op
MergeStoreBench.setLongRL    avgt   15   6641.173 ±  3.836  ns/op
MergeStoreBench.setLongRLU   avgt   15   3241.057 ± 22.250  ns/op
MergeStoreBench.setLongRU    avgt   15   2608.399 ±  2.243  ns/op
MergeStoreBench.setLongU     avgt   15   2594.970 ±  3.490  ns/op

2.2 Aliyun ecs.c8a.xlarge (x64)

  • CPU: AMD EPYC™ Genoa
Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getCharB     avgt   15   5969.667 ±  75.660  ns/op
MergeStoreBench.getCharBU    avgt   15   4576.650 ±  27.489  ns/op
MergeStoreBench.getCharBV    avgt   15   3085.061 ±   3.206  ns/op
MergeStoreBench.getCharC     avgt   15   2237.624 ±   1.383  ns/op
MergeStoreBench.getCharL     avgt   15   6044.112 ±   8.960  ns/op
MergeStoreBench.getCharLU    avgt   15   4538.252 ±   3.747  ns/op
MergeStoreBench.getCharLV    avgt   15   2221.833 ±   0.727  ns/op
MergeStoreBench.getIntB      avgt   15  11983.238 ±  74.190  ns/op
MergeStoreBench.getIntBU     avgt   15   9039.309 ±   6.332  ns/op
MergeStoreBench.getIntBV     avgt   15    303.874 ±   0.305  ns/op
MergeStoreBench.getIntL      avgt   15  10521.992 ±  15.238  ns/op
MergeStoreBench.getIntLU     avgt   15   8867.106 ±   7.014  ns/op
MergeStoreBench.getIntLV     avgt   15   2226.223 ±   0.887  ns/op
MergeStoreBench.getIntRB     avgt   15  12332.136 ±  19.948  ns/op
MergeStoreBench.getIntRBU    avgt   15  11114.256 ±   8.652  ns/op
MergeStoreBench.getIntRL     avgt   15  11206.728 ±  15.291  ns/op
MergeStoreBench.getIntRLU    avgt   15   9349.279 ±   7.379  ns/op
MergeStoreBench.getIntRU     avgt   15   2507.213 ±   1.222  ns/op
MergeStoreBench.getIntU      avgt   15   2495.432 ±   1.278  ns/op
MergeStoreBench.getLongB     avgt   15  26832.797 ±  19.316  ns/op
MergeStoreBench.getLongBU    avgt   15  13996.454 ±  17.628  ns/op
MergeStoreBench.getLongBV    avgt   15    605.548 ±   0.538  ns/op
MergeStoreBench.getLongL     avgt   15  26859.909 ±  31.234  ns/op
MergeStoreBench.getLongLU    avgt   15  14519.709 ±  23.482  ns/op
MergeStoreBench.getLongLV    avgt   15   2227.782 ±   0.535  ns/op
MergeStoreBench.getLongRB    avgt   15  26846.549 ±  17.321  ns/op
MergeStoreBench.getLongRBU   avgt   15  13994.948 ±  14.752  ns/op
MergeStoreBench.getLongRL    avgt   15  26838.819 ±  14.425  ns/op
MergeStoreBench.getLongRLU   avgt   15  14547.807 ±  73.859  ns/op
MergeStoreBench.getLongRU    avgt   15   3061.373 ±   1.690  ns/op
MergeStoreBench.getLongU     avgt   15   3049.441 ±   1.162  ns/op
MergeStoreBench.putChars4B   avgt   15  13411.014 ±   4.491  ns/op
MergeStoreBench.putChars4BU  avgt   15   4206.040 ±   4.317  ns/op
MergeStoreBench.putChars4BV  avgt   15   7948.154 ± 904.918  ns/op
MergeStoreBench.putChars4C   avgt   15   5316.859 ±   3.066  ns/op
MergeStoreBench.putChars4L   avgt   15  13419.757 ±  11.175  ns/op
MergeStoreBench.putChars4LU  avgt   15   4205.094 ±   5.079  ns/op
MergeStoreBench.putChars4LV  avgt   15   6734.543 ±   6.452  ns/op
MergeStoreBench.putChars4S   avgt   15   5323.487 ±  10.605  ns/op
MergeStoreBench.setCharBS    avgt   15   9225.082 ±  11.461  ns/op
MergeStoreBench.setCharBV    avgt   15   5242.360 ±  12.546  ns/op
MergeStoreBench.setCharC     avgt   15   4497.345 ±   7.426  ns/op
MergeStoreBench.setCharLS    avgt   15   8991.865 ±   7.281  ns/op
MergeStoreBench.setCharLV    avgt   15   2535.475 ±   4.230  ns/op
MergeStoreBench.setIntB      avgt   15   8036.698 ±   6.763  ns/op
MergeStoreBench.setIntBU     avgt   15  10332.333 ±  10.071  ns/op
MergeStoreBench.setIntBV     avgt   15    586.392 ±   1.024  ns/op
MergeStoreBench.setIntL      avgt   15   2541.327 ±   4.538  ns/op
MergeStoreBench.setIntLU     avgt   15   6122.574 ±  46.593  ns/op
MergeStoreBench.setIntLV     avgt   15    597.930 ±   0.672  ns/op
MergeStoreBench.setIntRB     avgt   15   9740.301 ±   3.367  ns/op
MergeStoreBench.setIntRBU    avgt   15  10648.285 ±  29.338  ns/op
MergeStoreBench.setIntRL     avgt   15   6227.445 ±  15.378  ns/op
MergeStoreBench.setIntRLU    avgt   15   8409.781 ±  61.847  ns/op
MergeStoreBench.setIntRU     avgt   15    631.337 ±   6.930  ns/op
MergeStoreBench.setIntU      avgt   15    604.432 ±   0.682  ns/op
MergeStoreBench.setLongB     avgt   15  17184.183 ±  11.490  ns/op
MergeStoreBench.setLongBU    avgt   15  21377.695 ±  51.384  ns/op
MergeStoreBench.setLongBV    avgt   15   1191.037 ±  10.983  ns/op
MergeStoreBench.setLongL     avgt   15   3342.476 ±   4.704  ns/op
MergeStoreBench.setLongLU    avgt   15   6194.791 ±  13.241  ns/op
MergeStoreBench.setLongLV    avgt   15   1194.042 ±   2.943  ns/op
MergeStoreBench.setLongRB    avgt   15  17946.742 ±  26.888  ns/op
MergeStoreBench.setLongRBU   avgt   15  21342.899 ±  22.937  ns/op
MergeStoreBench.setLongRL    avgt   15   4034.050 ±   3.792  ns/op
MergeStoreBench.setLongRLU   avgt   15   4825.627 ±  11.409  ns/op
MergeStoreBench.setLongRU    avgt   15   1170.252 ±   1.582  ns/op
MergeStoreBench.setLongU     avgt   15   1192.220 ±   1.060  ns/op

2.3 Aliyun ecs.c8i.xlarge (x64)

  • CPU: Intel® Xeon® Emerald
Benchmark                    Mode  Cnt      Score     Error  Units
MergeStoreBench.getCharB     avgt   15   5374.604 ±  11.001  ns/op
MergeStoreBench.getCharBU    avgt   15   4760.386 ±  20.612  ns/op
MergeStoreBench.getCharBV    avgt   15   3068.661 ±   2.712  ns/op
MergeStoreBench.getCharC     avgt   15   2591.548 ±   0.428  ns/op
MergeStoreBench.getCharL     avgt   15   5224.986 ±   3.388  ns/op
MergeStoreBench.getCharLU    avgt   15   4781.157 ±  19.001  ns/op
MergeStoreBench.getCharLV    avgt   15   2577.009 ±   1.374  ns/op
MergeStoreBench.getIntB      avgt   15  10512.241 ±  17.214  ns/op
MergeStoreBench.getIntBU     avgt   15   9271.460 ±  17.628  ns/op
MergeStoreBench.getIntBV     avgt   15    255.186 ±   0.731  ns/op
MergeStoreBench.getIntL      avgt   15   9728.629 ±   2.364  ns/op
MergeStoreBench.getIntLU     avgt   15   8983.810 ±   2.463  ns/op
MergeStoreBench.getIntLV     avgt   15   2569.886 ±   1.389  ns/op
MergeStoreBench.getIntRB     avgt   15  11285.198 ±  15.566  ns/op
MergeStoreBench.getIntRBU    avgt   15  10321.709 ±   4.604  ns/op
MergeStoreBench.getIntRL     avgt   15  10567.777 ±   3.931  ns/op
MergeStoreBench.getIntRLU    avgt   15   9436.647 ±  16.046  ns/op
MergeStoreBench.getIntRU     avgt   15   2327.805 ±   0.495  ns/op
MergeStoreBench.getIntU      avgt   15   2310.299 ±   2.477  ns/op
MergeStoreBench.getLongB     avgt   15  21698.862 ±  58.286  ns/op
MergeStoreBench.getLongBU    avgt   15  14682.074 ±  22.913  ns/op
MergeStoreBench.getLongBV    avgt   15    649.422 ±   2.738  ns/op
MergeStoreBench.getLongL     avgt   15  21584.034 ±  29.685  ns/op
MergeStoreBench.getLongLU    avgt   15  14346.370 ±   5.548  ns/op
MergeStoreBench.getLongLV    avgt   15   2574.877 ±   0.748  ns/op
MergeStoreBench.getLongRB    avgt   15  21689.446 ±  31.897  ns/op
MergeStoreBench.getLongRBU   avgt   15  14678.181 ±   3.447  ns/op
MergeStoreBench.getLongRL    avgt   15  21578.598 ±   4.353  ns/op
MergeStoreBench.getLongRLU   avgt   15  14350.201 ±  37.668  ns/op
MergeStoreBench.getLongRU    avgt   15   2988.364 ±   3.983  ns/op
MergeStoreBench.getLongU     avgt   15   2941.190 ±   0.582  ns/op
MergeStoreBench.putChars4B   avgt   15  10434.718 ±   3.309  ns/op
MergeStoreBench.putChars4BU  avgt   15   3008.607 ±   1.378  ns/op
MergeStoreBench.putChars4BV  avgt   15   7151.913 ± 483.572  ns/op
MergeStoreBench.putChars4C   avgt   15   6489.426 ±   1.369  ns/op
MergeStoreBench.putChars4L   avgt   15  10436.577 ±   5.568  ns/op
MergeStoreBench.putChars4LU  avgt   15   2837.432 ±   0.697  ns/op
MergeStoreBench.putChars4LV  avgt   15   7024.161 ±   9.887  ns/op
MergeStoreBench.putChars4S   avgt   15   6495.194 ±  12.316  ns/op
MergeStoreBench.setCharBS    avgt   15   8865.676 ±   6.476  ns/op
MergeStoreBench.setCharBV    avgt   15   5002.613 ±  20.300  ns/op
MergeStoreBench.setCharC     avgt   15   3936.314 ±   7.415  ns/op
MergeStoreBench.setCharLS    avgt   15   6989.120 ±  23.404  ns/op
MergeStoreBench.setCharLV    avgt   15   2589.797 ±   2.805  ns/op
MergeStoreBench.setIntB      avgt   15   6891.353 ±  13.239  ns/op
MergeStoreBench.setIntBU     avgt   15  10188.827 ±  21.409  ns/op
MergeStoreBench.setIntBV     avgt   15    899.335 ±   2.777  ns/op
MergeStoreBench.setIntL      avgt   15   2889.929 ±   6.582  ns/op
MergeStoreBench.setIntLU     avgt   15   5314.714 ±   5.170  ns/op
MergeStoreBench.setIntLV     avgt   15    945.432 ±   1.255  ns/op
MergeStoreBench.setIntRB     avgt   15   8159.294 ±  16.214  ns/op
MergeStoreBench.setIntRBU    avgt   15  10625.120 ±  12.809  ns/op
MergeStoreBench.setIntRL     avgt   15   6035.911 ±  47.780  ns/op
MergeStoreBench.setIntRLU    avgt   15   7148.487 ±  73.927  ns/op
MergeStoreBench.setIntRU     avgt   15    969.966 ±   6.127  ns/op
MergeStoreBench.setIntU      avgt   15    988.272 ±   2.214  ns/op
MergeStoreBench.setLongB     avgt   15  15857.394 ±   9.621  ns/op
MergeStoreBench.setLongBU    avgt   15  22955.799 ±   6.266  ns/op
MergeStoreBench.setLongBV    avgt   15   1831.898 ±   5.519  ns/op
MergeStoreBench.setLongL     avgt   15   4344.954 ±   4.273  ns/op
MergeStoreBench.setLongLU    avgt   15   5452.006 ±   9.333  ns/op
MergeStoreBench.setLongLV    avgt   15   1910.294 ±  22.688  ns/op
MergeStoreBench.setLongRB    avgt   15  16990.616 ±  59.974  ns/op
MergeStoreBench.setLongRBU   avgt   15  24951.367 ±  47.760  ns/op
MergeStoreBench.setLongRL    avgt   15   4484.135 ±   5.756  ns/op
MergeStoreBench.setLongRLU   avgt   15   4891.413 ±  26.743  ns/op
MergeStoreBench.setLongRU    avgt   15   1820.416 ±  11.285  ns/op
MergeStoreBench.setLongU     avgt   15   1932.694 ±  28.488  ns/op

MergeStoreBench.txt

@bridgekeeper

bridgekeeper bot commented Jul 16, 2024

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@liach
Member

liach commented Jul 24, 2024

Should we leave this benchmark in vm/compiler instead of java/lang?

@eme64
Contributor

eme64 commented Jul 24, 2024

@liach we should move it to vm/compiler, yes!

@eme64
Contributor

eme64 commented Jul 24, 2024

@wenshao what is the state of this PR?

@wenshao
Contributor Author

wenshao commented Jul 24, 2024

It has been moved to vm/compiler. Can it be approved?

@wenshao
Contributor Author

wenshao commented Jul 24, 2024

Here are the performance numbers from running on the new MacBook M1 Pro:

  • Test scenarios with significant performance improvements
Benchmark                    Mode  Cnt  Score-Old    Score-New    Units
MergeStoreBench.putChars4BU  avgt   15  10266.123    3830.198 *   ns/op
MergeStoreBench.putChars4LU  avgt   15  10266.238    3827.784 *   ns/op
MergeStoreBench.setIntLU     avgt   15   5103.562    2573.624 *   ns/op
MergeStoreBench.setLongLU    avgt   15  10304.012    2921.575 *   ns/op
MergeStoreBench.setLongRLU   avgt   15  10263.975    3241.057 *   ns/op

  • All test scenarios
Benchmark                    Mode  Cnt  Score-Old    Score-New    Units
MergeStoreBench.getCharB     avgt   15   5341.787    5340.200     ns/op
MergeStoreBench.getCharBU    avgt   15   5477.363    5482.163     ns/op
MergeStoreBench.getCharBV    avgt   15   5163.099    5074.165     ns/op
MergeStoreBench.getCharC     avgt   15   5068.708    5051.763     ns/op
MergeStoreBench.getCharL     avgt   15   5379.821    5374.464     ns/op
MergeStoreBench.getCharLU    avgt   15   5477.268    5487.532     ns/op
MergeStoreBench.getCharLV    avgt   15   5079.045    5071.263     ns/op
MergeStoreBench.getIntB      avgt   15   6276.548    6277.984     ns/op
MergeStoreBench.getIntBU     avgt   15   5229.813    5232.984     ns/op
MergeStoreBench.getIntBV     avgt   15   1207.868    1206.264     ns/op
MergeStoreBench.getIntL      avgt   15   6182.150    6172.779     ns/op
MergeStoreBench.getIntLU     avgt   15   5164.260    5157.317     ns/op
MergeStoreBench.getIntLV     avgt   15   2555.443    2558.110     ns/op
MergeStoreBench.getIntRB     avgt   15   6879.188    6889.916     ns/op
MergeStoreBench.getIntRBU    avgt   15   5771.857    5769.950     ns/op
MergeStoreBench.getIntRL     avgt   15   6625.754    6625.605     ns/op
MergeStoreBench.getIntRLU    avgt   15   5746.554    5746.742     ns/op
MergeStoreBench.getIntRU     avgt   15   2547.449    2544.586     ns/op
MergeStoreBench.getIntU      avgt   15   2543.552    2541.119     ns/op
MergeStoreBench.getLongB     avgt   15  12099.002   12098.129     ns/op
MergeStoreBench.getLongBU    avgt   15   9771.893    9760.621     ns/op
MergeStoreBench.getLongBV    avgt   15   2593.835    2593.635     ns/op
MergeStoreBench.getLongL     avgt   15  12045.235   12031.065     ns/op
MergeStoreBench.getLongLU    avgt   15   9659.585    9653.938     ns/op
MergeStoreBench.getLongLV    avgt   15   2561.089    2557.521     ns/op
MergeStoreBench.getLongRB    avgt   15  12095.060   12092.061     ns/op
MergeStoreBench.getLongRBU   avgt   15   9767.943    9763.489     ns/op
MergeStoreBench.getLongRL    avgt   15  12037.935   12027.686     ns/op
MergeStoreBench.getLongRLU   avgt   15   9655.918    9649.433     ns/op
MergeStoreBench.getLongRU    avgt   15   2551.109    2546.239     ns/op
MergeStoreBench.getLongU     avgt   15   2543.732    2539.762     ns/op
MergeStoreBench.putChars4B   avgt   15   8499.750    8487.381     ns/op
MergeStoreBench.putChars4BU  avgt   15  10266.123    3830.198 *   ns/op
MergeStoreBench.putChars4BV  avgt   15   5153.418    5154.819     ns/op
MergeStoreBench.putChars4C   avgt   15   5141.336    5162.766     ns/op
MergeStoreBench.putChars4L   avgt   15   8382.747    8381.231     ns/op
MergeStoreBench.putChars4LU  avgt   15  10266.238    3827.784 *   ns/op
MergeStoreBench.putChars4LV  avgt   15   5150.613    5151.508     ns/op
MergeStoreBench.putChars4S   avgt   15   5144.843    5152.123     ns/op
MergeStoreBench.setCharBS    avgt   15   5318.051    5317.319     ns/op
MergeStoreBench.setCharBV    avgt   15   5187.295    5175.400     ns/op
MergeStoreBench.setCharC     avgt   15   5093.774    5085.752     ns/op
MergeStoreBench.setCharLS    avgt   15   5301.267    5294.766     ns/op
MergeStoreBench.setCharLV    avgt   15   5116.066    5108.269     ns/op
MergeStoreBench.setIntB      avgt   15   5104.537    5095.236     ns/op
MergeStoreBench.setIntBU     avgt   15   5104.838    5097.007     ns/op
MergeStoreBench.setIntBV     avgt   15   1228.375    1224.506     ns/op
MergeStoreBench.setIntL      avgt   15   2772.278    2764.388     ns/op
MergeStoreBench.setIntLU     avgt   15   5103.562    2573.624 *   ns/op
MergeStoreBench.setIntLV     avgt   15   5112.770    5105.804     ns/op
MergeStoreBench.setIntRB     avgt   15   5356.946    5348.785     ns/op
MergeStoreBench.setIntRBU    avgt   15   5420.478    5422.049     ns/op
MergeStoreBench.setIntRL     avgt   15   5297.975    5293.414     ns/op
MergeStoreBench.setIntRLU    avgt   15   5418.844    5126.889     ns/op
MergeStoreBench.setIntRU     avgt   15   5108.486    5097.927     ns/op
MergeStoreBench.setIntU      avgt   15   5091.868    5087.192     ns/op
MergeStoreBench.setLongB     avgt   15  10273.648   10249.037     ns/op
MergeStoreBench.setLongBU    avgt   15  10271.248   10238.910     ns/op
MergeStoreBench.setLongBV    avgt   15   2667.768    2663.647     ns/op
MergeStoreBench.setLongL     avgt   15   6316.791    6304.458     ns/op
MergeStoreBench.setLongLU    avgt   15  10304.012    2921.575 *   ns/op
MergeStoreBench.setLongLV    avgt   15   2667.648    2663.323     ns/op
MergeStoreBench.setLongRB    avgt   15  10269.238   10255.875     ns/op
MergeStoreBench.setLongRBU   avgt   15  10272.547   10227.856     ns/op
MergeStoreBench.setLongRL    avgt   15   6651.182    6641.173     ns/op
MergeStoreBench.setLongRLU   avgt   15  10263.975    3241.057 *   ns/op
MergeStoreBench.setLongRU    avgt   15   2621.004    2608.399     ns/op
MergeStoreBench.setLongU     avgt   15   2606.578    2594.970     ns/op
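
To read the starred rows as speedups, divide Score-Old by Score-New. The sketch below does that arithmetic for the five starred benchmarks, using only the numbers from the table above; the class and method names are hypothetical. The ratios come out at roughly 2x to 3.5x.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SpeedupSketch {
    // Speedup factor for an average-time benchmark: lower ns/op is better,
    // so old / new > 1 means the new build is faster.
    static double speedup(double oldNsPerOp, double newNsPerOp) {
        return oldNsPerOp / newNsPerOp;
    }

    public static void main(String[] args) {
        // {Score-Old, Score-New} pairs copied from the starred rows of the table.
        Map<String, double[]> starred = new LinkedHashMap<>();
        starred.put("putChars4BU", new double[] {10266.123, 3830.198});
        starred.put("putChars4LU", new double[] {10266.238, 3827.784});
        starred.put("setIntLU",    new double[] { 5103.562, 2573.624});
        starred.put("setLongLU",   new double[] {10304.012, 2921.575});
        starred.put("setLongRLU",  new double[] {10263.975, 3241.057});

        for (Map.Entry<String, double[]> e : starred.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-12s %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```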

Contributor

@eme64 eme64 left a comment

Looks good to me. Thanks for creating this benchmark! I'll definitely use it soon, when I try to extend MergeStores to more cases :)

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Jul 25, 2024
@wenshao
Contributor Author

wenshao commented Jul 25, 2024

/integrate

@openjdk openjdk bot added the sponsor (Pull request is ready to be sponsored) label Jul 25, 2024
@openjdk

openjdk bot commented Jul 25, 2024

@wenshao
Your change (at version d00654f) is now ready to be sponsored by a Committer.

@eme64
Contributor

eme64 commented Jul 25, 2024

@wenshao generally we like to have at least 2 reviews before integration ;)

@TobiHartmann
Member

/labels add hotspot-compiler

@openjdk

openjdk bot commented Jul 25, 2024

@TobiHartmann Unknown command labels - for a list of valid commands use /help.

@TobiHartmann
Member

/label add hotspot-compiler

@openjdk

openjdk bot commented Jul 25, 2024

@TobiHartmann
The hotspot-compiler label was successfully added.

Member

@TobiHartmann TobiHartmann left a comment

Looks good to me too.

@TobiHartmann
Member

/sponsor

@openjdk

openjdk bot commented Jul 25, 2024

Going to push as commit 8081f87.
Since your change was applied there have been 13 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Jul 25, 2024
@openjdk openjdk bot closed this Jul 25, 2024
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated), rfr (Pull request is ready for review), and sponsor (Pull request is ready to be sponsored) labels Jul 25, 2024
@openjdk

openjdk bot commented Jul 25, 2024

@TobiHartmann @wenshao Pushed as commit 8081f87.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
