-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8359424: Eliminate table lookup in Integer/Long toHexString #22942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
👋 Welcome back swen! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
The testing data from both aarch64 and x64 architectures indicates a performance improvement of 10% to 20%. However, under the MacBook M1 Pro environment, the performance enhancement for the Integer.toHexString scenario has reached 100%. 1. Scriptgit remote add wenshao [email protected]:wenshao/jdk.git
git fetch wenshao
# baseline 91db7c0877a
git checkout 91db7c0877a68ad171da2b4501280fc24630ae83
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"
# current 1788d09787c
git checkout 1788d09787cadfe6ec23b9b10bef87a2cdc029a3
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)-Benchmark (size) Mode Cnt Score Error Units (baseline 91db7c0877a)
-Integers.toHexString 500 avgt 15 4.855 ± 0.058 us/op
-Longs.toHexString 500 avgt 15 6.098 ± 0.034 us/op
+Benchmark (size) Mode Cnt Score Error Units (current 1788d09787c)
+Integers.toHexString 500 avgt 15 4.105 ± 0.010 us/op +18.27%
+Longs.toHexString 500 avgt 15 4.682 ± 0.116 us/op +30.24%
3. aliyun_ecs_c8i_x64 (CPU Intel®Xeon®Emerald Rapids)-Benchmark (size) Mode Cnt Score Error Units
-Integers.toHexString 500 avgt 15 5.158 ± 0.025 us/op
-Longs.toHexString 500 avgt 15 6.072 ± 0.020 us/op
+Benchmark (size) Mode Cnt Score Error Units
+Integers.toHexString 500 avgt 15 4.691 ± 0.024 us/op +9.95%
+Longs.toHexString 500 avgt 15 4.947 ± 0.024 us/op +22.74%4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)-Benchmark (size) Mode Cnt Score Error Units
-Integers.toHexString 500 avgt 15 5.880 ± 0.017 us/op
-Longs.toHexString 500 avgt 15 7.183 ± 0.013 us/op
+Benchmark (size) Mode Cnt Score Error Units
+Integers.toHexString 500 avgt 15 5.282 ± 0.012 us/op +11.32%
+Longs.toHexString 500 avgt 15 5.530 ± 0.013 us/op +29.89%
5. MacBook M1 Pro (aarch64)-Benchmark (size) Mode Cnt Score Error Units (baseline 91db7c0877a)
-Integers.toHexString 500 avgt 15 10.519 ? 1.573 us/op
-Longs.toHexString 500 avgt 15 5.754 ? 0.264 us/op
+Benchmark (size) Mode Cnt Score Error Units (current 1788d09787c)
+Integers.toHexString 500 avgt 15 5.057 ? 0.015 us/op +108.00%
+Longs.toHexString 500 avgt 15 5.147 ? 0.095 us/op +11.79%
|
| * this string as a hexadecimal number to form and return a long value. | ||
| */ | ||
| public static long hex8(long i) { | ||
| long x = Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x86 should use pepd - but aarch64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems there is no good way to do so on aarch; raw shifts and going through VPU both seem to be slower.
|
Running it on my machine (Ryzen 9 3900X): I know that this processor has an extremely slow implementation of the |
|
I think that without proper assembly analysis won't be easy to check why... And yes, pdep is bad in old Ryzen @SirYwell :"( It could be either a branch prediction problem too (perfnorm would help) if the list of longs can produce small/big hex strings |
|
Because this algorithm underperforms compared to the original version when handling smaller numbers, I have marked this PR as draft. Additionally, this algorithm is used in another PR #22928 Speed up UUID::toString , and it still experiences performance degradation with Long.expand on older CPU architectures. // Method 1:
i = Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));
// Method 2:
i = ((i & 0xF0000000L) >> 28)
| ((i & 0xF000000L) >> 16)
| ((i & 0xF00000L) >> 4)
| ((i & 0xF0000L) << 8)
| ((i & 0xF000L) << 20)
| ((i & 0xF00L) << 32)
| ((i & 0xF0L) << 44)
| ((i & 0xFL) << 56);Note: Using Long.reverseBytes + Long.expand is faster on x64 and ARMv9. I haven't tested this on older x64 CPUs, like the AMD ZEN1, but it's possible that they experience the same issue. |
|
@wenshao This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
|
@wenshao This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the |
|
/open |
|
@wenshao This pull request is now open |
# Conflicts: # src/java.base/share/classes/jdk/internal/util/HexDigits.java
Webrevs
|
|
Performance test figures show that using the vectorized method toHex can improve performance in most cases by eliminating table lookups. 1. Scriptgit remote add wenshao [email protected]:wenshao/jdk.git
git fetch wenshao
# baseline
git checkout 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"
# current
git checkout 6f491cedc2311115ab81b479620da3eb71385fc8
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)-# 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark (size) Mode Cnt Score Error Units
-Longs.toHexStringBig 500 avgt 15 5.984 ± 0.009 us/op
-Longs.toHexStringSmall 500 avgt 15 3.989 ± 0.036 us/op
-Integers.toHexStringBig 500 avgt 15 4.473 ± 0.029 us/op
-Integers.toHexStringSmall 500 avgt 15 4.166 ± 0.120 us/op
-Integers.toHexStringTiny 500 avgt 15 3.394 ± 0.014 us/op
+# 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark (size) Mode Cnt Score Error Units
+Longs.toHexStringBig 500 avgt 15 4.622 ± 0.011 us/op +29%
+Longs.toHexStringSmall 500 avgt 15 3.957 ± 0.035 us/op +0.8%
+Integers.toHexStringBig 500 avgt 15 3.985 ± 0.019 us/op +12.24%
+Integers.toHexStringSmall 500 avgt 15 3.617 ± 0.020 us/op +15.17%
+Integers.toHexStringTiny 500 avgt 15 3.033 ± 0.015 us/op +11.90%
3. aliyun_ecs_c8i_x64 (CPU Intel®Xeon®Emerald Rapids)-# 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark (size) Mode Cnt Score Error Units
-Longs.toHexStringBig 500 avgt 15 5.685 ± 0.019 us/op
-Longs.toHexStringSmall 500 avgt 15 3.914 ± 0.009 us/op
-Integers.toHexStringBig 500 avgt 15 4.262 ± 0.025 us/op
-Integers.toHexStringSmall 500 avgt 15 4.049 ± 0.012 us/op
-Integers.toHexStringTiny 500 avgt 15 3.323 ± 0.015 us/op
+# 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark (size) Mode Cnt Score Error Units
+Longs.toHexStringBig 500 avgt 15 4.791 ± 0.005 us/op +18.65%
+Longs.toHexStringSmall 500 avgt 15 3.994 ± 0.022 us/op -2.00%
+Integers.toHexStringBig 500 avgt 15 4.184 ± 0.034 us/op +1.86%
+Integers.toHexStringSmall 500 avgt 15 3.771 ± 0.019 us/op +7.37%
+Integers.toHexStringTiny 500 avgt 15 3.133 ± 0.021 us/op +6.06%4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)-# 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark (size) Mode Cnt Score Error Units
-Longs.toHexStringBig 500 avgt 15 8.319 ± 0.139 us/op
-Longs.toHexStringSmall 500 avgt 15 4.985 ± 0.028 us/op
-Integers.toHexStringBig 500 avgt 15 5.489 ± 0.041 us/op
-Integers.toHexStringSmall 500 avgt 15 5.028 ± 0.033 us/op
-Integers.toHexStringTiny 500 avgt 15 3.921 ± 0.023 us/op
+# 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark (size) Mode Cnt Score Error Units
+Longs.toHexStringBig 500 avgt 15 5.346 ± 0.033 us/op +55.61%
+Longs.toHexStringSmall 500 avgt 15 4.383 ± 0.027 us/op +47.35%
+Integers.toHexStringBig 500 avgt 15 4.821 ± 0.052 us/op +13.85%
+Integers.toHexStringSmall 500 avgt 15 4.428 ± 0.036 us/op +13.55%
+Integers.toHexStringTiny 500 avgt 15 3.800 ± 0.046 us/op +3.18%
5. MacBook M1 Pro (aarch64)-# 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark (size) Mode Cnt Score Error Units
-Longs.toHexStringBig 500 avgt 15 6.878 ± 0.230 us/op
-Longs.toHexStringSmall 500 avgt 15 4.975 ± 0.381 us/op
-Integers.toHexStringBig 500 avgt 15 11.646 ± 2.699 us/op
-Integers.toHexStringSmall 500 avgt 15 5.040 ± 0.319 us/op
-Integers.toHexStringTiny 500 avgt 15 2.717 ± 0.023 us/op
+# 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark (size) Mode Cnt Score Error Units
+Longs.toHexStringBig 500 avgt 15 4.007 ± 0.023 us/op +71.64%
+Longs.toHexStringSmall 500 avgt 15 3.058 ± 0.024 us/op +62.68%
+Integers.toHexStringBig 500 avgt 15 3.399 ± 0.017 us/op +342.63%
+Integers.toHexStringSmall 500 avgt 15 3.163 ± 0.026 us/op +59.34%
+Integers.toHexStringTiny 500 avgt 15 2.909 ± 0.026 us/op -6.60%6. OrangePi 5
-# baseline 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark (size) Mode Cnt Score Error Units
-Integers.toHexStringBig 500 avgt 15 10.713 ± 0.760 us/op
-Integers.toHexStringSmall 500 avgt 15 9.590 ± 0.759 us/op
-Integers.toHexStringTiny 500 avgt 15 7.149 ± 0.405 us/op
-Longs.toHexStringBig 500 avgt 15 15.356 ± 1.449 us/op
-Longs.toHexStringSmall 500 avgt 15 9.900 ± 0.879 us/op
+# current 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark (size) Mode Cnt Score Error Units
+Integers.toHexStringBig 500 avgt 15 9.929 ± 0.878 us/op +7.89%
+Integers.toHexStringSmall 500 avgt 15 9.039 ± 0.715 us/op +6.09%
+Integers.toHexStringTiny 500 avgt 15 8.106 ± 0.601 us/op -11.80%
+Longs.toHexStringBig 500 avgt 15 11.351 ± 0.819 us/op +35.28%
+Longs.toHexStringSmall 500 avgt 15 8.957 ± 0.554 us/op +10.52%7. orangepirv2
-# baseline 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark (size) Mode Cnt Score Error Units
-Integers.toHexStringBig 500 avgt 15 94.656 ± 2.328 us/op
-Integers.toHexStringSmall 500 avgt 15 94.320 ± 1.469 us/op
-Integers.toHexStringTiny 500 avgt 15 92.469 ± 0.980 us/op
-Longs.toHexStringBig 500 avgt 15 143.218 ± 1.011 us/op
-Longs.toHexStringSmall 500 avgt 15 93.034 ± 1.610 us/op
+# current 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark (size) Mode Cnt Score Error Units
+Integers.toHexStringBig 500 avgt 15 115.484 ± 1.921 us/op -18.03%
+Integers.toHexStringSmall 500 avgt 15 112.554 ± 1.691 us/op -16.20%
+Integers.toHexStringTiny 500 avgt 15 107.421 ± 1.814 us/op -13.91%
+Longs.toHexStringBig 500 avgt 15 143.858 ± 0.914 us/op -0.44%
+Longs.toHexStringSmall 500 avgt 15 115.731 ± 1.178 us/op -16.61% |
Co-authored-by: ExE Boss <[email protected]>
|
@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
|
keep alive |
|
@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
|
/touch |
|
@wenshao The pull request is being re-evaluated and the inactivity timeout has been reset. |
|
@rgithubli Could you please help me take a look at this PR? |
|
Can we just update |
But that would result in two duplicate hex8 methods |
|
@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
|
/touch |
|
@wenshao The pull request is being re-evaluated and the inactivity timeout has been reset. |
|
@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
In PR #22928, UUID introduced long-based vectorized hexadecimal to string conversion, which can also be used in Integer::toHexString and Long::toHexString to eliminate table lookups. The benefit of eliminating table lookups is that the performance is better when cache misses occur.
Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22942/head:pull/22942$ git checkout pull/22942Update a local copy of the PR:
$ git checkout pull/22942$ git pull https://git.openjdk.org/jdk.git pull/22942/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 22942View PR using the GUI difftool:
$ git pr show -t 22942Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22942.diff
Using Webrev
Link to Webrev Comment