Skip to content

Conversation

@wenshao
Copy link
Contributor

@wenshao wenshao commented Jan 7, 2025

In PR #22928, UUID introduced long-based vectorized hexadecimal to string conversion, which can also be used in Integer::toHexString and Long::toHexString to eliminate table lookups. The benefit of eliminating table lookups is that the performance is better when cache misses occur.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8359424: Eliminate table lookup in Integer/Long toHexString (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22942/head:pull/22942
$ git checkout pull/22942

Update a local copy of the PR:
$ git checkout pull/22942
$ git pull https://git.openjdk.org/jdk.git pull/22942/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22942

View PR using the GUI difftool:
$ git pr show -t 22942

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22942.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 7, 2025

👋 Welcome back swen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jan 7, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Jan 7, 2025

@wenshao The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@wenshao
Copy link
Contributor Author

wenshao commented Jan 7, 2025

The testing data from both aarch64 and x64 architectures indicates a performance improvement of 10% to 20%. However, under the MacBook M1 Pro environment, the performance enhancement for the Integer.toHexString scenario has reached 100%.

1. Script

git remote add wenshao [email protected]:wenshao/jdk.git
git fetch wenshao

# baseline 91db7c0877a
git checkout 91db7c0877a68ad171da2b4501280fc24630ae83
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"

 # current 1788d09787c
git checkout 1788d09787cadfe6ec23b9b10bef87a2cdc029a3
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"

2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units (baseline 91db7c0877a)
-Integers.toHexString     500  avgt   15  4.855 ± 0.058  us/op
-Longs.toHexString        500  avgt   15  6.098 ± 0.034  us/op


+Benchmark             (size)  Mode  Cnt  Score   Error  Units (current 1788d09787c)
+Integers.toHexString     500  avgt   15  4.105 ± 0.010  us/op +18.27%
+Longs.toHexString        500  avgt   15  4.682 ± 0.116  us/op +30.24%

3. aliyun_ecs_c8i_x64 (CPU Intel®Xeon®Emerald Rapids)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units
-Integers.toHexString     500  avgt   15  5.158 ± 0.025  us/op
-Longs.toHexString        500  avgt   15  6.072 ± 0.020  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexString     500  avgt   15  4.691 ± 0.024  us/op  +9.95%
+Longs.toHexString        500  avgt   15  4.947 ± 0.024  us/op +22.74%

4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)

-Benchmark             (size)  Mode  Cnt  Score   Error  Units
-Integers.toHexString     500  avgt   15  5.880 ± 0.017  us/op
-Longs.toHexString        500  avgt   15  7.183 ± 0.013  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexString     500  avgt   15  5.282 ± 0.012  us/op +11.32%
+Longs.toHexString        500  avgt   15  5.530 ± 0.013  us/op +29.89%

5. MacBook M1 Pro (aarch64)

-Benchmark             (size)  Mode  Cnt   Score   Error  Units (baseline 91db7c0877a)
-Integers.toHexString     500  avgt   15  10.519 ? 1.573  us/op
-Longs.toHexString        500  avgt   15  5.754 ? 0.264  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units (current 1788d09787c)
+Integers.toHexString     500  avgt   15  5.057 ? 0.015  us/op +108.00%
+Longs.toHexString        500  avgt   15  5.147 ? 0.095  us/op  +11.79%

* this string as a hexadecimal number to form and return a long value.
*/
public static long hex8(long i) {
long x = Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL);
Copy link

@franz1981 franz1981 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x86 should use pepd - but aarch64?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there is no good way to do so on aarch; raw shifts and going through VPU both seem to be slower.

@wenshao wenshao marked this pull request as draft January 8, 2025 00:37
@SirYwell
Copy link
Member

SirYwell commented Jan 8, 2025

Running it on my machine (Ryzen 9 3900X):

baseline (91db7c0877a68ad171da2b4501280fc24630ae83):
Integers.toHexString     500  avgt   15  5.717 ± 0.274  us/op
Longs.toHexString        500  avgt   15  6.851 ± 0.214  us/op

this change (1788d09787cadfe6ec23b9b10bef87a2cdc029a3):
Integers.toHexString     500  avgt   15  21.334 ± 0.268  us/op
Longs.toHexString        500  avgt   15  38.907 ± 0.589  us/op

I know that this processor has an extremely slow implementation of the PDEP instruction, but I'm kinda surprised you're seeing better results on aarch64. But I think Zen1 and Zen2 should considered here to avoid regressions on those architectures.

@franz1981
Copy link

franz1981 commented Jan 8, 2025

I think that without proper assembly analysis won't be easy to check why...

And yes, pdep is bad in old Ryzen @SirYwell :"(

It could be either a branch prediction problem too (perfnorm would help) if the list of longs can produce small/big hex strings

@wenshao
Copy link
Contributor Author

wenshao commented Jan 8, 2025

Because this algorithm underperforms compared to the original version when handling smaller numbers, I have marked this PR as draft.

Additionally, this algorithm is used in another PR #22928 Speed ​​up UUID::toString , and it still experiences performance degradation with Long.expand on older CPU architectures.

// Method 1:
i = Long.reverseBytes(Long.expand(i, 0x0F0F_0F0F_0F0F_0F0FL));

// Method 2:
i = ((i & 0xF0000000L) >> 28)
  | ((i & 0xF000000L) >> 16)
  | ((i & 0xF00000L) >> 4)
  | ((i & 0xF0000L) << 8)
  | ((i & 0xF000L) << 20)
  | ((i & 0xF00L) << 32)
  | ((i & 0xF0L) << 44)
  | ((i & 0xFL) << 56);

Note: Using Long.reverseBytes + Long.expand is faster on x64 and ARMv9.
However, on AArch64 with ARMv8, it will be slower compared to the manual unrolling shown in Method 2.
ARMv8 includes Apple M1/M2, AWS Graviton 3; ARMv9.0 includes Apple M3/M4, Aliyun Yitian 710.

I haven't tested this on older x64 CPUs, like the AMD ZEN1, but it's possible that they experience the same issue.

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 5, 2025

@wenshao This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented May 1, 2025

@wenshao This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this May 1, 2025
@wenshao
Copy link
Contributor Author

wenshao commented May 22, 2025

/open

@openjdk openjdk bot reopened this May 22, 2025
@openjdk
Copy link

openjdk bot commented May 22, 2025

@wenshao This pull request is now open

@wenshao wenshao changed the title Eliminate table lookup in Integer/Long toHexString 8359424: Eliminate table lookup in Integer/Long toHexString Jun 13, 2025
@wenshao wenshao marked this pull request as ready for review June 13, 2025 12:01
@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 13, 2025
@mlbridge
Copy link

mlbridge bot commented Jun 13, 2025

Webrevs

@wenshao
Copy link
Contributor Author

wenshao commented Jun 13, 2025

Performance test figures show that using the vectorized method toHex can improve performance in most cases by eliminating table lookups.

1. Script

git remote add wenshao [email protected]:wenshao/jdk.git
git fetch wenshao

# baseline
git checkout 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"
           
 # current
git checkout 6f491cedc2311115ab81b479620da3eb71385fc8
make test TEST="micro:java.lang.Integers.toHexString"
make test TEST="micro:java.lang.Longs.toHexString"

2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)

-# 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark                  (size)  Mode  Cnt  Score   Error  Units
-Longs.toHexStringBig          500  avgt   15  5.984 ± 0.009  us/op
-Longs.toHexStringSmall        500  avgt   15  3.989 ± 0.036  us/op
-Integers.toHexStringBig       500  avgt   15  4.473 ± 0.029  us/op
-Integers.toHexStringSmall     500  avgt   15  4.166 ± 0.120  us/op
-Integers.toHexStringTiny      500  avgt   15  3.394 ± 0.014  us/op

+# 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark                  (size)  Mode  Cnt  Score   Error  Units
+Longs.toHexStringBig          500  avgt   15  4.622 ± 0.011  us/op +29%
+Longs.toHexStringSmall        500  avgt   15  3.957 ± 0.035  us/op +0.8%
+Integers.toHexStringBig       500  avgt   15  3.985 ± 0.019  us/op +12.24%
+Integers.toHexStringSmall     500  avgt   15  3.617 ± 0.020  us/op +15.17%
+Integers.toHexStringTiny      500  avgt   15  3.033 ± 0.015  us/op +11.90%

3. aliyun_ecs_c8i_x64 (CPU Intel®Xeon®Emerald Rapids)

-# 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark                  (size)  Mode  Cnt  Score   Error  Units
-Longs.toHexStringBig          500  avgt   15  5.685 ± 0.019  us/op
-Longs.toHexStringSmall        500  avgt   15  3.914 ± 0.009  us/op
-Integers.toHexStringBig       500  avgt   15  4.262 ± 0.025  us/op
-Integers.toHexStringSmall     500  avgt   15  4.049 ± 0.012  us/op
-Integers.toHexStringTiny      500  avgt   15  3.323 ± 0.015  us/op

+# 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark                  (size)  Mode  Cnt  Score   Error  Units
+Longs.toHexStringBig          500  avgt   15  4.791 ± 0.005  us/op +18.65%
+Longs.toHexStringSmall        500  avgt   15  3.994 ± 0.022  us/op  -2.00%
+Integers.toHexStringBig       500  avgt   15  4.184 ± 0.034  us/op  +1.86%
+Integers.toHexStringSmall     500  avgt   15  3.771 ± 0.019  us/op  +7.37%
+Integers.toHexStringTiny      500  avgt   15  3.133 ± 0.021  us/op  +6.06%

4. aliyun_ecs_c8y_aarch64 (CPU Aliyun Yitian 710)

-# 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark                  (size)  Mode  Cnt  Score   Error  Units
-Longs.toHexStringBig          500  avgt   15  8.319 ± 0.139  us/op
-Longs.toHexStringSmall        500  avgt   15  4.985 ± 0.028  us/op
-Integers.toHexStringBig       500  avgt   15  5.489 ± 0.041  us/op
-Integers.toHexStringSmall     500  avgt   15  5.028 ± 0.033  us/op
-Integers.toHexStringTiny      500  avgt   15  3.921 ± 0.023  us/op

+# 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark                  (size)  Mode  Cnt  Score   Error  Units
+Longs.toHexStringBig          500  avgt   15  5.346 ± 0.033  us/op +55.61%
+Longs.toHexStringSmall        500  avgt   15  4.383 ± 0.027  us/op +47.35%
+Integers.toHexStringBig       500  avgt   15  4.821 ± 0.052  us/op +13.85%
+Integers.toHexStringSmall     500  avgt   15  4.428 ± 0.036  us/op +13.55%
+Integers.toHexStringTiny      500  avgt   15  3.800 ± 0.046  us/op  +3.18%

5. MacBook M1 Pro (aarch64)

-# 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark                  (size)  Mode  Cnt   Score   Error  Units
-Longs.toHexStringBig          500  avgt   15  6.878 ± 0.230  us/op
-Longs.toHexStringSmall        500  avgt   15  4.975 ± 0.381  us/op
-Integers.toHexStringBig       500  avgt   15 11.646 ± 2.699  us/op
-Integers.toHexStringSmall     500  avgt   15  5.040 ± 0.319  us/op
-Integers.toHexStringTiny      500  avgt   15  2.717 ± 0.023  us/op

+# 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark                  (size)  Mode  Cnt  Score   Error  Units
+Longs.toHexStringBig          500  avgt   15  4.007 ± 0.023  us/op  +71.64%
+Longs.toHexStringSmall        500  avgt   15  3.058 ± 0.024  us/op  +62.68%
+Integers.toHexStringBig       500  avgt   15  3.399 ± 0.017  us/op +342.63%
+Integers.toHexStringSmall     500  avgt   15  3.163 ± 0.026  us/op  +59.34%
+Integers.toHexStringTiny      500  avgt   15  2.909 ± 0.026  us/op  -6.60%

6. OrangePi 5

-# baseline 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark                  (size)  Mode  Cnt   Score   Error  Units
-Integers.toHexStringBig       500  avgt   15  10.713 ± 0.760  us/op
-Integers.toHexStringSmall     500  avgt   15   9.590 ± 0.759  us/op
-Integers.toHexStringTiny      500  avgt   15   7.149 ± 0.405  us/op
-Longs.toHexStringBig          500  avgt   15  15.356 ± 1.449  us/op
-Longs.toHexStringSmall        500  avgt   15   9.900 ± 0.879  us/op

+# current 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark                  (size)  Mode  Cnt  Score   Error  Units
+Integers.toHexStringBig       500  avgt   15  9.929 ± 0.878  us/op  +7.89%
+Integers.toHexStringSmall     500  avgt   15  9.039 ± 0.715  us/op  +6.09%
+Integers.toHexStringTiny      500  avgt   15  8.106 ± 0.601  us/op -11.80%
+Longs.toHexStringBig          500  avgt   15 11.351 ± 0.819  us/op +35.28%
+Longs.toHexStringSmall        500  avgt   15  8.957 ± 0.554  us/op +10.52%

7. orangepirv2

-# baseline 7731f4df6ba12b4e38f17f87bd42b1da6dc68f95
-Benchmark                  (size)  Mode  Cnt   Score   Error  Units
-Integers.toHexStringBig       500  avgt   15  94.656 ± 2.328  us/op
-Integers.toHexStringSmall     500  avgt   15  94.320 ± 1.469  us/op
-Integers.toHexStringTiny      500  avgt   15  92.469 ± 0.980  us/op
-Longs.toHexStringBig          500  avgt   15 143.218 ± 1.011  us/op
-Longs.toHexStringSmall        500  avgt   15  93.034 ± 1.610  us/op


+# current 6f491cedc2311115ab81b479620da3eb71385fc8
+Benchmark                  (size)  Mode  Cnt    Score   Error  Units
+Integers.toHexStringBig       500  avgt   15  115.484 ± 1.921  us/op -18.03%
+Integers.toHexStringSmall     500  avgt   15  112.554 ± 1.691  us/op -16.20%
+Integers.toHexStringTiny      500  avgt   15  107.421 ± 1.814  us/op -13.91%
+Longs.toHexStringBig          500  avgt   15  143.858 ± 0.914  us/op  -0.44%
+Longs.toHexStringSmall        500  avgt   15  115.731 ± 1.178  us/op -16.61%

@wenshao wenshao requested a review from liach June 14, 2025 09:21
@bridgekeeper
Copy link

bridgekeeper bot commented Jul 12, 2025

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@wenshao
Copy link
Contributor Author

wenshao commented Jul 14, 2025

keep alive

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 13, 2025

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@wenshao
Copy link
Contributor Author

wenshao commented Aug 14, 2025

/touch

@openjdk
Copy link

openjdk bot commented Aug 18, 2025

@wenshao The pull request is being re-evaluated and the inactivity timeout has been reset.

@openjdk openjdk bot added rfr Pull request is ready for review and removed rfr Pull request is ready for review labels Aug 18, 2025
@wenshao
Copy link
Contributor Author

wenshao commented Aug 21, 2025

@rgithubli Could you please help me take a look at this PR?

@liach
Copy link
Member

liach commented Aug 21, 2025

Can we just update formatUnsignedInt to remove the table lookup? This patch essentially duplicates code and is more confusing than helpful. For example, we still need to check if this batch-fetched hex8 can have the loop for putting vectorized.

@wenshao
Copy link
Contributor Author

wenshao commented Aug 21, 2025

Can we just update formatUnsignedInt to remove the table lookup? This patch essentially duplicates code and is more confusing than helpful. For example, we still need to check if this batch-fetched hex8 can have the loop for putting vectorized.

But that would result in two duplicate hex8 methods

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 19, 2025

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@wenshao
Copy link
Contributor Author

wenshao commented Sep 19, 2025

/touch

@openjdk
Copy link

openjdk bot commented Sep 19, 2025

@wenshao The pull request is being re-evaluated and the inactivity timeout has been reset.

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 17, 2025

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs [email protected] rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

5 participants