8315789: Minor HexFormat performance improvements #15591

cl4es · 2023-09-06T13:36:22Z

This PR seeks to improve formatting of hex digits using java.util.HexFormat somewhat.

This is achieved getting rid of a couple of lookup tables, caching the result of HexFormat.of().withUpperCase(), and removing tiny allocation that happens in the formatHex(A, byte) method. Improvements range from 20-40% on throughput, and some operations allocate less:

Name                               Cnt   Base   Error   Test   Error   Unit   Diff%
HexFormatBench.appenderLower        15  1,330 ± 0,021  1,065 ± 0,067  us/op   19,9% (p = 0,000*)
  :gc.alloc.rate                    15 11,481 ± 0,185  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
  :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
  :gc.count                         15  3,000          0,000         counts
  :gc.time                           3  2,000                            ms
HexFormatBench.appenderLowerCached  15  1,317 ± 0,013  1,065 ± 0,054  us/op   19,1% (p = 0,000*)
  :gc.alloc.rate                    15 11,590 ± 0,111  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
  :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
  :gc.count                         15  3,000          0,000         counts
  :gc.time                           3  2,000                            ms
HexFormatBench.appenderUpper        15  1,330 ± 0,022  1,065 ± 0,036  us/op   19,9% (p = 0,000*)
  :gc.alloc.rate                    15 34,416 ± 0,559  0,007 ± 0,000 MB/sec -100,0% (p = 0,000*)
  :gc.alloc.rate.norm               15 48,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
  :gc.count                         15  0,000          0,000         counts
HexFormatBench.appenderUpperCached  15  1,353 ± 0,009  1,033 ± 0,014  us/op   23,6% (p = 0,000*)
  :gc.alloc.rate                    15 11,284 ± 0,074  0,007 ± 0,000 MB/sec  -99,9% (p = 0,000*)
  :gc.alloc.rate.norm               15 16,009 ± 0,000  0,007 ± 0,000   B/op -100,0% (p = 0,000*)
  :gc.count                         15  3,000          0,000         counts
  :gc.time                           3  2,000                            ms
HexFormatBench.toHexLower           15  0,198 ± 0,001  0,119 ± 0,008  us/op   40,1% (p = 0,000*)
  :gc.alloc.rate                    15  0,007 ± 0,000  0,007 ± 0,000 MB/sec   -0,0% (p = 0,816 )
  :gc.alloc.rate.norm               15  0,001 ± 0,000  0,001 ± 0,000   B/op  -40,1% (p = 0,000*)
  :gc.count                         15  0,000          0,000         counts
HexFormatBench.toHexLowerCached     15  0,201 ± 0,002  0,114 ± 0,001  us/op   43,0% (p = 0,000*)
  :gc.alloc.rate                    15  0,007 ± 0,000  0,007 ± 0,000 MB/sec   -0,2% (p = 0,116 )
  :gc.alloc.rate.norm               15  0,001 ± 0,000  0,001 ± 0,000   B/op  -43,1% (p = 0,000*)
  :gc.count                         15  0,000          0,000         counts
HexFormatBench.toHexUpper           15  0,146 ± 0,002  0,114 ± 0,001  us/op   21,6% (p = 0,000*)
  :gc.alloc.rate                    15  0,007 ± 0,000  0,007 ± 0,000 MB/sec    0,0% (p = 0,668 )
  :gc.alloc.rate.norm               15  0,001 ± 0,000  0,001 ± 0,000   B/op  -21,5% (p = 0,000*)
  :gc.count                         15  0,000          0,000         counts
HexFormatBench.toHexUpperCached     15  0,199 ± 0,002  0,116 ± 0,003  us/op   41,7% (p = 0,000*)
  :gc.alloc.rate                    15  0,007 ± 0,000  0,007 ± 0,000 MB/sec    0,0% (p = 0,684 )
  :gc.alloc.rate.norm               15  0,001 ± 0,000  0,001 ± 0,000   B/op  -41,7% (p = 0,000*)
  :gc.count                         15  0,000          0,000         counts
  * = significant

Invariant parameters used by above microbenchmarks:
size:   512

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8315789: Minor HexFormat performance improvements (Enhancement - P4)

Reviewers

Roger Riggs (@RogerRiggs - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15591/head:pull/15591
$ git checkout pull/15591

Update a local copy of the PR:
$ git checkout pull/15591
$ git pull https://git.openjdk.org/jdk.git pull/15591/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15591

View PR using the GUI difftool:
$ git pr show -t 15591

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15591.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2023-09-06T13:37:40Z

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-09-06T13:38:46Z

@cl4es The following label will be automatically applied to this pull request:

core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-09-06T13:43:00Z

Webrevs

01: Full - Incremental (c2393766)
00: Full (59a1f609)

rgiulietti · 2023-09-06T15:29:45Z

src/java.base/share/classes/java/util/HexFormat.java

+        if (value < 10) {
+            return (char)('0' + value);
+        }
+        if (digitCase == Case.LOWERCASE) {
+            return (char)('a' - 10 + value);
+        }
+        return (char)('A' - 10 + value);


Not sure if this would adversely impact performance, but what about factoring out these lines in a private method? They are repeated in toHighHexDigit.

Surprisingly this does carry some cost in the microbenchmarks on my M1. Might be noise, but it seems rather consistent so I'll need to investigate.

This:

diff --git a/src/java.base/share/classes/java/util/HexFormat.java b/src/java.base/share/classes/java/util/HexFormat.java index 107c362cbc2..177548c03f7 100644 --- a/src/java.base/share/classes/java/util/HexFormat.java +++ b/src/java.base/share/classes/java/util/HexFormat.java @@ -634,14 +634,7 @@ private static String escapeNL(String string) { * @return the hexadecimal character for the low 4 bits {@code 0-3} of the value */ public char toLowHexDigit(int value) { - value = value & 0xf; - if (value < 10) { - return (char)('0' + value); - } - if (digitCase == Case.LOWERCASE) { - return (char)('a' - 10 + value); - } - return (char)('A' - 10 + value); + return toHexDigit(value & 0xf); } /** @@ -655,7 +648,10 @@ public char toLowHexDigit(int value) { * @return the hexadecimal character for the bits {@code 4-7} of the value */ public char toHighHexDigit(int value) { - value = (value >> 4) & 0xf; + return toHexDigit((value >> 4) & 0xf); + } + + private char toHexDigit(int value) { if (value < 10) { return (char)('0' + value); }

.. clearly increase cost across all micros:

Name Cnt Base Error Test Error Unit Diff% HexFormatBench.appenderLower 15 1,046 ± 0,041 1,301 ± 0,017 us/op -24,4% (p = 0,000*) HexFormatBench.appenderLowerCached 15 1,056 ± 0,055 1,175 ± 0,115 us/op -11,3% (p = 0,001*) HexFormatBench.appenderUpper 15 1,059 ± 0,055 1,303 ± 0,012 us/op -23,0% (p = 0,000*) HexFormatBench.appenderUpperCached 15 1,099 ± 0,014 1,451 ± 0,267 us/op -32,0% (p = 0,000*) HexFormatBench.toHexLower 15 0,322 ± 0,002 0,338 ± 0,005 us/op -4,8% (p = 0,000*) HexFormatBench.toHexLowerCached 15 0,324 ± 0,003 0,411 ± 0,005 us/op -27,0% (p = 0,000*) HexFormatBench.toHexUpper 15 0,324 ± 0,003 0,340 ± 0,003 us/op -4,9% (p = 0,000*) HexFormatBench.toHexUpperCached 15 0,322 ± 0,001 0,411 ± 0,004 us/op -27,6% (p = 0,000*)

RogerRiggs · 2023-09-06T15:34:25Z

src/java.base/share/classes/java/util/HexFormat.java

+    private final Case digitCase;
+
+    private enum Case {
+        LOWERCASE,
+        UPPERCASE
+    }


Instead of an enum cache the "A" or "a" for direct use.
private final char caseBase;
Initialize it in the constructor.

RogerRiggs · 2023-09-06T15:34:54Z

src/java.base/share/classes/java/util/HexFormat.java

+        if (digitCase == Case.LOWERCASE) {
+            return (char)('a' - 10 + value);
+        }
+        return (char)('A' - 10 + value);


Would caching the upper/lower case base avoid a branch?

Suggested change

if (digitCase == Case.LOWERCASE) {

return (char)('a' - 10 + value);

}

return (char)('A' - 10 + value);

return (char)(caseBase - 10 + value);

I tried this but it looks like it is marginally slower - plausibly the code I have means the JIT eliminates the untaken branch and constant folds this neatly. I'll do some digging..

wenshao · 2023-09-07T17:09:49Z

You can refer to the PR #14745 I submitted. Using a table with a length of 256 in HexDigits, you can get two digits in one lookup table, and use ByteArrayLittleEndian.setShort to write two digits at a time.

like this:

   public String toHexDigits(byte value) {
        byte[] rep = new byte[2];
        ByteArrayLittleEndian.setShort(rep, 0, HexDigits.DIGITS[value & 0xff]); // DIGITS shuld changed to little-endian
        try {
            return jla.newStringNoRepl(rep, StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException cce) {
            throw new AssertionError(cce);
        }
    }

cl4es · 2023-09-07T23:37:20Z

I took ByteArrayLittleEndian for a spin on a (new) micro for String toHexDigits(byte), and that gives us ~4% (MacBook M1):

Name                       Cnt      Base    Error       Test    Error   Unit   Diff%
HexFormatBench.toHexDigits  15     1,943 ±  0,014      1,862 ±  0,009  us/op    4,1% (p = 0,000*)
  * = significant

I'm not sure 4% is a large enough win on a micro to motivate the lookup-table based approach. I'd rather see us investigate if we could consolidate these overlapping utility classes (HexFormat and HexDigits) on a lookup-table free solution.

wenshao · 2023-09-08T01:11:57Z

Should we test the performance comparison of toHexDigits(long)?

I've done some work before, and I didn't find a way to perform better than a lookup table without using a lookup table.

rgiulietti · 2023-09-08T08:47:04Z

I'm not sure that micro-benchmarks are very indicative on whether a lookup table performs better than short and straightforward code.
The reason is that, once in the CPU caches, a lookup table in micro-benchmarks stays there, whereas in more realistic situations, where access is more spread out in time, it is often evicted to make room for other, more lively data.
A micro-benchmark using a lookup table shows good results because of a high rate of cache hits, whereas in other real-world workloads a lookup table might result in many cache misses on access.

cl4es · 2023-09-08T09:33:42Z

As @rgiulietti says lookup-table algorithms may outperform in microbenchmarks but lose out in real world scenarios, so we need to stay clear unless there's major benefit.

And as it turns out, the relative benefit seem to come mainly from the use of ByteArrayLittleEndian, not from using the lookup table in HexDigits.DIGITS:

     public String toHexDigits(byte value) {
         byte[] rep = new byte[2];
-        rep[0] = (byte)toHighHexDigit(value);
-        rep[1] = (byte)toLowHexDigit(value);
+        ByteArrayLittleEndian.setChar(rep, 0, (char)(toHighHexDigit(value) << 8 | toLowHexDigit(value)));
         try {
             return jla.newStringNoRepl(rep, StandardCharsets.ISO_8859_1);
         } catch (CharacterCodingException cce) {

This gets the same speed-up (4%) as calling HexDigits.DIGITS:

Name                       Cnt      Base    Error       Test    Error   Unit   Diff%
HexFormatBench.toHexDigits  15     1,943 ±  0,014      1,860 ±  0,014  us/op    4,2% (p = 0,000*)
  * = significant

cl4es · 2023-09-08T10:22:31Z

I also tried variants of this for toHexDigits(short|int|long) but it always ends up worse. Both when chunking 2 digits at a time:

        ByteArrayLittleEndian.setChar(rep, 2, (char)(toHighHexDigit(value) << 8 | toLowHexDigit(value)));
        value >>= 8;
        ByteArrayLittleEndian.setChar(rep, 0, (char)(toHighHexDigit(value) << 8 | toLowHexDigit(value)));

and when doing it all in one go (this code is toHexDigits(short), which writes four nibbles):

        ByteArrayLittleEndian.setInt(rep, 0, toHighHexDigit(value >> 8) << 24 | toLowHexDigit(value >> 8) << 16 | toHighHexDigit(value) << 8 | toLowHexDigit(value)));

Only a win on toHexDigits(byte):

Name                            Cnt  Base   Error   Test   Error  Unit   Diff%
HexFormatBench.toHexDigitsByte   15 1,992 ± 0,008  1,908 ± 0,053 us/op    4,2% (p = 0,000*)
HexFormatBench.toHexDigitsInt    15 2,476 ± 0,018  2,896 ± 0,023 us/op  -16,9% (p = 0,000*)
HexFormatBench.toHexDigitsLong   15 3,992 ± 0,052  4,229 ± 0,036 us/op   -5,9% (p = 0,000*)
HexFormatBench.toHexDigitsShort  15 2,183 ± 0,018  2,800 ± 0,162 us/op  -28,3% (p = 0,000*)
  * = significant

This is indicative that any win here comes from tickling the JIT the right way, rather than some intrinsic property of ByteArrayLittleEndian. I'll leave the code unchanged but add these microbenchmarks if anyone wants to attempt other improvements.

wenshao · 2023-09-08T20:03:46Z

I'm not sure that micro-benchmarks are very indicative on whether a lookup table performs better than short and straightforward code.
The reason is that, once in the CPU caches, a lookup table in micro-benchmarks stays there, whereas in more realistic situations, where access is more spread out in time, it is often evicted to make room for other, more lively data.
A micro-benchmark using a lookup table shows good results because of a high rate of cache hits, whereas in other real-world workloads a lookup table might result in many cache misses on access.

In the HexFormat scenario, if the length of the input byte[] is larger, the performance of using the lookup table will be better.

If the length of byte[] is greater than 1 in most scenarios, using lookup table will have better performance.

cl4es · 2023-09-13T19:53:27Z

I ran some experiments with a lookup-table approach (based on HexDigits and I can get some of these to be marginally faster when combining a lookup-table approach with the ByteArray hack, but there's no win when using one or the other in isolation. So I think much of the win is actually not from using a lookup-table, but from tickling the JIT to inline more and optimize a bit more aggressively. So I think this might be a case of micros telling us sweet little lies, and we should favor the intuition that lookup tables should be avoided unless absolutely necessary.

I prefer the simplicity of this PR as it stands and think we should backtrack on some of the lookup tables we've recently added in jdk.internal.util.Hex|Decimal|OctalDigits.

RogerRiggs

This non-lookup version looks fine; it won't invalidate caches when used and the code is easy to understand. Thanks for the cleanup and re-checking the performance benefits.

openjdk · 2023-09-13T20:41:20Z

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8315789: Minor HexFormat performance improvements

Reviewed-by: rriggs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 219 new commits pushed to the master branch:

ce93d27: 6228794: java.text.ChoiceFormat pattern behavior is not well documented.
3b0a6d2: 8314226: Series of colon-style fallthrough switch cases with guards compiled incorrectly
ff240a9: 8316087: Test SignedLoggerFinderTest.java is still failing
a731a24: 8315934: RISC-V: Disable conservative fences per vendor
b3dad24: 8316021: Serial: Remove unused Generation::post_compact
f9ab115: 8316050: Use hexadecimal encoding in MemorySegment::toString
f804f86: 8314612: TestUnorderedReduction.java fails with -XX:MaxVectorSize=32 and -XX:+AlignVector
f8df754: 8311207: Cleanup for Optimization for UUID.toString
fecd2fd: 8315898: Open source swing JMenu tests
bb6b3f2: 8315761: Open source few swing JList and JMenuBar tests
... and 209 more: https://git.openjdk.org/jdk/compare/12de9b0225363377e9a76729b11698221d4f29f2...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

cl4es · 2023-09-13T20:56:12Z

/integrate

openjdk · 2023-09-13T20:58:27Z

Going to push as commit 92ad4a2.
Since your change was applied there have been 219 commits pushed to the master branch:

ce93d27: 6228794: java.text.ChoiceFormat pattern behavior is not well documented.
3b0a6d2: 8314226: Series of colon-style fallthrough switch cases with guards compiled incorrectly
ff240a9: 8316087: Test SignedLoggerFinderTest.java is still failing
a731a24: 8315934: RISC-V: Disable conservative fences per vendor
b3dad24: 8316021: Serial: Remove unused Generation::post_compact
f9ab115: 8316050: Use hexadecimal encoding in MemorySegment::toString
f804f86: 8314612: TestUnorderedReduction.java fails with -XX:MaxVectorSize=32 and -XX:+AlignVector
f8df754: 8311207: Cleanup for Optimization for UUID.toString
fecd2fd: 8315898: Open source swing JMenu tests
bb6b3f2: 8315761: Open source few swing JList and JMenuBar tests
... and 209 more: https://git.openjdk.org/jdk/compare/12de9b0225363377e9a76729b11698221d4f29f2...master

Your commit was automatically rebased without conflicts.

openjdk · 2023-09-13T20:58:33Z

@cl4es Pushed as commit 92ad4a2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

8315789: Minor HexFormat performance improvements

59a1f60

cl4es mentioned this pull request Sep 6, 2023

8314774: Optimize URLEncoder #15354

Closed

3 tasks

openjdk bot added the rfr Pull request is ready for review label Sep 6, 2023

openjdk bot added the core-libs [email protected] label Sep 6, 2023

rgiulietti reviewed Sep 6, 2023

View reviewed changes

RogerRiggs reviewed Sep 6, 2023

View reviewed changes

Add toHexDigitsByte|Short|Int|Long microbenchmarks

c239376

cl4es mentioned this pull request Sep 11, 2023

8315970: Big-endian issues after JDK-8310929 #15652

Closed

3 tasks

RogerRiggs approved these changes Sep 13, 2023

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Sep 13, 2023

openjdk bot added the integrated Pull request has been integrated label Sep 13, 2023

openjdk bot closed this Sep 13, 2023

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 13, 2023

wenshao mentioned this pull request Sep 15, 2023

8316426: Optimization for HexFormat.formatHex #15768

Closed

3 tasks

cl4es deleted the hex branch September 15, 2023 20:13

8315789: Minor HexFormat performance improvements #15591

8315789: Minor HexFormat performance improvements #15591

Uh oh!

Conversation

cl4es commented Sep 6, 2023 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Reviewing

Webrev

Uh oh!

bridgekeeper bot commented Sep 6, 2023

Uh oh!

openjdk bot commented Sep 6, 2023

Uh oh!

mlbridge bot commented Sep 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

rgiulietti Sep 6, 2023

Choose a reason for hiding this comment

Uh oh!

cl4es Sep 6, 2023

Choose a reason for hiding this comment

Uh oh!

cl4es Sep 6, 2023

Choose a reason for hiding this comment

Uh oh!

RogerRiggs Sep 6, 2023

Choose a reason for hiding this comment

Uh oh!

RogerRiggs Sep 6, 2023

Choose a reason for hiding this comment

Uh oh!

cl4es Sep 6, 2023

Choose a reason for hiding this comment

Uh oh!

wenshao commented Sep 7, 2023

Uh oh!

cl4es commented Sep 7, 2023

Uh oh!

wenshao commented Sep 8, 2023

Uh oh!

rgiulietti commented Sep 8, 2023

Uh oh!

cl4es commented Sep 8, 2023

Uh oh!

cl4es commented Sep 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wenshao commented Sep 8, 2023

Uh oh!

cl4es commented Sep 13, 2023

Uh oh!

RogerRiggs left a comment

Choose a reason for hiding this comment

Uh oh!

openjdk bot commented Sep 13, 2023

Uh oh!

cl4es commented Sep 13, 2023

Uh oh!

openjdk bot commented Sep 13, 2023

Uh oh!

openjdk bot commented Sep 13, 2023

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

cl4es commented Sep 6, 2023 •

edited by openjdk bot

Loading

mlbridge bot commented Sep 6, 2023 •

edited

Loading

cl4es commented Sep 8, 2023 •

edited

Loading