8333833: Remove the use of ByteArrayLittleEndian from UUID::toString #19610
Conversation
|
👋 Welcome back wenshao! A progress list of the required criteria for merging this PR into |
|
@wenshao This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 19 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@cl4es) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
|
The performance numbers on a MacBook Pro M1 Max are as follows:
-Benchmark           (size)   Mode  Cnt    Score   Error   Units (# master 8ffc35d117846a7a2aa08afed662273d2f887770)
-UUIDBench.toString   20000  thrpt   15  103.904 ± 0.772  ops/us
+Benchmark           (size)   Mode  Cnt    Score   Error   Units (# current 30373b81fddbf7e82340e466cf6425a5252399d2)
+UUIDBench.toString   20000  thrpt   15  109.529 ± 1.156  ops/us  +5.41%
|
Webrevs
|
liach
left a comment
I think you mean bug ID 8333833, right?
|
As far as I know, ByteArrayLittleEndian uses the VarHandle mechanism, which more efficiently writes different primitives into the array, unlike the basic |
|
@sunmisc BALE uses byte array view VH which still uses Unsafe: jdk/src/java.base/share/classes/java/lang/invoke/X-VarHandleByteArrayView.java.template Line 141 in 8d2f9e5
Please take a look at #16245; you will see that C2 now JIT-compiles these compatible "different primitives" writes the way Unsafe would, though it places some requirements on the code shape. That is why I recommended the comment to wenshao, so future changes won't accidentally destroy the code shape and the optimization. |
|
The C2 optimization brought by PR #16245 makes many of the previous performance improvement techniques based on VarHandle/ByteArray/Unsafe no longer meaningful, and many optimizations based on them need to be changed back. |
|
And in addition, VarHandle is not initialized unless it's necessary; thus, programs that use UUIDs but not VarHandle no longer need to initialize VarHandle. See #15386 where JDK startup has a performance degradation because it had to initialize VarHandle after using BALE. |
|
I think we don't need to change them back everywhere, but only need to rewrite Maybe I should rewrite #14636 without using |
| | (DIGITS[b1 & 0xff] << 16) | ||
| | (((long) DIGITS[b2 & 0xff]) << 32) | ||
| | (((long) DIGITS[b3 & 0xff]) << 48); | ||
| public static void putHex(byte[] buffer, int off, int i) { |
Should there be 2 methods - for 2 and 4 bytes respectively?
Does c2 optimize 8 byte writes as well?
A 4-byte unsigned int input for an 8-byte write sounds plausible; I personally am fine either with or without it.
"Does c2 optimize 8 byte writes as well?"
From the first few lines of #16245:
"Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup."
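As a rough illustration of the code shape that quote describes, the sketch below writes one long with eight adjacent single-byte stores in little-endian order. The names MergeStoreShape, putLongLE and viaByteBuffer are hypothetical and do not come from the JDK sources. Whether C2 really merges these stores into a single 8-byte store depends on the JDK build and platform, and the sketch does not verify that; it only checks that the bytes match what ByteBuffer produces.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class MergeStoreShape {
    // Hypothetical helper: eight adjacent byte stores at increasing
    // offsets from the same base index, lowest byte first.
    static void putLongLE(byte[] a, int off, long v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
        a[off + 4] = (byte) (v >> 32);
        a[off + 5] = (byte) (v >> 40);
        a[off + 6] = (byte) (v >> 48);
        a[off + 7] = (byte) (v >> 56);
    }

    // Reference result using the standard library's little-endian encoding.
    static byte[] viaByteBuffer(long v) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
    }

    public static void main(String[] args) {
        long value = 0x0123456789abcdefL;
        byte[] buf = new byte[8];
        putLongLE(buf, 0, value);
        System.out.println(Arrays.equals(buf, viaByteBuffer(value)));
    }
}
```

The stores deliberately use one base index plus small constant offsets, which is the pattern used by the snippets in this thread.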
8-byte writing requires converting int to long. The performance is similar to the current version, but an additional method putHex8 needs to be added. The current version has less code.
The following is the code for writing 8 bytes:
class UUID {
    @Override
    public String toString() {
        byte[] buf = new byte[36];
        HexDigits.putHex8(buf, 0, (int) (mostSigBits >> 32));
        HexDigits.putHex10(buf, 8, (int) mostSigBits);
        HexDigits.putHex10(buf, 18, (int) (leastSigBits >> 32));
        HexDigits.putHex8(buf, 28, (int) leastSigBits);
        try {
            return jla.newStringNoRepl(buf, StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException cce) {
            throw new AssertionError(cce);
        }
    }
}
class HexDigits {
    public static void putHex8(byte[] bytes, int off, int i) {
        // The most significant byte of i goes into the lowest 16 bits of v,
        // so its two hex digits are written first (same ordering as putHex10).
        long v = (((long) DIGITS[ i        & 0xff]) << 48)
               | (((long) DIGITS[(i >>  8) & 0xff]) << 32)
               | (((long) DIGITS[(i >> 16) & 0xff]) << 16)
               |          DIGITS[(i >> 24) & 0xff];
        bytes[off]     = (byte)  v;
        bytes[off + 1] = (byte) (v >> 8);
        bytes[off + 2] = (byte) (v >> 16);
        bytes[off + 3] = (byte) (v >> 24);
        bytes[off + 4] = (byte) (v >> 32);
        bytes[off + 5] = (byte) (v >> 40);
        bytes[off + 6] = (byte) (v >> 48);
        bytes[off + 7] = (byte) (v >> 56);
    }
    public static void putHex10(byte[] bytes, int off, int i) {
        // Writes "-xxxx-xxxx": two dashes, each followed by four hex digits.
        int v0 = (DIGITS[(i >> 16) & 0xff] << 16)
               |  DIGITS[(i >> 24) & 0xff];
        int v1 = (DIGITS[ i        & 0xff] << 16)
               |  DIGITS[(i >>  8) & 0xff];
        bytes[off]     = '-';
        bytes[off + 1] = (byte)  v0;
        bytes[off + 2] = (byte) (v0 >> 8);
        bytes[off + 3] = (byte) (v0 >> 16);
        bytes[off + 4] = (byte) (v0 >> 24);
        bytes[off + 5] = '-';
        bytes[off + 6] = (byte)  v1;
        bytes[off + 7] = (byte) (v1 >> 8);
        bytes[off + 8] = (byte) (v1 >> 16);
        bytes[off + 9] = (byte) (v1 >> 24);
    }
}
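A self-contained way to try the 8-byte idea outside the JDK is sketched below. The class name UuidHexSketch, the format method and the DIGITS table layout (first hex character in the low 8 bits of each entry) are assumptions made for illustration. The JDK-internal HexDigits table and jla.newStringNoRepl are replaced here by a local table and the public String constructor, so this is not the code from the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class UuidHexSketch {
    // Assumed layout: entry b packs the two lowercase hex chars of byte b,
    // high-nibble char in the low 8 bits, so little-endian stores emit
    // the characters in reading order.
    private static final short[] DIGITS = new short[256];
    static {
        String hex = "0123456789abcdef";
        for (int b = 0; b < 256; b++) {
            DIGITS[b] = (short) (hex.charAt(b >> 4) | (hex.charAt(b & 0xf) << 8));
        }
    }

    // Writes the 8 hex digits of i, most significant byte first.
    static void putHex8(byte[] bytes, int off, int i) {
        long v = DIGITS[(i >> 24) & 0xff]
               | ((long) DIGITS[(i >> 16) & 0xff] << 16)
               | ((long) DIGITS[(i >> 8) & 0xff] << 32)
               | ((long) DIGITS[i & 0xff] << 48);
        bytes[off]     = (byte) v;
        bytes[off + 1] = (byte) (v >> 8);
        bytes[off + 2] = (byte) (v >> 16);
        bytes[off + 3] = (byte) (v >> 24);
        bytes[off + 4] = (byte) (v >> 32);
        bytes[off + 5] = (byte) (v >> 40);
        bytes[off + 6] = (byte) (v >> 48);
        bytes[off + 7] = (byte) (v >> 56);
    }

    // Writes "-xxxx-xxxx" for the 32 bits of i.
    static void putHex10(byte[] bytes, int off, int i) {
        int v0 = (DIGITS[(i >> 16) & 0xff] << 16) | DIGITS[(i >> 24) & 0xff];
        int v1 = (DIGITS[i & 0xff] << 16) | DIGITS[(i >> 8) & 0xff];
        bytes[off]     = '-';
        bytes[off + 1] = (byte) v0;
        bytes[off + 2] = (byte) (v0 >> 8);
        bytes[off + 3] = (byte) (v0 >> 16);
        bytes[off + 4] = (byte) (v0 >> 24);
        bytes[off + 5] = '-';
        bytes[off + 6] = (byte) v1;
        bytes[off + 7] = (byte) (v1 >> 8);
        bytes[off + 8] = (byte) (v1 >> 16);
        bytes[off + 9] = (byte) (v1 >> 24);
    }

    // Hypothetical stand-in for UUID::toString built from the two helpers.
    static String format(UUID u) {
        long msb = u.getMostSignificantBits();
        long lsb = u.getLeastSignificantBits();
        byte[] buf = new byte[36];
        putHex8(buf, 0, (int) (msb >> 32));
        putHex10(buf, 8, (int) msb);
        putHex10(buf, 18, (int) (lsb >> 32));
        putHex8(buf, 28, (int) lsb);
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        UUID u = UUID.randomUUID();
        System.out.println(format(u).equals(u.toString()));
    }
}
```

Comparing against java.util.UUID.toString() on random inputs is a cheap way to catch a digit-ordering mistake in the packed value.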
You are right, ByteArray and ByteArrayLittleEndian have good performance after removing Unsafe. This is similar to the previous version of java.io.Bits:
class ByteArrayLittleEndian {
    public static void setInt(byte[] array, int offset, int value) {
        array[offset    ] = (byte)  value;
        array[offset + 1] = (byte) (value >> 8);
        array[offset + 2] = (byte) (value >> 16);
        array[offset + 3] = (byte) (value >> 24);
    }
}
class HexDigits {
    public static void putHex4(byte[] array, int offset, int value) {
        // Prepare an int value so C2 generates a 4-byte write instead of two 2-byte writes
        ByteArrayLittleEndian.setInt(
                array,
                offset,
                (DIGITS[value & 0xff] << 16) | DIGITS[(value >> 8) & 0xff]);
    }
}
|
Do you have evidence that |
@cl4es has fixed startup regression issues, such as in #15836 |
cl4es
left a comment
Glad to see #16245 in action, enabling simpler code with equal or better performance.
|
/integrate |
Co-authored-by: Claes Redestad <[email protected]>
|
Thanks. I hate to nitpick, but is it OK if I rename the RFE to "Remove the use of ByteArrayLittleEndian from UUID::toString"? (The PR title needs to follow suit.) I think the current name might be read as doing something completely different. |
|
Great, go ahead and /integrate again and I'll sponsor. |
|
/integrate |
|
/sponsor |
|
Going to push as commit 8aa35ca.
Your commit was automatically rebased without conflicts. |
After PR #16245, C2 optimizes stores into primitive arrays by combining values into larger stores. In the UUID.toString method, ByteArrayLittleEndian can be removed, making the code more elegant and faster.
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19610/head:pull/19610
$ git checkout pull/19610
Update a local copy of the PR:
$ git checkout pull/19610
$ git pull https://git.openjdk.org/jdk.git pull/19610/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19610
View PR using the GUI difftool:
$ git pr show -t 19610
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19610.diff