-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store #19218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store #19218
Conversation
…ve arrays by combining values into larger store
|
👋 Welcome back rrich! A progress list of the required criteria for merging this PR into |
|
@reinrich This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 33 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
@offamitkumar you can put this through your testing if you like. It should solve the issues with test/hotspot/jtreg/compiler/c2/TestMergeStores.java also for s390. |
|
@reinrich test is passing on s390x with your change. tier1 test are in progress. Update: tier1 test are also clean on s390x; |
Webrevs
|
|
@reinrich thanks for taking this up! |
| private static final Unsafe UNSAFE = Unsafe.getUnsafe(); | ||
| private static final Random RANDOM = Utils.getRandomInstance(); | ||
|
|
||
| private static final boolean IS_BIG_ENDIAN = UNSAFE.isBigEndian(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static is very important here, so that the if constant fold in the test. Otherwise we don't know if we have the IR rule pass because of the correct branch. Maybe add a comment for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I assumed that is clear to people looking at jit compiler tests :)
I removed IS_BIG_ENDIAN again since it wasn't needed anymore with the last comit (3169a31).
|
It's not obvious to me why something like can't be merged. Is it because you only use |
|
In other words, it seems like it could work for arbitrary byte values if the merged value was computed from those individual values. They wouldn't need to be shifted values. The example above would either write 0x01020304 or 0x04030201 depending on the endianness. |
The stores could be merged to the following pseudo code: *(int*)&a[offset + 2] = (int)(v >> 16); // MergedThe current logic doesn't accept the right shift here. I'll clarify the synopsis of this pr and the comment in |
|
@dean-long @reinrich |
|
Thanks for looking at the pr.
I've done that for |
|
Test error is unrelated to the changes. Upload of test results failed: |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm running testing again, but the code looks good now!
I just had another idea:
Could we use some sort of "byte reverse / shuffle" operation to do these use cases for both big/little-endian?
storeBytes(bytes, offset, (byte)(value >> 8),
(byte)(value >> 0));
storeBytes(bytes, offset, (byte)(value >> 0),
(byte)(value >> 8));
Not sure if that would be profitable or even available on all platforms. Could be a future RFE someone can work on after this. What do you think? It might make performance more predictable across platforms.
src/hotspot/share/opto/memnode.cpp
Outdated
| // `_store` and `first` are swapped in the diagram above | ||
| Node* hi = first->in(MemNode::ValueIn); | ||
| Node* lo = _store->in(MemNode::ValueIn); | ||
| #endif // VM_LITTLE_ENDIAN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A swap could be more concise. But I leave that up to you ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. It's better to just swap hi with lo and it matches the comment.
|
@reinrich please ping me again to ask if testing is ok before you integrate ;) |
Thanks for picking this up again. I quickly wanted to let you know that I'm out of office. I will be back in a week. |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good.
|
@reinrich please still wait until the JDK24 fork on Thrusday to integrate, so that we do not have to backport possible regression fixes - I had 3 or 4 with my original patch ;) |
You mean to combine the stores even if the explicit ordering does not match the ordering of the store instruction, adding a |
|
I did another round of testing on s390x. looks good. |
Thanks Amit. |
|
/integrate |
|
Going to push as commit f7862bd.
Your commit was automatically rebased without conflicts. |
This pr adds a few tweaks to JDK-8318446 which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
By example (from
TestMergeStores.java):Depending on the endianess 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks).
Additionally I've fixed a few comments and a test bug.
The optimization seems to be a little bit more effective on big endian platforms.
Again by example:
The sequence of candidate stores begins at the lowest store (in Memory def-use order) and is trimmed to a power of 2 removing higher stores if necessary. On little endian platforms this removes the least significant bytes to be stored. Merging would require a right shift of the input value. While possible this is currently not done.
With big endian order the stores of the more significant bytes are removed and the merge succeeds because no shift is needed.
I introduced new platform attributes
little-endian,big-endianto the IR testing framework to be able to adapt IR matching rules to this difference.Testing:
TestMergeStores.javaon AIX and S390.JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. JCK, SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests.
Testing was done with fastdebug builds on the main platforms and also on Linux/PPC64le and AIX.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19218/head:pull/19218$ git checkout pull/19218Update a local copy of the PR:
$ git checkout pull/19218$ git pull https://git.openjdk.org/jdk.git pull/19218/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19218View PR using the GUI difftool:
$ git pr show -t 19218Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19218.diff
Webrev
Link to Webrev Comment