-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8360192: C2: Make the type of count leading/trailing zero nodes more precise #25928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
👋 Welcome back qxing! A progress list of the required criteria for merging this PR into |
|
@MaxXSoft This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 333 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jaskarth, @merykitty, @jatin-bhateja, @eme64) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Webrevs
|
|
A stricter bound would be |
@merykitty Thanks for your review, did you mean |
|
@MaxXSoft Yes you are right, my mistake |
| #include "utilities/count_leading_zeros.hpp" | ||
| #include "utilities/count_trailing_zeros.hpp" | ||
|
|
||
| static int count_leading_zeros_int(jint i) { | ||
| return i == 0 ? BitsPerInt : count_leading_zeros(i); | ||
| } | ||
|
|
||
| static int count_leading_zeros_long(jlong l) { | ||
| return l == 0 ? BitsPerLong : count_leading_zeros(l); | ||
| } | ||
|
|
||
| static int count_trailing_zeros_int(jint i) { | ||
| return i == 0 ? BitsPerInt : count_trailing_zeros(i); | ||
| } | ||
|
|
||
| static int count_trailing_zeros_long(jlong l) { | ||
| return l == 0 ? BitsPerLong : count_trailing_zeros(l); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why you need this?
Why is count_trailing_zeros and count_leading_zeros not enough, when you cast at the use-site?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because our implementation does not accept 0 as an input. I suggest doing this at count_leading_zeros, it makes more sense and also aligns our behaviour with the well-known countr_zero and countl_zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why you need this? Why is
count_trailing_zerosandcount_leading_zerosnot enough, when you cast at the use-site?
@eme64 The explanation of @merykitty is right, the implementation of count_leading_zeros and count_trailing_zeros reject zero as the input.
Perhaps we could open another PR to add zero support for these functions, since it's less relevant to this node type change and might require other changes to the code that calls them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In src/hotspot/share/utilities/count_leading_zeros.hpp, it says that 0 behavior is undefined. Ok... but why do we do that? Is that a performance optimization ? If yes, is it really worth it? If there is no good reason not to handle 0, we should just handle it.
We have some tests in test/hotspot/gtest/utilities/test_count_leading_zeros.cpp.
It would be interesting to quickly check if any use of these methods could ever encounter zero, and then hit the assert. I would not be surprised if we found a bug here.
I think this would be a worth while cleanup task. I would prefer if we clean things up now, and don't just let more special handling code get integrated.
| @Test | ||
| @IR(failOn = IRNode.COUNT_LEADING_ZEROS_I) | ||
| public boolean clzCompareInt() { | ||
| return Integer.numberOfLeadingZeros(i) < 0 || Integer.numberOfLeadingZeros(i) > 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You limited the type to check the ones/zeroes of the input value, count_leading_zeros_int(~ti->_bits._zeros), count_leading_zeros_int(ti->_bits._ones),. I think we need test cases to cover those too, given they are more complex and thus more error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea how the KnownBits works. I tried masking i with i & 0x00ffffff | 0x0000ffff, and evaluating CLZ on it. But C2 still seems to return an integer type with a trivial KnownBits, instead of KnownBits with _zeros=0xff000000 and _ones=0x0000ffff.
I also checked the PR which introduced KnownBits (#17508). @merykitty said "there is no node taking advantage of the additional information yet", and there's no IR tests in that PR. So it's probably impossible to write a test for this at the moment?
@merykitty Could you provide more help on this please? Thanks.
|
@MaxXSoft This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
|
Hi all, This patch has now passed all GHA tests and is ready for further reviews. If there are any other suggestions for this PR, please let me know. Thanks! |
jaskarth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice! I just have a comment on the unit test.
| const TypeLong* tl = t->is_long(); | ||
| if (tl->is_con()) { | ||
| return TypeInt::make(count_leading_zeros_long(tl->get_con())); | ||
| } | ||
| return TypeInt::make(count_leading_zeros_long(~tl->_bits._zeros), | ||
| count_leading_zeros_long(tl->_bits._ones), | ||
| tl->_widen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MaxXSoft, Taking the liberty to add some comments related to the proof of the above assumptions, hope you won't mind :-)
Z3 proof for [KnownBits.ONES, ~KnownBits.ZEROS] >= [LB, UB] where >= is a superset relation.
from z3 import *
from functools import reduce
# 64-bit symbolic unsigned integers
UB = BitVec('UB', 64)
LB = BitVec('LB', 64)
# XOR for detecting differing bits
xor_val = UB ^ LB
# COUNT_LEADING_ZEROS
def count_leading_zeros(x):
"""Returns number of leading zeros in a 64-bit word."""
clauses = []
for i in range(64):
bit_is_one = LShR(x, 63 - i) & 1 == 1
bits_before_zero = And([LShR(x, 63 - j) & 1 == 0 for j in range(i)])
clauses.append(If(And(bits_before_zero, bit_is_one), BitVecVal(i, 64), BitVecVal(64, 64)))
return reduce(lambda a, b: If(a == 64, b, a), clauses)
# Step 1: Compute common prefix length
CPL = count_leading_zeros(xor_val)
# Step 2: COMMON_PREFIX_MASK = ((1 << CPL) - 1) << (64 - CPL)
one_shifted = (BitVecVal(1, 64) << CPL)
mask = one_shifted - 1
COMMON_PREFIX_MASK = mask << (64 - CPL)
# Step 3: COMMON_PREFIX
COMMON_PREFIX = UB & COMMON_PREFIX_MASK
# Step 4: ZEROS and ONES
ZEROS = COMMON_PREFIX_MASK & (~COMMON_PREFIX)
ONES = COMMON_PREFIX
# Step 5: Prove that [ONES, ~ZEROS] ⊇ [LB, UB]
prop = And(ULE(ONES, LB), ULE(UB, ~ZEROS))
# Step 6: Try to disprove (i.e., check if any UB, LB violates the above)
s = Solver()
s.add(Not(prop)) # Look for counterexamples
# Check the result
if s.check() == sat:
m = s.model()
ub_val = m.eval(UB).as_long()
lb_val = m.eval(LB).as_long()
ones_val = m.eval(ONES).as_long()
zeros_val = m.eval(ZEROS).as_long()
not_zeros_val = (~zeros_val) & 0xFFFFFFFFFFFFFFFF
print("❌ Property does NOT hold. Counterexample found:")
print(f"LB = {lb_val:#018x}")
print(f"UB = {ub_val:#018x}")
print(f"ONES = {ones_val:#018x}")
print(f"~ZEROS = {not_zeros_val:#018x}")
print(f"ONES <= LB? {ones_val <= lb_val}")
print(f"UB <= ~ZEROS? {ub_val <= not_zeros_val}")
else:
print("✅ Property holds: [ONES, ~ZEROS] always covers [LB, UB] (UNSAT)")
Manually worked out Example 1:-
UB = 0xFFFF00FF
LB = 0xFFFF0000
COMMON_PREFIX = 0xFFFF0000
ONES = 0xFFFF0000
ZEROS = 0x00000000
---------------------------
MAX = ~ZEROS
= 0xFFFFFFFF
MIN = ONES
= 0xFFFF0000
Thus, it's evident that MAX >= UB AND MIN <= LB
Manually worked out Example 2:-
UB = 0xFF0F00FF
LB = 0xFF0F0000
COMMON_PREFIX = 0xFF0F0000
ONES = 0xFF0F0000
ZEROS = 0x00F00000
Since logical OR b/w ONES and ZEROS forms the COMMON_PREFIX MASK i.e. both ZEROS and ONES are derived from COMMON_PREFIX therefore COMMON_PREFIX_MASK & ~ZERO == COMMON_PREFIX_MASK & ONES, apart from COMMON_PREFIX there are other lower order bits which are not included in COMMON_PREFIX, when we flip the ZEROS then we get a VALUE which equals ONES + we set all other lower order bits, thus this value is guaranteed to be greater than UB since it also set other non-set bits of UB.
Thus, UB = COMMON_PREFIX + OTHER_SET_LOWER_ORDER_BITS_IN_UB
~ZEROS = COMMON_PREGIX + REMAINING_BITS_SET_TO_ONE.
While ONES is just the common prefix, which is guaranteed to be less than UB since we don’t account for set bits of UB, which are not part of the common prefix.
Thus, the following inequality holds good.
~ZEROS >= UB >= LB >= ONES
Proof for the KnownBits application to CTZ / CLZ
- In both these cases, we are only concerned about the number of ZEROs which are either present at the start or towards the end of the integral bits sequence.
- KnownBits.ZEROS and KnownBits.ONES captures the known set bits in value range, i.e., if a bit is set on ONES it means it's guaranteed to be one in all the values in the value range; similarly, a set bit in ZEROS signifies that the corresponding bit remains unset in all the values in the range. All unset bits in ONES/ZEROS signify not known at compile time.
Since the flipped value of ZEROS gives us the maximum value in the range, and given that number of leading zeros of maximum value will never be greater than the number of leading zeros of minimum value of the range hence, value range of CLZ varies b/w CLZ(~ZEROS) and CLZ(ONES) as its lower and upper bounds.
While counting leading zeros is relatively straightforward to prove given that each integral value is the sum of 2^position of set bits and given that TypeInt/TypeLong comply with the invariant (LB <= UB)
Hence highest set bits position of UB can never be less than the LB.
For CTZ, consider the following example
UB = 0xFF800000
LB = 0xFF8FF0F0
COMMON_PREFIX = 0xFF800000
ONES = 0xFF800000
ZEROS = 0x00700000
Again, both ONES and ZEROS bits are extracted from the common prefix of the delimiting bounds of the value range.
MAX = ~ZEROS = 0xFF8FFFFF
MIN = ONES = 0xFF800000
Thus, the number of trailing zeros will be [lb:0, ub:20], given that the lower order bits of knowbits which are not part of the common prefix will always be set to zero, hence CTZ(~ZERO) will always be 0. This is also in line with the fact that the number of leading/trailing zeros will never be a non-negative value. Therefore, delimiting lower and upper bounds of CTZ are CTZ(~ZEROS) and CTZ(ONES).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the addition! Very nice and solid proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MaxXSoft, Constant folding optimizations like these qualify for addition of a new benchmark. Please add one; I see a significant gain with following micro kernel on my Intel 13th Gen Intel(R) Core(TM) i3-1315U laptop.
public static long micro(long param) {
long constrained_param = Math.min(175, Math.max(param, 160));
return Long.numberOfLeadingZeros(constrained_param);
}
PROMPT>#baseline
PROMPT>java -Xbatch -XX:-TieredCompilation -cp . test_clz
[time] 17ms [res] 11200000
PROMPT>#withopt
PROMPT>java -Xbatch -XX:-TieredCompilation -cp . test_clz
[time] 5ms [res] 11200000
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll run some testing and review afterward.
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good patch, thanks for the work and patience with the review - it's been a bit slow over summer with vacation/travel.
| #include "utilities/count_leading_zeros.hpp" | ||
| #include "utilities/count_trailing_zeros.hpp" | ||
|
|
||
| static int count_leading_zeros_int(jint i) { | ||
| return i == 0 ? BitsPerInt : count_leading_zeros(i); | ||
| } | ||
|
|
||
| static int count_leading_zeros_long(jlong l) { | ||
| return l == 0 ? BitsPerLong : count_leading_zeros(l); | ||
| } | ||
|
|
||
| static int count_trailing_zeros_int(jint i) { | ||
| return i == 0 ? BitsPerInt : count_trailing_zeros(i); | ||
| } | ||
|
|
||
| static int count_trailing_zeros_long(jlong l) { | ||
| return l == 0 ? BitsPerLong : count_trailing_zeros(l); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In src/hotspot/share/utilities/count_leading_zeros.hpp, it says that 0 behavior is undefined. Ok... but why do we do that? Is that a performance optimization ? If yes, is it really worth it? If there is no good reason not to handle 0, we should just handle it.
We have some tests in test/hotspot/gtest/utilities/test_count_leading_zeros.cpp.
It would be interesting to quickly check if any use of these methods could ever encounter zero, and then hit the assert. I would not be surprised if we found a bug here.
I think this would be a worth while cleanup task. I would prefer if we clean things up now, and don't just let more special handling code get integrated.
| return TypeInt::make(count_leading_zeros_int(~ti->_bits._zeros), | ||
| count_leading_zeros_int(ti->_bits._ones), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is correct, but I would like to see a short comment why it is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same in other cases below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
|
@MaxXSoft Feel free to just ping me again when you want another review :) |
|
@eme64 Thank you for your patience and kind reviews! I've updated this patch based on your suggestions. This patch is now ready for further review. |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Though I think you modified the test example so far that it does not work any more, i.e. it would not constant fold if the output range was wrong. I've identified 2 sources that would prevent constant folding:
- It is unclear if
getResultChecksumwould get inlined. That way, theresultloses the type information about the ranges. - The comparisons themselves would not constant fold, because the values you compare with are not constants, but array element loads. You need to compare
resultwith a compile time constant.
Maybe the idea is not 100% clear for you:
Imagine result should be in some range 2..10. But with a bug, we now return 3..10. This means the output of numberOfLeadingZeros is still variable, and it does not constant fold. But: if there is something below it, like a result < 3 ... this would now constant fold to false, even though we could have had a 2 at runtime.
But for this to work, it all needs to be in the same compilation unit, and the value we compare to result must be a compile time constant.
Does that make sense?
| } | ||
| } | ||
|
|
||
| int getResultChecksum(int result, int[] LIMITS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put a @ForceInlinie before this. You are using it in many methods, and so it may not get inlined reliably. And if it does not get inlined, then the result verifcation would not constant-fold, and so it would be kind of useless. Because we rely on the fact that if the range is wrong, we could get bad constant folding ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added @ForceInline.
| for (int i = 0; i < LIMITS.length; i += 2) { | ||
| if (result < LIMITS[i]) sum += 1 << i; | ||
| if (result > LIMITS[i + 1]) sum += 1 << (i + 1); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doublt that this works, because the test would not constant fold if the range was too narrow.
I think you need to manually unroll the loop, and load the constants from static final values, or another method that allows it to be a compile time constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
|
@MaxXSoft This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
|
Hi @eme64, thanks for your review. I've updated the IR test to ensure that range checks work with the constant folding. Do you have any other suggestions for this patch? |
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxXSoft Ok, now it looks really good. I have one minor nit.
Running some internal testing now.
| LIMITS_32_0 = INTS_32.next(); | ||
| LIMITS_32_1 = INTS_32.next(); | ||
| LIMITS_32_2 = INTS_32.next(); | ||
| LIMITS_32_3 = INTS_32.next(); | ||
| LIMITS_32_4 = INTS_32.next(); | ||
| LIMITS_32_5 = INTS_32.next(); | ||
| LIMITS_32_6 = INTS_32.next(); | ||
| LIMITS_32_7 = INTS_32.next(); | ||
|
|
||
| LIMITS_64_0 = INTS_64.next(); | ||
| LIMITS_64_1 = INTS_64.next(); | ||
| LIMITS_64_2 = INTS_64.next(); | ||
| LIMITS_64_3 = INTS_64.next(); | ||
| LIMITS_64_4 = INTS_64.next(); | ||
| LIMITS_64_5 = INTS_64.next(); | ||
| LIMITS_64_6 = INTS_64.next(); | ||
| LIMITS_64_7 = INTS_64.next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not assign them directly? You just need to declare the generators first. Would save us a couple of lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing passed, code looks good to me too now :)
Thanks for all the work you put into this, really appreciated
eme64
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, even better :)
|
@eme64 Thank you for the review! @merykitty @jatin-bhateja Do you have any other suggestions regarding the latest changes in this patch? |
The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
Testing: tier1, IR test
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928$ git checkout pull/25928Update a local copy of the PR:
$ git checkout pull/25928$ git pull https://git.openjdk.org/jdk.git pull/25928/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25928View PR using the GUI difftool:
$ git pr show -t 25928Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25928.diff
Using Webrev
Link to Webrev Comment