
Conversation

@merykitty
Member

merykitty commented Aug 27, 2023

Hi,

This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens up more transformation opportunities in an elegant manner and helps avoid some ad-hoc rules in HotSpot. The new constraints are applied to the Identity and Value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I); the detailed ideas for each node are presented below.

In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent; e.g. an int that lies in [0, 3] in the signed domain must also lie in [0, 3] in the unsigned domain and have all bits but the last 2 unset. As a result, we must normalize the constraints (tighten them so that they are optimal) before constructing a TypeInt/Long instance.
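
For illustration, here is a minimal standalone sketch of the membership predicate and of the normalization idea (contains is a hypothetical helper for this explanation, not the patch's actual code):

    #include <cstdint>

    // Membership test for the set described by (lo, hi, ulo, uhi, zeros, ones).
    bool contains(int32_t x, int32_t lo, int32_t hi,
                  uint32_t ulo, uint32_t uhi,
                  uint32_t zeros, uint32_t ones) {
      uint32_t u = uint32_t(x);
      return x >= lo && x <= hi &&   // signed bounds:   x s>= lo && x s<= hi
             u >= ulo && u <= uhi && // unsigned bounds: x u>= ulo && x u<= uhi
             (u & zeros) == 0 &&     // bits required to be 0
             (~u & ones) == 0;       // bits required to be 1
    }

    // The normalization example from above: the signed range [0, 3] alone
    // already implies the unsigned range [0, 3] and that all bits above
    // bit 1 are zero, so the tightened constraints are:
    //   lo = 0, hi = 3, ulo = 0, uhi = 3, zeros = ~3u, ones = 0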

Please kindly review, thanks very much.

Testing

  • GHA
  • Linux x64, tier 1-4

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Warning

 ⚠️ Patch contains a binary file (test/jdk/javax/management/loading/LibraryLoader/native.jar)

Issue

  • JDK-8315066: Add unsigned bounds and known bits to TypeInt/Long (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15440/head:pull/15440
$ git checkout pull/15440

Update a local copy of the PR:
$ git checkout pull/15440
$ git pull https://git.openjdk.org/jdk.git pull/15440/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15440

View PR using the GUI difftool:
$ git pr show -t 15440

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15440.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Aug 27, 2023

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 27, 2023

@merykitty The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk bot added the rfr label (Pull request is ready for review) Aug 27, 2023
@mlbridge

mlbridge bot commented Aug 27, 2023

Webrevs

@merykitty
Member Author

Regarding duality, I don't really understand the concept of duality used in the type system. From the usage of dual in join(x, y) = dual(meet(dual(x), dual(y))), I kind of understand that the dual of a type is the complement of that type, but I don't see why we need to represent it using the same representation. Furthermore, having an invalid type instance seems dangerous. This concept is only used to calculate the join of 2 types, so I use a _dual field to indicate whether we are calculating the union or the intersection of 2 sets, and making an invalid type (e.g. the intersection of 2 non-overlapping types, or a make with contradictory constraints) results in Type::TOP. I'm not sure whether any other component relies on the old behaviour.
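
As a minimal sketch of this approach, reduced to signed bounds only (SimpleRange is a hypothetical stand-in for TypeInt, and std::optional plays the role of Type::TOP):

    #include <algorithm>
    #include <cstdint>
    #include <optional>

    struct SimpleRange { int32_t lo, hi; };  // invariant: lo <= hi

    // Union hull of two value sets: the meet.
    SimpleRange meet(SimpleRange a, SimpleRange b) {
      return { std::min(a.lo, b.lo), std::max(a.hi, b.hi) };
    }

    // Intersection of two value sets: the join. An empty intersection is
    // reported explicitly (std::nullopt, standing in for Type::TOP) instead
    // of being encoded as an "inverted" range.
    std::optional<SimpleRange> join(SimpleRange a, SimpleRange b) {
      int32_t lo = std::max(a.lo, b.lo);
      int32_t hi = std::min(a.hi, b.hi);
      if (lo > hi) return std::nullopt;  // non-overlapping inputs
      return SimpleRange{ lo, hi };
    }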

@merykitty
Member Author

The details regarding the transformation in each node:

  • And/Or/Xor: Previously, ad-hoc rules were used, such that the And of a positive value is positive, the Or of 2 bools is a bool, etc. This is naturally expanded in a generalised manner using the bit information (see the known-bits sketch after this list).

  • L/R/URShift: Since the operations only concern the lowest bits of the shift amount, bit information is a natural generalisation; bounds calculations are mostly the same, and the information about the shifted-in bits can be used.

  • CountLeadingZeros, CountTrailingZeros, PopCount: Previously, these could only do constant propagation; otherwise they returned the type bottom. The type can be greatly sharpened.

  • Add/Sub: Previously, bounds could only be inferred if no overflow occurs; I expand this a little so that bounds can also be inferred if the upper and lower bounds overflow in the same manner. Unsigned bounds are added, and bits are calculated manually; this can help in situations such as:

      for (int i = 0; i < max; i += 4) {
          bool b = (i & 3) == 0; // (i & 3) is constant zero, so b is always true
      }
    
  • CmpI/L: Unsigned bounds help us represent TypeInt::CC_NE and help narrow the comparison result when the inputs do not overlap.

  • CmpU/UL: An ad-hoc rule for comparing an Add/Sub with a constant was used; this is replaced by the use of unsigned bounds.
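
For the And/Or/Xor item above, the standard known-bits propagation rules look roughly as follows (a simplified sketch with hypothetical names, not the PR's exact code; zeros/ones are the masks of bits known to be 0/1):

    #include <cstdint>

    struct KnownBits { uint32_t zeros, ones; };  // disjoint: zeros & ones == 0

    KnownBits and_bits(KnownBits a, KnownBits b) {
      // A result bit is 0 if it is 0 in either input, 1 only if 1 in both.
      return { a.zeros | b.zeros, a.ones & b.ones };
    }

    KnownBits or_bits(KnownBits a, KnownBits b) {
      // A result bit is 1 if it is 1 in either input, 0 only if 0 in both.
      return { a.zeros & b.zeros, a.ones | b.ones };
    }

    KnownBits xor_bits(KnownBits a, KnownBits b) {
      // A result bit is known only where both input bits are known.
      return { (a.zeros & b.zeros) | (a.ones & b.ones),    // equal bits -> 0
               (a.zeros & b.ones) | (a.ones & b.zeros) };  // unequal bits -> 1
    }

For instance, if one And input has its sign bit in zeros (i.e. it is positive), the result's sign bit is also in zeros, which subsumes the old "And of a positive is positive" rule.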

@merykitty
Member Author

merykitty commented Sep 21, 2023

@TobiHartmann Thanks a lot for taking a look at this patch. Regarding your questions:

  • I don't think splitting these 2 would ease reviewing. Since the individual changes are small and mostly contained in various Value methods, splitting unsigned bounds and known bits would likely result in the need to review the same thing twice.
  • Maybe I am missing some important context, but IIUC, the current implementation is more hacky. Since a value set and its dual set are fundamentally different concepts, what is currently done is that we find a common representation space for both of them, and a binary operation on that space such that the union operations on both the value sets and the dual sets map to that binary operation when we map the 2 sets onto the representation space. In more detail, the value set is the set of x such that lo <= x && x <= hi, the dual set is the set of x such that x > hi || x < lo, the representation space is (x, y), and the mappings for the 2 sets are (lo, hi) -> (lo, hi) and (lo, hi) -> (hi, lo), respectively. Incidentally, the binary operation that pops up is (min(x1, x2), max(y1, y2)) (a small worked sketch follows after this list). This binary operation, unfortunately, is ad-hoc and may not be trivial for more complex sets.
  • Yes, the sign of the result is determined by its sign bit, so this patch does solve that issue.
  • I think the patch fully covers John's prototype, since the prototype does not concern itself with unsigned bounds or the interaction between signed bounds and known bits. The bit calculation of add and sub is really impressive, though.
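
To make the representation-space point above concrete, a small worked sketch (simplified to a single signed interval; names are hypothetical):

    #include <algorithm>
    #include <cstdint>

    // Representation space: a pair (x, y). A value set [lo, hi] maps to
    // (lo, hi); its dual set maps to the swapped pair (hi, lo).
    struct Repr { int32_t x, y; };

    // The single binary operation that serves both kinds of union:
    Repr op(Repr a, Repr b) { return { std::min(a.x, b.x), std::max(a.y, b.y) }; }

    // On value sets:  op((lo1,hi1), (lo2,hi2)) = (min(lo1,lo2), max(hi1,hi2)),
    //   the union hull of the two intervals, as expected for meet.
    // On dual sets:   op((hi1,lo1), (hi2,lo2)) = (min(hi1,hi2), max(lo1,lo2));
    //   read back through the swap, this is [max(lo1,lo2), min(hi1,hi2)],
    //   the intersection of the value sets, whose complement is indeed the
    //   union of the two dual sets.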

Thanks a lot,
Quan Anh

@merykitty
Member Author

This binary operation, unfortunately, is ad-hoc and may not be trivial for more complex sets.

To be clear, there exists a fairly trivial representation for this particular problem, which is to map a value set

{t | t s>= lo && t s<= hi && t u>= ulo && t u<= uhi && (t & zeros) == 0 && (~t & ones) == 0} -> (lo, hi, ulo, uhi, zeros, ones)

and the corresponding dual set

{t | t s< lo || t s> hi || t u< ulo || t u> uhi || (t & zeros) != 0 || (~t & ones) != 0} -> (hi, lo, uhi, ulo, ~zeros, ~ones)

However, this is pretty backwards, as I came up with the maps from the meet operation

(smin(lo1, lo2), smax(hi1, hi2), umin(ulo1, ulo2), umax(uhi1, uhi2), zeros1 & zeros2, ones1 & ones2)

and it simply moves the complexity and burden of correctness from xmeet, where branching on the _dual field is pretty straightforward, to xdual, where this representation is a fair leap of faith. So IMO it is clearer to have a clear distinction between a value set and a dual set.
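
For clarity, here is a sketch of the two branches that an xmeet guarding on _dual would select between (fields and names simplified, not the PR's exact code):

    #include <algorithm>
    #include <cstdint>

    struct Bundle {
      int32_t  lo, hi;       // signed bounds
      uint32_t ulo, uhi;     // unsigned bounds
      uint32_t zeros, ones;  // known bits
    };

    // !_dual branch: meet of two value sets (the formula quoted above).
    Bundle meet(const Bundle& a, const Bundle& b) {
      return { std::min(a.lo, b.lo),   std::max(a.hi, b.hi),
               std::min(a.ulo, b.ulo), std::max(a.uhi, b.uhi),
               a.zeros & b.zeros,      a.ones & b.ones };
    }

    // _dual branch: the meet of two dual sets is the dual of the join of the
    // value sets, an intersection-style combination. Returning false here
    // plays the role of collapsing to Type::TOP after dualizing back.
    bool join(const Bundle& a, const Bundle& b, Bundle& out) {
      out = { std::max(a.lo, b.lo),   std::min(a.hi, b.hi),
              std::max(a.ulo, b.ulo), std::min(a.uhi, b.uhi),
              a.zeros | b.zeros,      a.ones | b.ones };
      // contradictory constraints mean the intersection is empty
      return out.lo <= out.hi && out.ulo <= out.uhi &&
             (out.zeros & out.ones) == 0;
    }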

Another problem is that currently the same TypeInteger can be the representation of 2 different things, an empty value set or a valid dual set, such as:

// Existing C2 code: the inverted range here is read as an empty set, yet the
// same shape is also how a valid dual set is represented.
if (failtype->_lo > failtype->_hi) {

Although they may have the same representation, the dual set corresponding to the point (x, y) is not the same as the empty value set (x, y), similar to how the point (1, 2) is different from the complex number 1 + 2i: they live in different spaces. This poses potential problems such as the non-uniqueness of the empty set, a dual set incorrectly reporting itself as empty, etc. This currently does not present any problem because the usage of dual sets is limited to the calculation of intersection operations only. But having non-canonical empty sets floating around, whose bounds can be extracted in the same way as a valid set's, may introduce problems. As a result, I think verifying the validity of a TypeInteger when it is created is a desirable approach.

Ideally, we would have different types for an integer value set and its corresponding dual set. This is because a value set and a dual set are fundamentally different things, and trying to cram them into one representation space is not mathematically sound. The tradeoff, however, is that a fairly significant change to the Type hierarchy is needed, which is unnecessary when the only benefit is in join operations. As a result, I added assert(!_dual, "") to all methods apart from xmeet and xdual. The bounds, however, can still be freely accessed since they are not methods, so if a TypeInteger representing a dual set somehow appears in an unexpected location, accessing its bounds may lead to surprising behaviours.

This is my understanding of the TypeInteger hierarchy, please correct me if there is any misunderstanding, thanks a lot.

@TobiHartmann
Member

I don't think splitting these 2 would ease reviewing

Okay, thanks for the details! Let's leave it as is then.

Yes the sign of the result is determined by its sign bit so this patch does solve that issue.

I closed JDK-8311597 as duplicate.

I think the patch fully covers John's prototype

Okay, I leave it to @rose00 to decide if JDK-8001436 should be closed as duplicate then.

it simply moves the complexity and burden of correctness from xmeet where branching on the _dual field is pretty straightforward to xdual where this representation is a fair leap of faith

Thanks for summarizing. Right, I agree that in this case it's much cleaner to special-case the dual types, but I don't think that would apply in general to all the types (for example, TypePtr::meet_instptr). So this is all about consistency of the new changes with existing code, and I would prefer consistency over special-casing. If we identify improvements to the existing representation of types, that should be done in a separate change and consistently over all types. I'm curious what other reviewers think though.

@merykitty
Member Author

@TobiHartmann I have tried your suggestion; it turns out that the interdependence between constraints makes it difficult to tolerate ambiguity in type representations. Without knowing whether the context is a dual type or not, it is really hard to normalise the constraints, verify them, and normalise the widen parameter. What do you think? Thanks.

@TobiHartmann
Member

Thanks for checking, @merykitty. Fair enough, I'm fine with leaving it as-is then. Let's see what other reviewers think.

@merykitty
Member Author

Could someone take a look at this patch, please?

@openjdk

openjdk bot commented Dec 7, 2023

@merykitty this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout improvevalue
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk bot added the merge-conflict label (Pull request has merge conflict with target branch) and removed the rfr label (Pull request is ready for review) Dec 7, 2023
@bridgekeeper

bridgekeeper bot commented Jan 4, 2024

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

openjdk bot added the rfr label (Pull request is ready for review) and removed the merge-conflict label (Pull request has merge conflict with target branch) Jan 7, 2024
@merykitty
Member Author

Could someone give their opinion on this PR, please? Thanks a lot.

@eme64
Contributor

eme64 commented Jan 19, 2024

@merykitty I've been seeing this PR for a while. I can't promise high priority here, but this looks interesting. It is also a scarily big changeset 😅

A first remark: I would love to see more tests. For example: have some expressions where certain bits are random and others are masked on or off. Then you can have all sorts of ifs with bit checks that lead to some special if-else code patterns. With an IR rule you can then see if such an if gets folded, by checking if the true-block and/or false-block are still present.

What do you think?

@merykitty
Member Author

@eme64 Thanks a lot for looking into this. Do you have any potential patterns in mind? I can come up with random patterns, but they seem arbitrary and are most likely already covered by other IR tests.

@merykitty
Member Author

@eme64

It is also a scarily big changeset 😅

Yes, it definitely is. If this is deemed a good idea then I would probably split it into patches which deal with different nodes and submit them separately. What do you think?

@rose00
Contributor

rose00 commented Jan 20, 2024

As you might expect (from an examination of JDK-8001436) I’m very excited to see this work going forward.

A very long time ago I worked out tight, cheap estimates for bitwise effects of arithmetic operations, and arithmetic effects for bitwise operators. (BTW, did you know that ‘x^y = x+y-2*(x&y)’? That’s the sort of identity we are working with here.) The overall point is that, if you know both arithmetic and bitwise ranges for operands, you can infer tight arithmetic and bitwise ranges for all common operators. I see you’ve gone farther than I did, adding unsigned ranges as well, and more ops (popc). Excellent, excellent.
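
That identity is easy to machine-check; for instance, a tiny exhaustive sketch over 8-bit values (hypothetical test code; unsigned arithmetic so that overflow wraps):

    #include <cassert>
    #include <cstdint>

    int main() {
      // x ^ y == x + y - 2*(x & y): addition is XOR plus carries, and the
      // carries contribute exactly 2*(x & y).
      for (uint32_t x = 0; x < 256; x++) {
        for (uint32_t y = 0; y < 256; y++) {
          assert(((x ^ y) & 0xff) == ((x + y - 2 * (x & y)) & 0xff));
        }
      }
      return 0;
    }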

I have a request, though, and it is the same request as with your work on division strength reduction. The inference rules need to be factored out and separately g-tested.

I see equations like this in the middle of the C2 code in this PR:

U known = (min ^~ max) & (i1->_zeros | i1->_ones) & (i2->_zeros | i2->_ones);

It is hard to demonstrate that this is correct. It cannot be unit-tested separately when it appears like this in the midst of IR transforms. We have had many subtle bugs before from “math inclusions” like this in the middle of the IR optimizer.

I believe that formula is PROBABLY correct, because I deeply respect your mathematical ability, but that is not good enough. If someone accidentally changes it (or needs to change it for some maintenance reason), we might not have a math savant handy to re-verify it. We need an executable test to give us confidence.

In addition, I personally dislike doing pencil-and-paper proofs of code snippets that I have to extract manually from wherever they do their work. I would much prefer to see the relevant algorithms factored out into their own source files (if they are subtle, as these are). I like to reason from clean formulas, not from formulas that I had to tear away from their duty stations in the optimizer.

I think we need a separate file (header and source) for this set of algorithms, something like rangeInference.[ch]pp. The API points would take one or two inputs, each represented as a full bundle of range data (hi/lo/uh/ul/zs/os). They would also take an operator name (not one API point per operator, please, but maybe one for unaries and one for binaries), and they would pass 64-bit ints plus length indicators, rather than doing it more than once for different primitive sizes. For naming the operators I suggest either opto node names (Mul not MulL), or (a non-opto-centric choice) the operator names from the Panama Vector API (JEP 460 at last sighting).

The API point would symbolically execute the named op over the given symbolic (ranged) arguments, returning a new set of ranges (by C++ references I assume, or maybe a “little struct” for range tuples).
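
Something along these lines, perhaps (a sketch of the proposed shape only; all names are hypothetical):

    // rangeInference.hpp (hypothetical): symbolic execution of one operator
    // over ranged arguments, factored out of the IR transforms.
    #include <cstdint>

    struct RangeData {
      int64_t  lo, hi;       // signed bounds
      uint64_t ulo, uhi;     // unsigned bounds
      uint64_t zeros, ones;  // known bits
    };

    enum class BinOp { Add, Sub, Mul, And, Or, Xor, LShift, RShift, URShift };

    // Infer the result range of `op` applied to values drawn from `a` and
    // `b`, where both operands are `bits` wide (e.g. 32 or 64).
    RangeData infer_binary(BinOp op, const RangeData& a, const RangeData& b,
                           int bits);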

The switch-over-op at the heart of it would be carefully written for clarity, to prove (once and for all) that we know what we are talking about. The gtest would work like my BitTypes demo program, exercising all of the operations through a systematic set of concrete input values and (then) ranges containing those values. We would make sure that (a) the concrete result never “escapes” the predicted range (for any of a series of representative containing ranges), and also, when possible, (b) that the predicted range is “tight” in some good way: probably that the concrete result meets each of its inclusive bounds (for each range data point) for at least one set of concrete inputs.
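
And a sketch of check (a) as a plain exhaustive loop (a real version would be a gtest; RangeData, BinOp and infer_binary carry over from the hypothetical sketch above, with the domain narrowed to 8 bits so enumeration stays cheap):

    #include <cassert>
    #include <cstdint>

    static bool contains8(const RangeData& r, uint8_t v) {
      return int8_t(v) >= int8_t(r.lo) && int8_t(v) <= int8_t(r.hi) &&
             v >= uint8_t(r.ulo) && v <= uint8_t(r.uhi) &&
             (v & uint8_t(r.zeros)) == 0 &&
             (uint8_t(~v) & uint8_t(r.ones)) == 0;
    }

    static void check_add(const RangeData& ra, const RangeData& rb) {
      RangeData rc = infer_binary(BinOp::Add, ra, rb, /*bits=*/8);
      for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
          if (!contains8(ra, uint8_t(a)) || !contains8(rb, uint8_t(b))) continue;
          // (a) the concrete result must never escape the predicted range
          assert(contains8(rc, uint8_t(a + b)));
        }
      }
    }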

I think this kind of testing work is necessary to ensure that our system can be understood and maintained by more than a few people on the planet. And (although I came up with some of the formulas) I’d personally prefer it that way as well. It would be nice to know that if someone bumps an operator or drops a parenthesis, a unit test will catch the bug, rather than either a PhD student re-analyzing the code, or (more likely) a bug that shows up long after system regression tests.

The benefits of these optimizations are, basically, that we can push around many more inferences about bitwise operations and unsigned comparisons, beyond what the original C2 design envisioned for inferences about (signed) arithmetic operations.

This in turn will give us CC_NE (a nice thing, finally). Moreover, it will help us infer the low-order bits of loop indexes, something useful for proving alignment (in Panama at least). If applied to the lanes of vector types, it will give us a much more robust set of information about what’s going on in vectorized code.

So I’m all for it! But, with unit tests, please…

merykitty marked this pull request as draft January 20, 2024
openjdk bot added the merge-conflict label (Pull request has merge conflict with target branch) and removed the rfr label (Pull request is ready for review) Jan 20, 2024
@merykitty
Member Author

@rose00 I really appreciate your careful analysis of this patch and your detailed explanations and brilliant suggestions. This patch now serves more like a prototype, since I have realized that it can be divided into separate patches, each of which changes a set of nodes. The first one, which I have submitted separately from this patch, implements the capabilities in TypeInt and TypeLong themselves. Each of the following patches will change the transformations of a set of nodes to take advantage of this. And for non-trivial transformations I will surely add unit tests to thoroughly cover all the common and corner cases.

Thanks a lot for your tremendous support on both this and the other patches.

@openjdk

openjdk bot commented Mar 13, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@bridgekeeper

bridgekeeper bot commented May 8, 2024

@merykitty This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented Jul 4, 2024

@merykitty This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper bot closed this Jul 4, 2024