Skip to content

Conversation

@ichttt
Copy link
Contributor

@ichttt ichttt commented Jul 6, 2025

This PR improves the value of interger division nodes.
Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case
We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once.
This also cleans up and unifies the code paths for DivINode and DivLNode.
I've added some tests to validate the optimization. Without the changes, some of these tests fail.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8364766: C2: Improve Value() of DivI and DivL for non-constant inputs (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26143/head:pull/26143
$ git checkout pull/26143

Update a local copy of the PR:
$ git checkout pull/26143
$ git pull https://git.openjdk.org/jdk.git pull/26143/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26143

View PR using the GUI difftool:
$ git pr show -t 26143

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26143.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 6, 2025

👋 Welcome back ichttt! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jul 6, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Jul 6, 2025

@ichttt The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 3, 2025

@ichttt This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@ichttt ichttt changed the title Improve Value() of DivI and DivL for non-constant inputs 8364766: Improve Value() of DivI and DivL for non-constant inputs Aug 6, 2025
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 6, 2025
@mlbridge
Copy link

mlbridge bot commented Aug 6, 2025

Copy link
Contributor

@mhaessig mhaessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing this enhancement, @ichttt! Your code is very well commented and thus easy to follow, and you tested your changes thoroughly. Nice work!

I mainly nitpicked on a few details below. But I have one question: You went through great pains to only calculate the necessary "corners". Would it not be much easier to calculate all four possible corners and let the min and max functions deal with the duplicates in case the i1 or i2 range is a singleton? The result should be the same (if I did not forget about a corner case) and it would be easier to follow.

What kind of testing did you run on your side? I kicked off a tier1 through 5 and will keep you posted on the results.

You still have a title mismatch between the issue and the PR (the PR is missing "C2:").

@openjdk openjdk bot added rfr Pull request is ready for review and removed rfr Pull request is ready for review labels Aug 7, 2025
@mhaessig
Copy link
Contributor

mhaessig commented Aug 8, 2025

Testing is all green.

@ichttt ichttt changed the title 8364766: Improve Value() of DivI and DivL for non-constant inputs 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs Aug 10, 2025
@ichttt
Copy link
Contributor Author

ichttt commented Aug 10, 2025

Thanks for the fast review! The main reason for all the if cases is that min_int / (-1) is undefined behavior in C++, as it overflows. All code has to be careful that this special case can't happen in C++ code, and that's the main motivation behind all the ifs. I've added a comment that describes that.
Otherwise, you would be right: Redudant calculations are no problem, min and max would take care of that.

Regarding testing: I only ran tier1 tests on my machine and GHA

@mhaessig
Copy link
Contributor

mhaessig commented Aug 11, 2025

The main reason for all the if cases is that min_int / (-1) is undefined behavior in C++, as it overflows. All code has to be careful that this special case can't happen in C++ code, and that's the main motivation behind all the ifs. I've added a comment that describes that.
Otherwise, you would be right: Redudant calculations are no problem, min and max would take care of that.

Then I would suggest restructuring the code to express that intent. Reading it currently, gives me the impression that you care about handling the four corners carefully when you really only care about the UB case. The following pseudocode uses ifs to only avoid the corner case. It's only slightly different from your version, but I find it a bit clearer to guess its intent. What do you think?

if i1.lo == MIN_INT && (i2.lo == -1 || i2.hi  == -1) {
    new_lo = MIN_INT
    if i1.hi == MIN_INT { // is_con()
        new_hi = MAX_INT // (MIN_INT + 1) / -1
        return (new_lo, new_hi) // This is already the entire domain, so we can return early
    }
    if i2.lo != i2.hi {
        corner i1.lo, (i2.lo == -1 ? i2.hi : i2.lo) // corner is just shorthand for setting new_lo and new_hi
    }
} else {
    // i1.lo > MIN_INT
    corner i1.lo, i2.lo
    corner i1.lo, i2.hi
}
// i1.hi > MIN_INT because of early return
corner i1.hi, i2.lo
corner i1.hi, i2.hi

Copy link
Contributor

@benoitmaillard benoitmaillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this change, I think this an important optimization opportunity that was previously missing. The code is very clear. I only have one nit.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice idea, thanks for the work!

I started reading through a part of it and have a few questions / suggestions.


A more general suggestion (could also be a future RFE):
You could also use the unsigned bounds here. And you could also address UDIvI/L.

Now that we have unsigned bounds we could use them: #17508

You can read up in type.hpp, that an integer type has one or two simple intervals.

   708  * 4. Either _lo == jint(_ulo) and _hi == jint(_uhi), or each element of a
   709  *   TypeInt lies in either interval [_lo, jint(_uhi)] or [jint(_ulo), _hi]
   710  *   (note that these intervals are disjoint in this case).

I think with this, you could do an even more powerful optimization.


Also: where ever I see these optimizations that work on ranges, and not just constants: we should do some more rigorous testing on those resulting ranges.

I see that you already have some concrete examples. It would be good to extend those with a completely randomized version. See for inspiration:

https://github.com/openjdk/jdk/pull/25254/files#diff-0e3d89ac8cf0548b69d9bdb0859380bc31de0a772fa7ff211f446a4a5abd4197R220-R248

Comment on lines 507 to 508
template<typename IntegerType>
static const IntegerType* compute_generic_div_type(const IntegerType* i1, const IntegerType* i2, int widen) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the generic in the name? The template already suggests that it can be used for different types, right?

Also: I'm wondering if we can somehow extend this for UDivI and UDIvL.
I suppose you would have to use the _ulo and _uhi instead of _lo and _hi.

I'm not saying this all has to be done in this PR, but we could at least anticipate the extension to unsigned division.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted it to compute_signed_div_type.
I would not worry about unsigned division now and would leave this for a later RFE. Would you perhaps create a issue for that? I have no permission for the JBS

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 22, 2025

@ichttt This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@mhaessig
Copy link
Contributor

mhaessig commented Oct 1, 2025

@ichttt, are you still working on this? 🙂

@ichttt
Copy link
Contributor Author

ichttt commented Oct 19, 2025

Thanks for all your reviews!
I've adjusted the tests and comments as per your suggestions. Please review again.
And sorry for the long delay @mhaessig !

Copy link
Member

@SirYwell SirYwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a few comments

Comment on lines 647 to 651
// Either input is BOTTOM ==> the result is the local BOTTOM
const Type *bot = bottom_type();
if( (t1 == bot) || (t2 == bot) ||
(t1 == Type::BOTTOM) || (t2 == Type::BOTTOM) )
return bot;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be removed - and in cases where one side is the local bottom (i.e., TypeInt::INT) and the other is more restricted, the result should even more precise after removing. Could you also add tests for such cases? For example dividing TypeInt::INT by some interval with a lower bound of 2, the resulting range can be narrowed. Similarly, dividing some small interval [lo, hi] by TypeInt::INT should result in a similar interval with bounds adjusted to deal with sign changes. If I didn't miss something, your code should already be able to deal with this, it's just this early return here preventing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are correct. The only part where I am not sure is if every instance where i1/i2 can be Type::BOTTOM, i1/i2 can be cast to TypeInt.
Can someone please confirm the removal is safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type::BOTTOM should not happen, since we are in a method DivI/L and thus know that the inputs should be some particular type. I think removing this check is fine, since you are doing t1->is_int() below, which would assert for t1 == Type::BOTTOM.

Copy link
Contributor

@mhaessig mhaessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing all the comments and no worries about the delay.

This looks good to me now. I just kicked off testing and will report back with the results.

Copy link
Contributor

@mhaessig mhaessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, testing revealed a problem in one of our internal tests, that I am unable to share. The Test is run with -Xomp -XX:-TieredCompilation and the error is

STDOUT:
 485  ConvI2L  === _ 486  [[ 483 ]]  #long:minint..maxint, 0u..maxulong, widen: 3 !orig=153 
 152  ConL  === 0  [[ 169 164 483 568 ]]  #long:minlong
 489  IfTrue  === 488  [[ 483 490 ]] #1 !orig=161 
 483  DivL  === 489 152 485  [[ 482 504 ]]  !orig=169
told = long:maxint..2305843009213693952, widen: 3
tnew = long:minlong..9007199254740992, 0u..maxulong, widen: 3
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/src/jdk-pr-26143/open/src/hotspot/share/opto/phaseX.cpp:2763), pid=108119, tid=108147
#  fatal error: Not monotonic
#
# JRE version: Java(TM) SE Runtime Environment (26.0) (fastdebug build 26-internal-mhassig.open)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 26-internal.open, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x185c479]  PhaseCCP::verify_type(Node*, Type const*, Type const*)+0x169

Stack: [0x00007f7e669eb000,0x00007f7e66aeb000],  sp=0x00007f7e66ae5f10,  free space=1003k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x185c479]  PhaseCCP::verify_type(Node*, Type const*, Type const*)+0x169  (phaseX.cpp:2763)
V  [libjvm.so+0x1864f0d]  PhaseCCP::analyze()+0x3bd  (phaseX.cpp:2806)
V  [libjvm.so+0xb86554]  Compile::Optimize()+0x964  (compile.cpp:2489)
V  [libjvm.so+0xb89a63]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x2023  (compile.cpp:860)
V  [libjvm.so+0x9a3a33]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x4a3  (c2compiler.cpp:147)
V  [libjvm.so+0xb98f70]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x780  (compileBroker.cpp:2345)
V  [libjvm.so+0xb9a7d0]  CompileBroker::compiler_thread_loop()+0x530  (compileBroker.cpp:1989)
V  [libjvm.so+0x10f2c4b]  JavaThread::thread_main_inner()+0x13b  (javaThread.cpp:771)
V  [libjvm.so+0x1b63ce6]  Thread::call_run()+0xb6  (thread.cpp:243)
V  [libjvm.so+0x17d1188]  thread_native_entry(Thread*)+0x128  (os_linux.cpp:883)

@ichttt
Copy link
Contributor Author

ichttt commented Oct 25, 2025

Yeah this sucks, as it stands I do not have enough info to debug this...
I noticed that the new types looks wrong if I read the input nodes correctly. It should be min_long / -2, but it is min_long / -1024. I can however not reproduce this locally:
grafik
I have the same input nodes in this example, but the correct type for the div node here.
I've starred at the code now for a bit and I am still not sure what could have caused this. And I am afraid I can't debug without the test case...

@ichttt
Copy link
Contributor Author

ichttt commented Oct 30, 2025

@mhaessig I've added some new asserts to try and detect where it went wrong and merged the latest upstream. Can you run the failing test again please?

@mhaessig
Copy link
Contributor

Thank you @ichttt, I will kick off a run. I have been trying to extract a reproducer for you, but was not able to reproduce the failure again. Perhaps the asserts will help.

Copy link
Contributor

@mhaessig mhaessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing tier1 up to tier6 passed. If you move the test out of the irTest directory, this will be good to go.

Copy link
Contributor

@mhaessig mhaessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all your work, @ichttt. Looks good to me now.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work @ichttt ! I like all your comments, and thanks for all the test cases, including the randomized ones!

I just have a few minor suggestions :)

Comment on lines 62 to 66
@IR(failOn = {IRNode.DIV_I, IRNode.DIV_L, IRNode.URSHIFT_I, IRNode.URSHIFT_L, IRNode.RSHIFT_I, IRNode.RSHIFT_L, IRNode.MUL_I, IRNode.MUL_L, IRNode.ADD_I, IRNode.ADD_L, IRNode.SUB_I, IRNode.SUB_L, IRNode.AND_I, IRNode.AND_L})
public int testIntConstantFoldingSpecialCase() {
// All constants available during parsing
return getIntConstant(Integer.MIN_VALUE) / getIntConstant(-1);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add an IR rule that the div is still present after parsing? It seems you have already had the possible issue that javac optimized the div away, right? So this would ensure the optimization really does happen in C2, and that you are checking for the right kinds of nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider doing the same in other places in this file ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is already optimized during local GVN during parsing, so after parsing, these nodes are already gone

Comment on lines 282 to 286
//@IR(failOn = {IRNode.DIV_I, IRNode.DIV_L, IRNode.URSHIFT_I, IRNode.URSHIFT_L, IRNode.RSHIFT_I, IRNode.RSHIFT_L, IRNode.MUL_I, IRNode.MUL_L, IRNode.ADD_I, IRNode.ADD_L, IRNode.SUB_I, IRNode.SUB_L, IRNode.AND_I, IRNode.AND_L})
// This results in a series of nodes due to DivLNode::Ideal and in particular transform_long_divide, which operates on non-constant divisors.
// transform_long_divide splits up the division into multiple other nodes, such as MulHiLNode, which does not have a good Value() implemantion.
// When JDK-8366815 is fixed, these rules should be reenabled
// Alternatively, a better MulHiLNode::Value() implemantion should also lead to constant folding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you have some temporary IR rule that now passes, but fails once JDK-8366815 is fixed? Otherwise, I'm afraid we will miss these comments here, and they will never be cleaned up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same elsewhere in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is not really possible for this method. Depending on the random constants that have been chosen, a different sequence of nodes will be emitted by transform_long_divide, and sometimes a MulHiLNode will not be emitted and the expression may constant fold after all.
So I fear we just have to remember this, but as #27886 looks pretty close to being done as well, I guess this will not be too hard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it some more, we can add IR validation to testLongRange, though. This one is not dependent on random constants, so it should always work. That way, we should notice the other rule as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler [email protected] rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

8 participants