@SirYwell
Member

@SirYwell SirYwell commented Sep 21, 2025

Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a narrower type (TOP) for the same input types. If the inputs are widened so that the first case no longer matches but the later one still does, the new result is narrower than the previous one, which is not monotonic.
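
A minimal sketch of the ordering problem described above, on a toy three-element lattice. This is not HotSpot code: the enum `T`, the class name, and both `value`-style methods are hypothetical and only mirror the case order in question, assuming that "monotonic" means that widening an input never narrows the result.

```java
import java.util.function.BinaryOperator;

class ModValueOrderSketch {
    // Toy lattice, ordered from narrow to wide: TOP < ZERO < INT.
    // Assumption: this stands in for C2's much richer type lattice.
    enum T { TOP, ZERO, INT }

    // Case order with the issue: "0 mod X is 0" is checked before "X mod 0".
    static T zeroDividendFirst(T t1, T t2) {
        if (t1 == T.TOP || t2 == T.TOP) return T.TOP;
        if (t1 == T.ZERO) return T.ZERO; // wider result, checked first
        if (t2 == T.ZERO) return T.TOP;  // narrower result, checked later
        return T.INT;
    }

    // Reordered: the case with the narrower result (TOP) is checked first.
    static T zeroDivisorFirst(T t1, T t2) {
        if (t1 == T.TOP || t2 == T.TOP) return T.TOP;
        if (t2 == T.ZERO) return T.TOP;
        if (t1 == T.ZERO) return T.ZERO;
        return T.INT;
    }

    // Exhaustively checks that widening the inputs never narrows the result.
    static boolean isMonotonic(BinaryOperator<T> value) {
        for (T a1 : T.values()) for (T a2 : T.values())
            for (T b1 : T.values()) for (T b2 : T.values()) {
                boolean widened = a1.compareTo(b1) <= 0 && a2.compareTo(b2) <= 0;
                if (widened && value.apply(a1, a2).compareTo(value.apply(b1, b2)) > 0) {
                    return false;
                }
            }
        return true;
    }

    public static void main(String[] args) {
        // Dividend widens from ZERO to INT while the divisor stays ZERO.
        System.out.println(zeroDividendFirst(T.ZERO, T.ZERO)); // ZERO
        System.out.println(zeroDividendFirst(T.INT, T.ZERO));  // TOP (narrower than before)
        System.out.println(isMonotonic(ModValueOrderSketch::zeroDividendFirst)); // false
        System.out.println(isMonotonic(ModValueOrderSketch::zeroDivisorFirst));  // true
    }
}
```

In this toy model, the only difference between the two methods is the order of the two zero checks, which is enough to flip the monotonicity result.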

Please review :)


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8367967: C2: "fatal error: Not monotonic" with Mod nodes (Bug - P3)

Reviewers

Contributors

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27408/head:pull/27408
$ git checkout pull/27408

Update a local copy of the PR:
$ git checkout pull/27408
$ git pull https://git.openjdk.org/jdk.git pull/27408/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27408

View PR using the GUI difftool:
$ git pr show -t 27408

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27408.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Sep 21, 2025

👋 Welcome back hgreule! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Sep 21, 2025

@SirYwell This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8367967: C2: "fatal error: Not monotonic" with Mod nodes

Co-authored-by: Christian Hagedorn <[email protected]>
Reviewed-by: bmaillard, vlivanov, chagedorn, shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 130 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title from 8367967 to 8367967: C2: "fatal error: Not monotonic" with Mod nodes Sep 21, 2025
@SirYwell
Member Author

/contributor add @chhagedorn

Thanks for the test case!

@openjdk

openjdk bot commented Sep 21, 2025

@SirYwell The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk

openjdk bot commented Sep 21, 2025

@SirYwell
Contributor Christian Hagedorn <[email protected]> successfully added.

@SirYwell SirYwell marked this pull request as ready for review September 21, 2025 06:14
@openjdk openjdk bot added the rfr (Pull request is ready for review) label Sep 21, 2025
@mlbridge

mlbridge bot commented Sep 21, 2025

Webrevs

@MBaesken
Member

I'll test this in our CI to see if this fixes the Linux aarch64 issues (observed when running the test java/foreign/TestUpcallStress.java).

@MBaesken
Member

By the way, why do we always get zero-size replay files when running into the issue?

# Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_aarch64-dbg/jdk/src/hotspot/share/opto/phaseX.cpp:2763), pid=1089937, tid=1089972
# fatal error: Not monotonic

Is it another bug in the replay file generation or a known limitation?

Member

@chhagedorn chhagedorn left a comment

The fix looks good to me, thanks for the fix and the credit for the test!

I'll give it a spin in our testing as well.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Sep 22, 2025
@chhagedorn
Member

By the way, why do we always get zero-size replay files when running into the issue?

# Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_aarch64-dbg/jdk/src/hotspot/share/opto/phaseX.cpp:2763), pid=1089937, tid=1089972
# fatal error: Not monotonic

Is it another bug in the replay file generation or a known limitation?

We've encountered empty replay files before, which could be traced back to a timeout in error reporting due to threads being stuck. We filed JDK-8297588 for it, but it's not fixed yet. I did a closer investigation back then (see summary). You might be hitting the same issue.

Member

@shipilev shipilev left a comment

This makes sense. I was about to ask what would be the result of 0 mod 0 then, but I see it is also covered: we return TOP on any X mod 0 early on.

Contributor

@benoitmaillard benoitmaillard left a comment

Looks good to me, I only have one minor comment.

* @test
* @bug 8367967
* @summary Ensure ModI/LNode::Value is monotonic with potential divison by 0
* @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:CompileOnly=compiler.c2.TestModValueMonotonic::test*
Contributor

You could probably add another @run main ... without flags to potentially catch other things in the future

@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) label Sep 22, 2025
@SirYwell
Member Author

I added the suggestion from @benoitmaillard and fixed a typo in the test summary. Please let me know when the test results are in :)

@@ -0,0 +1,66 @@
/*
Member

Another thing: You could move this test to compiler/ccp which fits better than the generic c2 folder.

Member Author

Done!

if (t1 == Type::TOP) { return Type::TOP; }
if (t2 == Type::TOP) { return Type::TOP; }

// Mod by zero? Throw exception at runtime!
Contributor

@iwanowww iwanowww Sep 23, 2025

The comment is a bit confusing. It's not the node itself which produces the exception, but a dominating zero check (inserted during parsing). So, if a divisor becomes 0, it means the node is effectively dead and can go away.

Also, the node should go away anyway as part of CFG pruning of dead branches when the corresponding guard goes away.

BTW if there are cases when control is not eliminated, it may irrevocably break the IR causing crashes down the road (take a look at JDK-8154831 as an example). So, maybe it's safer to just rely on dead control pruning to eliminate effectively dead ModI/ModL nodes and assert that there are no effectively dead ModI/ModL nodes present after GVN pass is over.

Member Author

The comment comes from the original code before my change in #25254, where that path also returned POS but that wasn't monotonic with my changes anymore.

So, if a divisor becomes 0, it means the node is effectively dead and can go away.

I think this check mostly comes down to CCP. We need to return something for a zero divisor, and that something has to be monotonic with subsequent wider inputs.

If you agree with that observation, I can change the comment to better reflect what's going on, e.g., "Mod by zero can be observed in PhaseCCP, return TOP to ensure monotonic results" (I'm open to other suggestions).

Contributor

Thanks for the clarifications. I thought about it for some time, but as things work now, I don't see a better alternative except just ignoring the 0 divisor case. So, please proceed with the fix as it is now.

Alternatively, to improve robustness, a dead ModI/ModL can kill dependent control akin to what Roland did for Type nodes with JDK-8349479.

Member

I don't see a better alternative except just ignoring the 0 divisor case

That probably also works. It seems that for DivI/L, we already ignore this case as well.
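
To make the two options concrete, here is a minimal sketch on a toy three-element lattice. The names are hypothetical and this is not HotSpot code. It assumes that "ignoring" the zero-divisor case means dropping that check and falling through to the remaining cases. In this toy model both policies turn out to be monotonic; they differ only in what they report once a zero divisor is observed.

```java
import java.util.function.BinaryOperator;

class ZeroDivisorPolicySketch {
    // Toy lattice, ordered from narrow to wide: TOP < ZERO < INT.
    enum T { TOP, ZERO, INT }

    // Policy 1: a zero divisor yields TOP, checked before any other case.
    static T topOnZeroDivisor(T t1, T t2) {
        if (t1 == T.TOP || t2 == T.TOP) return T.TOP;
        if (t2 == T.ZERO) return T.TOP;
        if (t1 == T.ZERO) return T.ZERO; // 0 mod X is 0
        return T.INT;
    }

    // Policy 2 (assumed reading of "ignoring"): no special case for a zero divisor.
    static T ignoreZeroDivisor(T t1, T t2) {
        if (t1 == T.TOP || t2 == T.TOP) return T.TOP;
        if (t1 == T.ZERO) return T.ZERO; // 0 mod X is 0
        return T.INT;
    }

    // Exhaustively checks that widening the inputs never narrows the result.
    static boolean isMonotonic(BinaryOperator<T> value) {
        for (T a1 : T.values()) for (T a2 : T.values())
            for (T b1 : T.values()) for (T b2 : T.values()) {
                boolean widened = a1.compareTo(b1) <= 0 && a2.compareTo(b2) <= 0;
                if (widened && value.apply(a1, a2).compareTo(value.apply(b1, b2)) > 0) {
                    return false;
                }
            }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMonotonic(ZeroDivisorPolicySketch::topOnZeroDivisor));  // true
        System.out.println(isMonotonic(ZeroDivisorPolicySketch::ignoreZeroDivisor)); // true
        // The policies only disagree when the divisor is known to be zero.
        System.out.println(topOnZeroDivisor(T.INT, T.ZERO));  // TOP
        System.out.println(ignoreZeroDivisor(T.INT, T.ZERO)); // INT
    }
}
```

The sketch says nothing about graph corruption or runtime behavior; it only shows that monotonicity alone does not decide between the two policies.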

The question is what is better when the zero check is not folded but we observe zero for the divisor: having top, which may corrupt the graph, or risking a miscompilation or a div-by-zero crash at runtime when the zero check is really off (though not folding the zero check does not necessarily mean it's wrong at runtime). The former is probably easy to catch when it happens, while the latter seems more robust, but when the zero check is off, it's probably harder to detect and trace back.

Alternatively, to improve robustness, a dead ModI/ModL can kill dependent control akin to what Roland did for Type nodes with JDK-8349479.

Could be an option. We then should probably also extend it to Div nodes. Might be worth investigating separately.

@MBaesken
Member

I'll test this in our CI to see if this fixes the Linux aarch64 issues (observed when running the test java/foreign/TestUpcallStress.java).

Unfortunately, we still see an assert in the test java/foreign/TestUpcallStress on Linux aarch64.
But this time it is not the 'old' one but

# assert(oopDesc::is_oop(obj)) failed: not an oop: 0x0000000000000001

Maybe it is unrelated; not sure.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Sep 25, 2025
@SirYwell
Member Author

Unfortunately, we still see an assert in the test java/foreign/TestUpcallStress on Linux aarch64. But this time it is not the 'old' one but

# assert(oopDesc::is_oop(obj)) failed: not an oop: 0x0000000000000001

Maybe it is unrelated; not sure.

@MBaesken this looks rather unrelated, but hard to tell without more output.

@chhagedorn did your tests come back green?

@chhagedorn
Member

chhagedorn commented Sep 26, 2025

Testing looks good!

I also left a comment about ignoring the zero divisor case. It's an interesting thought to just ignore/remove it. Anyway, the current patch just fixes the current situation and does not make it worse. So, I agree with it but if you want to switch to the ignoring case, I'm also fine. In the latter case, I won't be able to review it anymore since I will be on vacation next week (assuming we also wait for Vladimir's additional input about it). But you would have my implicit approval :-)

@MBaesken
Member

@MBaesken this looks rather unrelated, but hard to tell without more output.

Should I open a new JBS issue for it? It is probably something else, and we cannot address it in this PR.

@SirYwell
Member Author

@MBaesken this looks rather unrelated, but hard to tell without more output.

Should I open a new JBS issue for it? It is probably something else, and we cannot address it in this PR.

Yes, please. I'll integrate this change later today if there is no objection.

@MBaesken
Member

Yes, please. I'll integrate this change later today if there is no objection.

There is already https://bugs.openjdk.org/browse/JDK-8360595; I added the info about our assert there.
(So far this existing JBS issue is about ShenandoahGC, but we also see it with G1GC.)

@SirYwell
Member Author

Thanks everyone for the reviews :)

/integrate

@openjdk

openjdk bot commented Sep 29, 2025

Going to push as commit 59e76af.
Since your change was applied there have been 163 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Sep 29, 2025
@openjdk openjdk bot closed this Sep 29, 2025
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Sep 29, 2025
@openjdk

openjdk bot commented Sep 29, 2025

@SirYwell Pushed as commit 59e76af.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@SirYwell SirYwell deleted the fix/mod-not-monotonic branch September 29, 2025 18:58