8367967: C2: "fatal error: Not monotonic" with Mod nodes #27408
👋 Welcome back hgreule! A progress list of the required criteria for merging this PR into |
|
@SirYwell This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 130 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
/contributor add @chhagedorn Thanks for the test case! |
|
@SirYwell |
Webrevs
|
|
I'll test this in our CI to see if this fixes the Linux aarch64 issues (observed when running the test java/foreign/TestUpcallStress.java).
|
By the way, why do we always get zero-size replay files when running into this issue? Is it another bug in the replay file generation, or a known limitation?
The fix looks good to me, thanks for the fix and the credit for the test!
I'll give it a spin in our testing as well.
We've encountered empty replay files before, which could be traced back to a timeout in error reporting due to threads being stuck. We filed JDK-8297588 for it, but it's not fixed yet. I did a closer investigation back then (see summary). You might be hitting the same issue.
shipilev
left a comment
This makes sense. I was about to ask what would be the result of 0 mod 0 then, but I see it is also covered: we return TOP on any X mod 0 early on.
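For context, at the Java level an integer remainder with a zero divisor never yields a value: it throws ArithmeticException, and that includes 0 % 0. This fits the idea that X mod 0 has no meaningful result type in the compiler. The snippet below is only a minimal sketch of that language-level behaviour; the class and method names are illustrative and not part of this PR.

```java
// Illustrative only: shows Java's runtime behaviour for int remainder by zero.
final class ModByZeroDemo {
    /** Returns true if evaluating a % b throws ArithmeticException. */
    static boolean throwsOnMod(int a, int b) {
        try {
            int unused = a % b; // throws when b == 0, whatever a is
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnMod(0, 0)); // true
        System.out.println(throwsOnMod(5, 0)); // true
        System.out.println(throwsOnMod(5, 3)); // false
    }
}
```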
benoitmaillard
left a comment
Looks good to me, I only have one minor comment.
 * @test
 * @bug 8367967
 * @summary Ensure ModI/LNode::Value is monotonic with potential divison by 0
 * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:CompileOnly=compiler.c2.TestModValueMonotonic::test*
You could probably add another @run main ... without flags to potentially catch other things in the future
|
I added the suggestion from @benoitmaillard and fixed a typo in the test summary. Please let me know when the test results are in :) |
@@ -0,0 +1,66 @@
/*
Another thing: You could move this test to compiler/ccp which fits better than the generic c2 folder.
Done!
if (t1 == Type::TOP) { return Type::TOP; }
if (t2 == Type::TOP) { return Type::TOP; }

// Mod by zero? Throw exception at runtime!
The comment is a bit confusing. It's not the node itself which produces the exception, but a dominating zero check (inserted during parsing). So, if a divisor becomes 0, it means the node is effectively dead and can go away.
Also, the node should go away anyway as part of CFG pruning of dead branches when the corresponding guard goes away.
BTW, if there are cases when control is not eliminated, it may irrevocably break the IR, causing crashes down the road (take a look at JDK-8154831 as an example). So, maybe it's safer to just rely on dead control pruning to eliminate effectively dead ModI/ModL nodes and assert that there are no effectively dead ModI/ModL nodes present after the GVN pass is over.
The comment comes from the original code before my change in #25254, where that path also returned POS but that wasn't monotonic with my changes anymore.
> So, if a divisor becomes 0, it means the node is effectively dead and can go away.
I think this check mostly comes down to CCP. We need to return something for a zero divisor, and that something has to be monotonic with subsequent wider inputs.
If you agree with that observation, I can change the comment to better reflect what's going on, e.g., "Mod by zero can be observed in PhaseCCP, return TOP to ensure monotonic results" (I'm open to other suggestions).
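To make the "monotonic with subsequent wider inputs" requirement concrete, here is a toy sketch, not HotSpot code. It assumes a simplified lattice where TOP sits below every int range, and it uses only the Java remainder rule (the result takes the sign of the dividend and is smaller in magnitude than the divisor) as a stand-in for the general case. All names are hypothetical. It compares two candidate answers for a zero divisor, POS and TOP, against the result after the divisor widens from [0,0] to [0,3] with a strictly negative dividend.

```java
// Toy model, not HotSpot code. A type is TOP (modelled as null) or {lo, hi}.
final class ZeroDivisorResultDemo {
    static final int[] TOP = null;
    static final int[] POS = {0, Integer.MAX_VALUE};

    /** True if a is at least as wide as b; TOP is narrower than everything. */
    static boolean widerOrEqual(int[] a, int[] b) {
        if (b == TOP) return true;
        if (a == TOP) return false;
        return a[0] <= b[0] && a[1] >= b[1];
    }

    /**
     * Toy general case: a strictly negative dividend and a divisor in
     * [0, maxDivisor] (zero excluded, since it would throw) gives a
     * remainder in [-(maxDivisor - 1), 0].
     */
    static int[] modRangeForNegativeDividend(int maxDivisor) {
        return new int[] {-(maxDivisor - 1), 0};
    }

    public static void main(String[] args) {
        int[] afterWidening = modRangeForNegativeDividend(3); // [-2, 0]
        // Had the zero-divisor case returned POS, the next result would not contain it:
        System.out.println(widerOrEqual(afterWidening, POS)); // false
        // Starting from TOP, any later result is wider:
        System.out.println(widerOrEqual(afterWidening, TOP)); // true
    }
}
```

In this model TOP is the only starting point that can never be contradicted by a later, wider result, which is why it is a safe answer for the zero divisor.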
Thanks for the clarifications. I thought about it for some time, but as things work now, I don't see a better alternative except just ignoring the 0 divisor case. So, please proceed with the fix as it is now.
Alternatively, to improve robustness, a dead ModI/ModL can kill dependent control akin to what Roland did for Type nodes with JDK-8349479.
> I don't see a better alternative except just ignoring 0 divisor case
That probably also works. It seems that for DivI/L, we already ignore this case as well.
The question is: what is better when the zero check is not folded but we observe zero for the divisor? Having top, which could corrupt the graph, or risking a miscompilation or div-by-zero crash at runtime when the zero check is really off (although not folding the zero check does not necessarily mean it's wrong at runtime)? The former is probably easy to catch when it happens, while the latter seems more robust; but when the zero check is off, it's probably harder to detect or trace back.
> Alternatively, to improve robustness, a dead ModI/ModL can kill dependent control akin to what Roland did for Type nodes with JDK-8349479.
Could be an option. We should then probably also extend it to Div nodes. Might be worth investigating separately.
Unfortunately, we still see an assert in the test java/foreign/TestUpcallStress on Linux aarch64.
Maybe it is unrelated; not sure.
@MBaesken this looks rather unrelated, but it's hard to tell without more output. @chhagedorn did your tests come back green?
|
Testing looks good! I also left a comment about ignoring the zero divisor case. It's an interesting thought to just ignore/remove it. Anyway, the current patch just fixes the current situation and does not make it worse. So, I agree with it but if you want to switch to the ignoring case, I'm also fine. In the latter case, I won't be able to review it anymore since I will be on vacation next week (assuming we also wait for Vladimir's additional input about it). But you would have my implicit approval :-) |
Should I open a new JBS issue for it? It is probably something else, and we cannot address it in this PR.
Yes, please. I'll integrate this change later today if there is no objection. |
There is already https://bugs.openjdk.org/browse/JDK-8360595; I added the info about our assert there.
|
Thanks everyone for the reviews :) /integrate |
|
Going to push as commit 59e76af.
Your commit was automatically rebased without conflicts. |
Generally, we shouldn't return a wider type (ZERO) if there is a later case that would return a more narrow type (TOP) for the same input types. If the inputs are widened and the first case doesn't match anymore but the later one still does, the result is not monotonic with the previous result.
Please review :)
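The ordering problem described above can be sketched with a toy model. This is not the HotSpot implementation: the lattice is reduced to TOP plus closed int ranges, the fallback simply returns the full int range, and the "before" ordering is only one possible arrangement consistent with the description (a ZERO-returning case tested ahead of the zero-divisor case). All class and method names are hypothetical.

```java
// Toy model, not HotSpot code: TOP (no value) sits below every int range.
final class ModValueOrderDemo {
    /** A type is either TOP or a closed int range [lo, hi]. */
    static final class Ty {
        final boolean top;
        final int lo, hi;
        private Ty(boolean top, int lo, int hi) { this.top = top; this.lo = lo; this.hi = hi; }
        static Ty range(int lo, int hi) { return new Ty(false, lo, hi); }
        static final Ty TOP = new Ty(true, 0, 0);
        static final Ty ZERO = range(0, 0);
        static final Ty INT = range(Integer.MIN_VALUE, Integer.MAX_VALUE);
        boolean isZero() { return !top && lo == 0 && hi == 0; }
        /** True if this type is at least as wide as other (TOP is narrowest). */
        boolean widerOrEqual(Ty other) {
            if (other.top) return true;
            if (top) return false;
            return lo <= other.lo && hi >= other.hi;
        }
    }

    /** Assumed problematic order: a ZERO-returning case comes before "X mod 0". */
    static Ty valueBefore(Ty t1, Ty t2) {
        if (t1.top || t2.top) return Ty.TOP;
        if (t1.isZero()) return Ty.ZERO; // 0 mod X is 0
        if (t2.isZero()) return Ty.TOP;  // X mod 0
        return Ty.INT;                   // toy fallback
    }

    /** Reordered: the narrower result (TOP) is decided first. */
    static Ty valueAfter(Ty t1, Ty t2) {
        if (t1.top || t2.top) return Ty.TOP;
        if (t2.isZero()) return Ty.TOP;  // X mod 0
        if (t1.isZero()) return Ty.ZERO; // 0 mod X is 0
        return Ty.INT;                   // toy fallback
    }

    public static void main(String[] args) {
        Ty zero = Ty.ZERO;
        Ty widened = Ty.range(0, 1); // dividend widens, divisor stays [0, 0]
        // The result for wider inputs must be at least as wide as the earlier result.
        System.out.println(valueBefore(widened, zero).widerOrEqual(valueBefore(zero, zero))); // false
        System.out.println(valueAfter(widened, zero).widerOrEqual(valueAfter(zero, zero)));   // true
    }
}
```

With the first ordering the result goes from ZERO to TOP as the dividend widens, which is the kind of step a monotonicity check rejects; with the second ordering both steps yield TOP.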
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27408/head:pull/27408
$ git checkout pull/27408
Update a local copy of the PR:
$ git checkout pull/27408
$ git pull https://git.openjdk.org/jdk.git pull/27408/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27408
View PR using the GUI difftool:
$ git pr show -t 27408
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27408.diff
Using Webrev
Link to Webrev Comment