-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8352185: Shenandoah: Invalid logic for remembered set verification #24092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…rly rebuilt through marking.
|
👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into |
|
@pengxiaolong This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 141 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kdnilsen, @earthling-amzn, @ysramakrishna) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
|
@pengxiaolong The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
|
/issue JDK-8352185 |
|
@pengxiaolong This issue is referenced in the PR title - it will now be updated. |
| assert(!heap->has_forwarded_objects(), "No forwarded objects on this path"); | ||
|
|
||
| if (heap->mode()->is_generational()) { | ||
| if (_generation->is_young()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do not want to change this code. We only swap remembered set for young-gen because gen will scan the remset and reconstruct it with more updated information. For a global GC, we do not scan the remset and do not reconstruct it. If we swap here, we will lose the information that is currently within the remset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for for explanation, I have been reading and trying the understand how the remembered set works in GenShen. I wasn't sure whether this is actually right.
In generational mode, if the GC cycle is global, the read table is already cleaned during reset phase, so remembered set verification from verify_before_concmark and verify_before_update_refs shouldn't work properly, I think the remembered set verification before mark and update references should be disabled, what do you think? Meanwhile, there is no need to clean read table during global cycle in generational mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. So we will always swap card tables, but we'll do it after verify-before-mark. To clarify the intention, after we swap card table, the write-table is all clean, and the read table holds whatever had been gathered prior to the start of GC. Young and bootstrap collection will update the write card table as a side effect of remembered set scanning. Global collection will update the card table as a side effect of global marking of old objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave a comment to this effect (along the lines of Kelvin's last comment) here. Did we measure the impact of this change on performance? In particular it would seem that the number of dirty old cards might now reduce after a global gc compared to before this change.
Ideally, this would be a change that would go in on its own. (There is no impact on correctness, since in the absence of this change, the dirty card set is an over-approximation.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit hard to measure the impact on performance I think, but given the rem-set is more accurate, there shouldn't be any performance regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting looking to see if normal perf measures showed any improvements. E.g. if you ran say SPECjbb and compared the remset scan times for the minor GC's that followed global collections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run h2 benchmark, here is the remembered set scan times after a global GC, it does seem to improve remembered set scan time in this case:
PR version:
[2025-03-21T07:35:41.801+0000][10.292s][19715][info ][gc ] GC(6) Concurrent remembered set scanning 13.069ms
[2025-03-21T07:35:48.088+0000][16.579s][19715][info ][gc ] GC(9) Concurrent remembered set scanning 5.537ms
[2025-03-21T07:35:56.610+0000][25.101s][19715][info ][gc ] GC(14) Concurrent remembered set scanning 6.186ms
[2025-03-21T07:36:03.967+0000][32.459s][19715][info ][gc ] GC(18) Concurrent remembered set scanning 9.562ms
[2025-03-21T07:36:11.234+0000][39.725s][19715][info ][gc ] GC(22) Concurrent remembered set scanning 2.591ms
[2025-03-21T07:36:17.303+0000][45.794s][19715][info ][gc ] GC(25) Concurrent remembered set scanning 0.999ms
[2025-03-21T07:36:25.647+0000][54.139s][19715][info ][gc ] GC(30) Concurrent remembered set scanning 1.665ms
[2025-03-21T07:36:32.790+0000][61.281s][19715][info ][gc ] GC(33) Concurrent remembered set scanning 2.851ms
[2025-03-21T07:36:40.241+0000][68.732s][19715][info ][gc ] GC(36) Concurrent remembered set scanning 0.716ms
[2025-03-21T07:36:47.440+0000][75.931s][19715][info ][gc ] GC(39) Concurrent remembered set scanning 1.932ms
master:
[2025-03-21T07:34:04.978+0000][10.765s][17923][info ][gc ] GC(6) Concurrent remembered set scanning 22.813ms
[2025-03-21T07:34:11.250+0000][17.038s][17923][info ][gc ] GC(9) Concurrent remembered set scanning 14.457ms
[2025-03-21T07:34:18.692+0000][24.480s][17923][info ][gc ] GC(14) Concurrent remembered set scanning 4.972ms
[2025-03-21T07:34:26.033+0000][31.820s][17923][info ][gc ] GC(18) Concurrent remembered set scanning 9.134ms
[2025-03-21T07:34:34.416+0000][40.203s][17923][info ][gc ] GC(22) Concurrent remembered set scanning 3.655ms
[2025-03-21T07:34:42.180+0000][47.967s][17923][info ][gc ] GC(26) Concurrent remembered set scanning 3.253ms
[2025-03-21T07:34:49.371+0000][55.168s][17923][info ][gc ] GC(29) Concurrent remembered set scanning 1.615ms
[2025-03-21T07:34:56.592+0000][62.396s][17923][info ][gc ] GC(32) Concurrent remembered set scanning 1.570ms
[2025-03-21T07:35:03.766+0000][69.575s][17923][info ][gc ] GC(35) Concurrent remembered set scanning 1.040ms
[2025-03-21T07:35:10.941+0000][76.753s][17923][info ][gc ] GC(38) Concurrent remembered set scanning 1.947ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be as intended earlier, leave a documentation comment between lines 697 & 698 along the lines of Kelvin's comment:
// After we swap card table below, the write-table is all clean, and the read table holds
// cards dirty prior to the start of GC. Young and bootstrap collection will update
// the write card table as a side effect of remembered set scanning. Global collection will
// update the card table as a side effect of global marking of old objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I added comments as you suggested but forgot to push.Now the comment has been added.
| shenandoah_assert_generations_reconciled(); | ||
| if (_heap->old_generation()->is_mark_complete() || _heap->gc_generation()->is_global()) { | ||
| return _heap->complete_marking_context(); | ||
| if (_heap->old_generation()->is_mark_complete()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that this is an global GC, we know that old-generation()->is_mark_complete() by virtue of the current program counter, I assume. (We would only ask for the old marking context if global marking were already finished.) In the case that we are doing a global GC cycle, I'm guessing that we do not set is-mark-complete for the old generation. So that's why I believe you need to keep the condition as originally written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is global GC in generational mode, old-generation()->is_mark_complete() is always false after reset and before mark because bitmaps of the entire heap including old gen has been reset during concurrent reset phase, so old mark is not finished in when verify_before_concmark is called. The marking context return by this method is only used for remembered set verification, but as I pointed out in the first comments, we shouldn't do remembered set verification in such case because the rem-set read table is already cleaned/stale.
…urrent global GC in generational mode
Webrevs
|
kdnilsen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we confirm that this addresses JBS issue with further testing before integration?
| assert(!heap->has_forwarded_objects(), "No forwarded objects on this path"); | ||
|
|
||
| if (heap->mode()->is_generational()) { | ||
| if (_generation->is_young()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. So we will always swap card tables, but we'll do it after verify-before-mark. To clarify the intention, after we swap card table, the write-table is all clean, and the read table holds whatever had been gathered prior to the start of GC. Young and bootstrap collection will update the write card table as a side effect of remembered set scanning. Global collection will update the card table as a side effect of global marking of old objects.
|
Can you sync w/master so GHA (& problem lists) is more uptodate. Thanks! |
Done, now waiting for GHS to rerun, thanks. |
|
@pengxiaolong this pull request can not be integrated into git checkout JDK-8345399-v3
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
earthling-amzn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of nits.
kdnilsen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the refinements. LGTM.
earthling-amzn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think this will fix https://bugs.openjdk.org/browse/JDK-8345399, should we add it as an issue to this PR?
It will likely fix the JDK-8345399, I mentioned it in JBS. will see if I can get a ppc64le hardware to verify this week. |
ysramakrishna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the change can be pushed as is, but I am not convinced that the verification can't be tightened when old marking information is missing as long as we have a valid TAMS and there are no unparsable objects (which should only happen when coalease-&-fill has been interrupted, leaving dead objects with x-gen pointers that would cause false positives or upon class unloading when dead objects may end up being unparsable). The current condition of skipping verification when old bit maps are cleared seems to miss verification opportunities that would be valid after a completed C&F.
Left some related comments, but I won't hold back this PR further. The tightening can be done subsequently (and I am happy to pick that up afterwards as needed).
Thanks for your patience with my tardy and long-winded reviews! :-)
| assert(!heap->has_forwarded_objects(), "No forwarded objects on this path"); | ||
|
|
||
| if (heap->mode()->is_generational()) { | ||
| if (_generation->is_young()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be as intended earlier, leave a documentation comment between lines 697 & 698 along the lines of Kelvin's comment:
// After we swap card table below, the write-table is all clean, and the read table holds
// cards dirty prior to the start of GC. Young and bootstrap collection will update
// the write card table as a side effect of remembered set scanning. Global collection will
// update the card table as a side effect of global marking of old objects.
| verify_at_safepoint( | ||
| VerifyRememberedSet verify_remembered_set = _verify_remembered_before_marking; | ||
| if (_heap->mode()->is_generational() && | ||
| !_heap->old_generation()->is_mark_complete()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not the following stronger condition to skip verification? My sense is that the only case we cannot verify is if we do not have marking info and old gen has been left "unparsable" (because of an incomplete/interrupted C&F which may have us look at dead objects -- that are either unparsable because of class unloading, or are parsable but hold cross-gen pointers). In all other cases, we can do a safe and complete verification.
is_generational() && !old_gen->is_mark_complete() && !old_gen->is_parsable()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not need to worry about it, old_gen becomes not parsable in class unloading phase of a global concurrent GC, marking is already done for the global including old gen, there should be always complete marking for old when old gen is not parsable.
| void ShenandoahVerifier::verify_before_update_refs() { | ||
| VerifyRememberedSet verify_remembered_set = _verify_remembered_before_updating_references; | ||
| if (_heap->mode()->is_generational() && | ||
| !_heap->old_generation()->is_mark_complete()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re stronger condition as previous one above.
ysramakrishna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
|
I have reproduced the bug https://bugs.openjdk.org/browse/JDK-8345399 on ppc64le hardware with tip, crash happens in a young cycle after a full GC, which is one of the problems I'm trying to fix in this PR: I'll run the same test to confirm whether this PR fix the bug. |
|
/issue add JDK-8345399 |
|
@pengxiaolong |
|
/integrate |
|
@pengxiaolong |
|
/sponsor |
|
Going to push as commit 4d1de46.
Your commit was automatically rebased without conflicts. |
|
@ysramakrishna @pengxiaolong Pushed as commit 4d1de46. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
There are some scenarios in which GenShen may have improper remembered set verification logic:
In the end of ShenandoahFullGC, it resets bitmaps for the entire heap w/o resetting marking context to be incomplete, but ShenandoahVerifier has code like below to get a complete old marking context for remembered set verification
For the concurrent young GC cycles after a full GC, the old marking context used for remembered set verification is stale, and may cause unexpected result.
For the impl of
ShenandoahVerifier::get_marking_context_for_oldmentioned above, it always return a marking context for global GC, but marking bitmaps is already reset before before init-mark,ShenandoahVerifier::help_verify_region_rem_setalways skip verification in this case.ShenandoahConcurrentGC always clean remembered set read table, but only swap read/write table when gc generation is young, this issue causes remembered set verification before init-mark to use a completely clean remembered set, but it is covered by issue 2.
After concurrent young cycle evacuates objects from a young region, it update refs using marking bitmaps from marking context, therefore it won't update references of dead old objects(is_marked(obj) is false: obj is not marking strong/weak and it is below tams). In this case, if the next cycle if global concurrent GC, remembered set can't be verified before init-mark because of the dead pointers.
Solution
Test
make test TEST=hotspot_gc_shenandoahProgress
Issues
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24092/head:pull/24092$ git checkout pull/24092Update a local copy of the PR:
$ git checkout pull/24092$ git pull https://git.openjdk.org/jdk.git pull/24092/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 24092View PR using the GUI difftool:
$ git pr show -t 24092Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24092.diff
Using Webrev
Link to Webrev Comment