8212107: VMThread issues and cleanup #228
Conversation
👋 Welcome back rehn! A progress list of the required criteria for merging this PR into …
@robehn The following label will be automatically applied to this pull request: … When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing list. If you would like to change these labels, use the …
Webrevs
shipilev
left a comment
I find the juggling of _next_vm_operation a bit confusing at first glance, but it seems superficially okay.
dcubed-ojdk
left a comment
I probably should have waited to review this after all of Aleksey's
comments were resolved. I'm gonna have to take a look at
src/hotspot/share/runtime/vmThread.cpp again via a webrev;
it's just too hard to review via this snippet UI.
I'll re-review after all of Aleksey's changes are done.
Mailing list message from David Holmes on hotspot-dev: Hi Robbin, On 18/09/2020 6:34 am, Robbin Ehn wrote:
Can you explain why it was necessary to remove the queue and exactly … Thanks, David
VM operations are now rare, and when we do execute them they are also faster than when the queue was introduced. So we replace the queue with a single "next safepoint operation".
coleenp
left a comment
Looks like a nice cleanup. I had a couple of questions.
@robehn This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for more details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 28 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the …
dholmes-ora
left a comment
This generally looks good. Mapping between the old way and the new way is a little tricky but I think I made all the connections.
One thing I did notice is that it seems that nested VM operations are now restricted to a nesting depth of one - is that correct? (And the code could be a lot simpler if nesting was not needed. :) ).
A couple of minor comments/suggestions below.
Thanks.
David
Hi David. The support should be the same as before: the previous operation is stored on the stack while we switch to the nested operation. Thanks, Robbin
Sorry, my mistake. Thanks.
shipilev
left a comment
I have only minor comments, without diving into the logic machinery. I am relying on others to review this more substantially.
shipilev
left a comment
Looks good!
Mailing list message from David Holmes on hotspot-dev: On 23/09/2020 9:27 pm, Robbin Ehn wrote:
I envisaged simply moving the nesting check out of inner_execute and … // pseudo-code … Cheers, David
I'm looking at vmThread.cpp via the webrev and the "next" button: https://openjdk.github.io/cr/?repo=jdk&pr=228&range=05#frames-6
The "Scroll Down" button is working so I'll push thru it...
    (interval_ms >= GuaranteedSafepointInterval);
    if (max_time_exceeded && SafepointSynchronize::is_cleanup_needed()) {
      return &cleanup_op;
    if (!max_time_exceeded) {
You've changed the meaning of SafepointALot here. If max_time_exceeded
is false, then you never check the SafepointALot flag and miss causing a
safepointALot_op to happen next.
Here's the old code:
    if (max_time_exceeded && SafepointSynchronize::is_cleanup_needed()) {
      return &cleanup_op;
    }
    if (SafepointALot) {
      return &safepointALot_op;
    }
In the old code, if max_time_exceeded and we needed a cleanup,
then cleanup_op took priority; if that wasn't the case, we
always checked the SafepointALot flag.
The old behavior could create a SafepointALot when we had no 'safepoint priority' ops in the queue on wakeup.
To get that behavior we would need more logic to avoid back-to-back SafepointALot, and we would need to peek at _next_vm_operation to determine whether it's a safepoint op or a handshake.
During a normal test run the old behavior only creates around 1% more safepoints.
And if you want more safepoints you can decrease GuaranteedSafepointInterval (not exactly the same).
So I didn't think adding that complexity to exactly mimic the old behavior was worth it.
What do you want me to do?
Hmmm.... The old SafepointALot was intended to safepoint as frequently
as possible to stress the system. Now we do very little at safepoints, so
maybe it is time for SafepointALot to evolve. Can you make it so that a
SafepointALot happens at some fraction of GuaranteedSafepointInterval, e.g.,
(GuaranteedSafepointInterval / 4), so four times as often?
All tests using SafepointALot already set GuaranteedSafepointInterval to a low value in the range of ~1-300 ms
(except for the VM boolean flag test, which uses SafepointALot to test a boolean flag).
For example, jni/FastGetField sets GuaranteedSafepointInterval to 1.
The only case where it would really differ is when ad hoc adding SafepointALot without GuaranteedSafepointInterval.
If GuaranteedSafepointInterval is set to a lower value than the default on the command line, then I'm okay if SafepointALot does not do anything extra. However, if GuaranteedSafepointInterval is either the default value or is set to a higher value, then I would like SafepointALot to cause a safepoint more frequently than the GuaranteedSafepointInterval. Every GuaranteedSafepointInterval/4 would be a fine definition of "a lot".
Mulling on this more... is it too radical to consider that we no longer need SafepointALot?
I would like SafepointALot (and HandshakeALot) to be executed in a separate thread that randomly requests a safepoint (preferably with some validation inside the operation).
Since the VM thread now handles this, it cannot make this request while busy.
Also, having the VM thread wake up 'more' sporadically would be confusing for the VM thread loop.
So I agree with you that we need a better SafepointALot, but I think it is wrong to use the VM thread to drive it.
I suggest we create an enhancement for it.
Most of my comments this round are not critical. The only real issue …
Mailing list message from David Holmes on hotspot-dev: <trimming> Hi Dan, On 25/09/2020 6:39 am, Daniel D. Daugherty wrote:
This is the whole premise of making this change: we no longer need a … Cheers, David
@dholmes-ora and @robehn - I'm good with the rationale about …
    extern Mutex*   RetData_lock;             // a lock on installation of RetData inside method data
    extern Monitor* VMOperationQueue_lock;    // a lock on queue of vm_operations waiting to execute
    extern Monitor* VMOperation_lock;         // a lock on queue of vm_operations waiting to execute
    extern Monitor* VMOperationRequest_lock;  // a lock on Threads waiting for a vm_operation to terminate
Can the declaration of VMOperationRequest_lock be removed now too? Since it's no longer being defined in mutexLocker.cpp
Fixing, pushing later.
dholmes-ora
left a comment
Still LGTM.
dcubed-ojdk
left a comment
I'm okay with leaving SafepointALot as you have it now
and leaving any future cleanup/refinement to a new RFE.
Thanks all! /integrate
@robehn Since your change was applied there have been 37 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 431338b. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
We simplify the vmThread by removing the queue and refactoring the main loop.
This solves the issues listed:
Passes t1-8, and a benchmark run.
If you want a smaller diff, the commits contain the incremental progress, and each commit passed t1.
Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/228/head:pull/228
$ git checkout pull/228