Skip to content

Conversation

@galderz
Copy link
Contributor

@galderz galderz commented Jun 5, 2023

Implements JFR jdk.OldObjectSample event for #5145, without path to GC root functionality. I have a prototype for the path to GC root functionality, but I’ve decided to split this work into a separate PR to make reviewing more manageable.

This PR includes:

  • Event emitted for sampled objects and arrays that are possible leaks.
  • They are sampled when they’re allocated via the slow path. The slow path happens when an object does not fit in the TLAB and a new one is required. Arrays bigger than the large array threshold are also allocated via the slow path regardless.
  • Following the same logic found in HotSpot, the sampler is implemented as a custom priority queue ordered by span, which is based on allocation size. Objects with the smallest span are at the top of the queue.
  • The sample queue is initialized with a fixed-size array at runtime. The default size is 256 elements, but this can be configured at runtime using -XX:FlightRecorderOptions=old-object-queue-size=<size> option.
  • If the queue is full, and the sampled object is bigger than the top, the top is removed and the sampled object is inserted.
  • If the queue is full and the sampled object is smaller than the top, the code tries to scavenge the queue which means purging objects that have been garbage collected.
  • To scavenge, the underlying array backing the queue is iterated over, removing those sampled objects that have been garbage collected.
  • To check if an object has been garbage collected or not, sampled objects are wrapped around WeakReferences.
  • Operations on the sampler queue are protected by a new non-reentrant SpinLock. This new lock implementation works similarly to a VMMutex but extends it to offer a tryLock() function. This function is useful for sampling because if another object is already being sampled and the lock cannot be acquired, the sampling is simply skipped. When emitting old object sample events, the sampler waits to acquire the lock.
  • Old object sample events emitted with custom descriptions for Thread, ThreadGroup and Class. Ellipsis descriptions are also supported.

Caveats:

  • Calling WeakReference.get() from uninterruptible code fails. Adding @AnnotateOriginal @Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true) to Target_java_lang_ref_Reference.get() does not seem to have any impact. Looking at AnnotationSubstitutionProcessor, classes that have @Substitute don’t seem to get processed for methods with @AnnotateOriginal. Worked around by adding calleeMustBe = false to the Uninterruptible method within which WeakReference.get() is called. This caveat has been addressed, see latest commits.
  • Old object queue size can only be configured at runtime via the startup recorder options flag. I couldn’t find a way to unit test this since all unit tests run as a single binary run, so the queue size is fixed for all the tests.
  • Old object sample event descriptions for classes that have a size field not implemented since they’re not feasible in a closed world environment. In HotSpot this is achieved by using reflection to lookup a size field but that cannot be done in SubstrateVM.
  • Old object sample event description tests only added for thread group. Threads not tested because creating many instances of those triggers other intermediary objects to be sampled so couldn’t easily get to the Thread instances. Classes not tested since it requires dynamically creating brand new Class instances at runtime, something not feasible in closed-world SubstrateVM.

Once this PR is integrated I'll work on adding path to GC root functionality.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jun 5, 2023
@galderz
Copy link
Contributor Author

galderz commented Jun 5, 2023

Can @roberttoyonaga @christianhaeubl and @adinn review this?

@galderz galderz changed the title Add JFR old object event Add JFR old object sample event Jun 5, 2023
@galderz
Copy link
Contributor Author

galderz commented Jun 5, 2023

Fixing some eclipse format issues...

@galderz
Copy link
Contributor Author

galderz commented Jun 6, 2023

Added a few commits to address warning errors. Also made the full sample queue test more reliable.

@galderz
Copy link
Contributor Author

galderz commented Jun 6, 2023

I think I've fixed all style and format errors. However, gate testing is not working. It fails with:

Tag "test" not part of any task.
Run the following command to see all available tasks and their tags:
  mx -v gate --dry-run
Traceback (most recent call last):
  File "/home/runner/work/graal/graal/mx/mx_gate.py", line 475, in gate
    mx.abort(f'Tag "{tag}" not part of any task.\n'
  File "/home/runner/work/graal/graal/mx/mx.py", line 4273, in abort
    raise SystemExit(error_code)
SystemExit: 1

@christianhaeubl Any ideas? @zakkak said it could be related to #6738.

@zakkak
Copy link
Collaborator

zakkak commented Jun 6, 2023

@zakkak said it could be related to #6738.

FTR I have created #6746 to resolve the issue.

Copy link
Collaborator

@roberttoyonaga roberttoyonaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Galder! It would be really awesome to add this support for JFR leak profiling. I've left a few comments/questions here.

public void lock() {
@SuppressWarnings("deprecation")
final long threadId = Thread.currentThread().getId();
while (!lock.compareAndSet(NOT_HELD, threadId)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need a PauseNode.pause(); here. Would it be helpful to use JavaSpinLockUtils instead of this custom spinlock(It might require a bit of re-working though)? It already has tryLock methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I had completely missed that class. I've just tried it and does the job.

int numDead = 0;
for (int i = 0; i < samples.getCapacity(); i++) {
final OldObject sample = samples.getSample(i);
if (sample.reference != null && sample.reference.get() == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this method the current thread holds the spinlock (which can lock without transition). sample.reference.get() steps into interruptible code and may result in the current thread waiting for a safepoint while holding the spinlock. If another thread calls JfrOldObjectSampler.emit(long , boolean) at this time, it will deadlock because it is blocked waiting for the spinlock, but cannot allow safepointing since it's executing uninterruptible code.

Your comment in the PR description says it's not possible to make WeakReference.get() uninterruptible using @AnnotateOriginal. @christianhaeubl do you know if there's a way to work around this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found a solution. I've hacked AnnotationSubstitutionProcessor.handleAnnotatedMethodInSubstitutionClass to allow for @KeepOriginal and AnnotateOriginal to be used together, and then made Reference.get like this:

    @KeepOriginal
    @AnnotateOriginal
    @Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
    native T get();

Then it all works :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think that should solve the problem. Were only superficial changes to AnnotationSubstitutionProcessor.handleAnnotatedMethodInSubstitutionClass needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, small changes indeed :)

I've not pushed all the commits yet because I was trying to get clarification about #6743 (comment), but that might take some days, so I'll probably push updates shortly.

* changing epoch.
*/
this.flushCheckpointRepos = new JfrRepository[]{stackTraceRepo, methodRepo, typeRepo, symbolRepo};
this.flushCheckpointRepos = new JfrRepository[]{stackTraceRepo, methodRepo, typeRepo, symbolRepo, oldObjectRepo};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the oldObjectRepo should come earlier in the write order. It will reference type and symbol data. Otherwise there could be races (ie. if entries are added to oldObjectRepo after symbolRepo is written to disk but before oldObjectRepo is written to disk, then there could be dangling references in the chunk)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I've reordered repos.

public void triggerUpdateOfLastKnownHeapUsage() {
// Trigger GC before tests to get a reading of last known heap usage for the first executed
// test.
System.gc();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to invoke gc twice here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, once is enough :)

return startRecording(events, getDefaultConfiguration(), settings);
}

static void blackhole(Object obj) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The leak field is assigned to a static variable but it's not used, so at the beginning when I started working on this I was worried the compiler might optimize this away and remove all the code. The blackhole method is just a way to make sure the field is used in a trivial way, but one that cannot be optimized away because it can (extremely unlikely) end up causing a side effect (printing an empty space character).

I've just removed it and calls to it and it all works fine, so I'll go ahead and remove it.

Object[] right = new Object[3];
node[2] = right;
node = right;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this adding objects to this array try to make the 100 element array allocation get sampled along the allocation slow path? If so, is there a chance that it doesn't actually get sampled?

What if you directly called JfrOldObjectSampleEvents.sample(result, size.rawValue(), Integer.MIN_VALUE); from this testing code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this adding objects to this array try to make the 100 element array allocation get sampled along the allocation slow path? If so, is there a chance that it doesn't actually get sampled?

All I'm doing is generating enough arrays so that some of them get sampled as old objects and I can verify that sampled arrays make it through as events and they have the right data populated. Having the right data here means that the sampled event is for Object[] has length of 100. If no arrays were sampled the test would fail because it guarantees that the list of events generated for the Object[] is not empty.

What if you directly called JfrOldObjectSampleEvents.sample(result, size.rawValue(), Integer.MIN_VALUE); from this testing code?

Hmmm, not sure what that would do that this test is not already covering?

Copy link
Collaborator

@roberttoyonaga roberttoyonaga Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see. I was just concerned that there isn't a guarantee the length 100 array gets sampled (although it's probably likely because we try 1M times). Although unlikely, what if the the length 3 array accidentally gets sampled because we happen to run out of space in the current TLAB, would the test fail because it's only expecting the length 100 array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that will happen because the allocated size of an array of 100 elements is bigger than that of 3 elements, so you should see the priority queue logic kicking in, by pushing out arrays of 3 elements out and sampling those that have 100 elements.

final long heapUsedAtLastGC = sample.heapUsedAtLastGC;
final int arrayLength = sample.arrayLength;
OldObjectSampleEvent.emit(timestamp, objectId, allocationTime, threadId, stackTraceId, heapUsedAtLastGC, arrayLength);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flushing thread can write serialized data to the current chunk on disk while this method is executing. If the event data gets flushed without the constant pool data in the OldObjectRepository, then there will be missing reference data. I think this scenario is possible because flushes are ignored by the OldObjectRepository. I think fixing this might be as simple as allowing flushes in the OldObjectRepository similar to the other repos. Most of the time flushes will be ignored anyway because OldObjectRepository will only have serialized data once a recording is actually stopped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I've changed it to allow flushes.

final OldObject prev = samples.getSample(prevIndex);
if (prev.reference != null) {
queue.remove(prev);
prev.span += sample.span;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we adding span here to the previous? Is this an optimization?

* obtained. This lock is non-reentrant so if the lock is already held by the current thread,
* the method throws a {@link RuntimeException}.
*/
@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe this shouldn't be allowed to be in-lined since we are effectively locking without transition here (if it gets in-lined into interruptible code it also becomes interruptible). You might also need callermustbe = true ie @Uninterruptible(reason = "The whole critical section must be uninterruptible.", callerMustBe = true)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class will be removed and instead use JavaSpinLockUtils.

@galderz
Copy link
Contributor Author

galderz commented Jun 26, 2023

Pushed a few updates that should address all of @roberttoyonaga's feedback:

  • Use JavaSpinLockUtils instead of rolling your own.
  • Make calls to WeakReference.get() uninterruptible. To achieve this, since Reference class is substituted,
    annotation processor had to be changed to enable @KeepOriginal and @AnnotateOriginal to be used together
    in classes that are marked with @Substitute.
  • Just called System.gc() call once to record a first heap usage.
  • Remove blackhole methods which are not necessary.
  • Put old object repo before type and symbol repo.
  • Support flush of old object repository.
  • Link samples with a addition-time-based singly linked list, in order to keep same logic as in HotSpot when scavenging. When removing entries through scavenging, their priority is passed on to an sample allocated just after it. This is done to keep samples distributed evenly.
  • Migrate sampler to profiler and separate sampler and emitter. By doing that, the profiler can be more easily unit tested, while still encapsulating the sampler, emitter, queue and list implementations.
  • Added a unit test for the old object profiler.
  • Just like in HotSpot, add total allocated tracking logic.
  • Made clear which of the tests exercise JFR recording vs the other unit tests.

@galderz
Copy link
Contributor Author

galderz commented Jun 27, 2023

Formatting, style...etc should be done now, but CI shows all the recording tests I've added failing when tested with labsjdk-ce-21-jvmci-23.1-b09. The problem arises because object field of the recorded event seems to be null. E.g.

jdk.OldObjectSample {
  startTime = 13:40:06.910 (2023-06-27)
  allocationTime = 13:40:06.910 (2023-06-27)
  objectSize = 33.8 kB
  objectAge = 19.2 ms
  lastKnownHeapUsage = 216 bytes
  object = N/A
...

I'm looking into this.

@galderz
Copy link
Contributor Author

galderz commented Jun 27, 2023

Hmmmm, seems like in JDK 21 (or some other version post 17) the event has an extra field objectSize:

            E: field
             A: name allocationTime
             A: class 219
                E: annotation
                 A: class 18707
                E: annotation
                 A: class 18699
                 A: value TICKS
                E: annotation
                 A: class 7632
                 A: value Allocation Time
            E: field
             A: name objectSize
             A: class 219
                E: annotation
                 A: class 18707
                E: annotation
                 A: class 18720
                 A: value BYTES
                E: annotation
                 A: class 7632
                 A: value Object Size
            E: field
             A: name objectAge
             A: class 219
                E: annotation
                 A: class 18707
                E: annotation
                 A: class 18693
                 A: value TICKS
                E: annotation
                 A: class 7632
                 A: value Object Age

Compared with JDK17:

            E: field
             A: name allocationTime
             A: class 205
                E: annotation
                 A: class 18875
                E: annotation
                 A: class 18866
                 A: value TICKS
                E: annotation
                 A: class 6859
                 A: value Allocation Time
            E: field
             A: name objectAge
             A: class 205
                E: annotation
                 A: class 18875
                E: annotation
                 A: class 18860
                 A: value TICKS
                E: annotation
                 A: class 6859
                 A: value Object Age

Observed parsing the JFR file with @jiekang's excellent jfr-reader tool.

I'm unsure how we should handle this. I'll see if I can handle it using the JDK earlier/later boolean suppliers.

@galderz
Copy link
Contributor Author

galderz commented Jun 27, 2023

FYI objectSize field was added in JDK18.

@galderz
Copy link
Contributor Author

galderz commented Jun 28, 2023

Fixed tests and pushed commits to support JDK18+. I've ended up removing object description tests because in later JDKs generating ThreadGroup instances unfortunately creates weak reference arrays that hide ThreadGroup instance samples (e.g. see here). This changes have been introduced via the virtual thread support. I had similar issues with other types created while creating Thread instances. As well as being pretty heavyweight objects, creating these classes generate a lot of other garbage which dominates samples. I will defer testing of object descriptions, via unit tests, to the path to GC root work, because that involves considerable changes on the emit side, so I can use that opportunity to revisit the code more thoroughly and find an easy way to validate these descriptions.

Copy link
Collaborator

@roberttoyonaga roberttoyonaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @galderz! I think you've addressed all my earlier comments. I left a few more pretty minor notes.

profiler.sample(new WeakReference<>(String.valueOf(i)), i * 100, -1);
}

profiler.emit(0, Long.MAX_VALUE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure OldObjectSampler.scavenge() gets called here. I think you need to sample once more after the queue is full in order to induce a scavenge operation. I think this test still passes because OldObjectEventEmitter.emitUnchained(long) checks if the each sample is alive before emitting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, I think I know what you mean, but let's refresh how scavenging has been ported to see if this makes sense:

Technically the way scavenge works in HotSpot is like this:

void ObjectSampler::add(HeapWord* obj, size_t allocated, traceid thread_id, bool virtual_thread, const JfrBlobHandle& bh, JavaThread* thread) {
  ...

  if (Atomic::load(&_dead_samples)) {
    // There's a small race where a GC scan might reset this to true, potentially
    // causing a back-to-back scavenge.
    Atomic::store(&_dead_samples, false);
    scavenge();
  }

It only gets called when sampled, but it calls it whenever a sample has been GC'd. There's no such notification mechanism in SVM, so instead I went for the option of scavenging when the queue is full.

Now, the issue I think you're implying above is that if the queue is not full, no scavenging happens and maybe it's good idea to scavenge after emitting to see if anything is dead and can be removed. This sounds like a good idea to me. It seems better than an alternative option of scavenging in each object sample add call. Have I got this right?

Copy link
Collaborator

@roberttoyonaga roberttoyonaga Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and maybe it's good idea to scavenge after emitting to see if anything is dead and can be removed.

You may not even need to run a full scavenge() operation. When emitting in OldObjectEventEmitter.emitUnchained(long) you already check if each sample is alive. If it is dead, you can directly remove it with OldObjectSampler.remove(OldObject). Admittedly, this is only really useful if you have multiple simultaneous recordings running since we only emit the OldObjectSample event once per recording anyway (so usually there will be no point to doing the extra cleanup work during event emission).

Have I got this right?

hmm my original comment was actually concerning the test itself. I think that your current implementation of scavenging/emitting works perfectly fine. I think testScavengeMiddle, testScavengeYoungest, testScavengeOldest and testScavengeAll don't actually exercise the code in OldObjectSampler.scavenge() and instead rely on OldObjectEventEmitter.emitUnchained(long) to remove the dead samples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may not even need to run a full scavenge() operation. When emitting in OldObjectEventEmitter.emitUnchained(long) you already check if each sample is alive. If it is dead, you can directly remove it with OldObjectSampler.remove(OldObject). Admittedly, this is only really useful if you have multiple simultaneous recordings running since we only emit the OldObjectSample event once per recording anyway (so usually there will be no point to doing the extra cleanup work during event emission).

Ok, I think I'll leave the logic as is for now.

hmm my original comment was actually concerning the test itself. I think that your current implementation of scavenging/emitting works perfectly fine. I think testScavengeMiddle, testScavengeYoungest, testScavengeOldest and testScavengeAll don't actually exercise the code in OldObjectSampler.scavenge() and instead rely on OldObjectEventEmitter.emitUnchained(long) to remove the dead samples.

That's right. I'll rework this to make sure the scavenging gets tested.

// the weight of a sample that is removed should be redistributed.
// Here it gets redistributed to the sample that came just after in time.
queue.remove(prev);
prev.span += sample.span;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as how it's done in Hotspot. However, do you know why the span is added to the younger neighbor rather than the older neighbor? To me, it makes more sense to add additional weight to the older neighbor since the goal is to keep track of long-lived objects. I'm not saying the behavior should be changed, I'm just bringing it up for potential discussion because I'm curious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another interesting observation which also needs some further explanation :)

Because the span is added it to the younger neighbour, that is why I implemented it as a single-linked list. In HotSpot, this is implemented as double-linked list so it could efficiently distribute the span to the older sample. I went for the simpler, more space efficient, option, taking into account that the default sample size is relatively small, so when removing things I can just loop around and fix things up.

Now, about how to distribute the span. Here's what Erik said recently when I asked him about the linked list of samples:

The weight of the object, the allocation span, in the time-based list, should be divided to its neighbour(s).

Then @tstuefe pointed out:

span is redistributed to its (in time) predecessor sample. Following your explanation, should the span not be evenly divided between predecessor and successor?

The first part is not entirely correct since prev is really the sample that was sampled after the current object, so indeed the younger one. But the point is that Erik suggest it should be divided. So, 50% to younger and 50% to older. Unfortunately Erik didn't reply to this, so I think it would make sense to follow up on hotspot-jfr-dev and see if HotSpot should change accordingly. Feel free to kick off a discussion there since I'll be away for a bit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed response! Ok that all makes sense to me. I agree that a singly-linked is the best option to achieve the same behavior as in hotspot (only redistribute to the younger neighbor).

That's a good idea. I'll follow up on your "hotspot-jfr-dev" email thread asking "why doesn't the Hotspot implementation actually distribute the span evenly to both neighbors? And if it must distribute unevenly, why not choose the older neighbor?".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the final outcome of this discussion?

What if all sampled objects are dead at some point? Then, we don't redistribute the span and queue.total() will in fall to 0. This will result in a large difference between totalAllocated and queue.total(), so new samples can easily evict old samples for a long period of time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked on the hotspot mailing list here: https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2023-June/005164.html

Unfortunately, there was no response.

}

@AnnotateOriginal
@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
Copy link
Collaborator

@roberttoyonaga roberttoyonaga Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why you need it for Reference, but why do you need to add the uninterruptible annotation to this class? I have the same question about SoftReference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I don't know the exact details on the limitations around @Uninterruptible, but unless you do this native image fails, e.g.:

com.oracle.svm.core.util.VMError$HostedError: Found 1 violations of @Uninterruptible usage:
- method java.lang.ref.Reference.get():Object is annotated but java.lang.ref.PhantomReference.get():Object is not

I suspect that if classes override a method that is @Uninterruptible, e.g. PhantomReference, then they also need to mark it @Uninterruptible.

@galderz
Copy link
Contributor Author

galderz commented Jul 13, 2023

I've added a few more commits to address the latest feedback:

  • Renamed the old scavenge unit test names to better represent what they test, e.g. testEmitSkippingYoungest.
  • Modified testSampleAfterEmit to use a custom mutable boolean object to control when to phase objects from being alive to being dead. It makes the test easier to understand rather than relying on specific time tic values.
  • Added a scavenge specific unit test that verifies that scavenging works as expected. It uses the mutable boolean object to control objects being dead when it matters for scavenging to kick in.
  • Re-added thread id retrieval behind effects in order to easily run the test from the IDE.

Copy link
Collaborator

@roberttoyonaga roberttoyonaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the new changes. It looks good to me.

@galderz
Copy link
Contributor Author

galderz commented Jul 14, 2023

The espresso and sulong failures seem unrelated.

@galderz
Copy link
Contributor Author

galderz commented Jul 14, 2023

@christianhaeubl Any thoughts on this PR?

@galderz galderz force-pushed the topic.0605.old-object-sample-event.mk4 branch from 11344c0 to 57a0051 Compare August 7, 2023 09:10
@galderz
Copy link
Contributor Author

galderz commented Aug 7, 2023

FYI I've just updated the PR to fix the conflicts.

@galderz galderz force-pushed the topic.0605.old-object-sample-event.mk4 branch from 57a0051 to 48df685 Compare August 28, 2023 07:52
@galderz
Copy link
Contributor Author

galderz commented Aug 28, 2023

Updated the code to fix conflicts. As a result of changes in master, I moved the code to using ReferenceInternals.getReferent to get the value out of the weak reference. The method I had used before to add uninterruptible to Reference.get methods didn't work any more.

By the way, let me know when you're happy with the code and I can squash all these commits. This would greatly simplify potential backports down the line.

@galderz galderz force-pushed the topic.0605.old-object-sample-event.mk4 branch from 302e66f to c903091 Compare September 11, 2023 04:49
Copy link
Member

@christianhaeubl christianhaeubl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I reviewed most of it (except for test cases) and added questions and comments.


@Override
protected final Object callNewInstanceStub(Word hub) {
protected final Object callNewInstanceStub(Word hub, UnsignedWord sizeSp) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why sizeSp instead of size?

private void collectOperation(CollectionVMOperationData data) {
assert VMOperation.isGCInProgress();
assert getCollectionEpoch().equal(data.getRequestingEpoch());
final HeapAccounting heapAccounting = HeapImpl.getAccounting();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to avoid the final keyword for local variables because it primarily adds verbosity without much benefit.

Heap.getHeap().updateUsedAtGC();
JfrGCHeapSummaryEvent.emit(JfrGCWhen.AFTER_GC);

UnsignedWord sizeAfter = heapAccounting.getEdenUsedBytes().add(heapAccounting.getSurvivorUsedBytes()).add(heapAccounting.getOldUsedBytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be simplified to heapAccounting.getUsedBytes()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. Initially I didn't spot that survivor bytes are counted youngUsedBytes, but I see now that after in after collection eden and survivor are added together and they update youngUsedBytes.

GenScavengeMemoryPoolMXBeans.notifyAfterCollection();

printGCAfter(cause);
Heap.getHeap().updateUsedAtGC();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this store the same value as heapAccounting.getUsedBytes()? If so, then we should probably add a new field to HeapAccounting (e.g., usedAfterLastGC) that is updated in HeapAccounting.notifyAfterCollection().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this approach is in HeapAccounting, which is not accessible outside its module. I need to get the heap used after last GC from the svm.core module. This is why I added the methods directly to Heap. Any ideas around this? I doubt you want to open up HeapAccounting to access externally? getUsedAtLastGC, or a similarly named method, could remain in Heap and then define updateUsedAtGC in HeapImpl.


import java.lang.ref.WeakReference;

public class JfrOldObjectSampleEvents {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class name confused me a bit. I don't think that this class should have Events in its name because it is only used for the sampling (and not for emitting the event).

allocatedSize should be of type UnsignedWord (if a conversion from UnsignedWord to long is necessary, we try to do it as late as possible).

Copy link
Contributor Author

@galderz galderz Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class name confused me a bit. I don't think that this class should have Events in its name because it is only used for the sampling (and not for emitting the event).

Makes sense. I can rename it to JfrOldObjectSampler.

allocatedSize should be of type UnsignedWord (if a conversion from UnsignedWord to long is necessary, we try to do it as late as possible).

Hmmmm, this clashes a bit with my attempts to test the profiler logic in a way that it can run in JVM mode, see #6743 (comment). Is it possible to run code that uses UnsignedWord in JVM mode? By doing an early switch to long I am then able to keep the OldObjectProfiler logic runnable in both JVM and native modes.

/*
* Register proxies for event data assertion
*/
RuntimeProxyCreation.register(Unsigned.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some more information to the comment why this is needed.

register(methodSubstitutions, annotated, original, substitution);
} else if (keepOriginalAnnotation != null) {
register(methodSubstitutions, annotated, original, original);
AnnotateOriginal annotateOriginalAnnotation = lookupAnnotation(annotatedMethod, AnnotateOriginal.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed not needed.

if (queue.isFull()) {
if (queue.peek().span > span) {
// Sample will not fit, try to scavenge
// HotSpot code scavenges when it gets notified that a sample was GC'd,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, you could use a reference queue to get notified but I think that is too tricky to do from uninterruptible code. How does the notification work in HotSpot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That notification is done via the oop storage that we discussed. We can implement it once #7503 is in place.


// Moving locking to OldObjectProfiler class cannot easily be done.
// For starters Unsafe needs to be accessible there,
// but there might be other issues.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multi-line comments, we are moving more and more towards /* */-style comments to avoid formatting issues.

final class OldObjectSampler {
private final OldObjectArray samples;
private final OldObjectPriorityQueue queue;
private final OldObjectList list;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some class-level JavaDoc why the 3 different views on the data are necessary (array, list, priority queue).

@galderz galderz force-pushed the topic.0605.old-object-sample-event.mk4 branch from c903091 to 18a34e7 Compare September 25, 2023 04:08
@galderz
Copy link
Contributor Author

galderz commented Oct 9, 2023

I pushed commits too early, I still have some comments to handle. I'll post an update when they're ready for another review.

@galderz
Copy link
Contributor Author

galderz commented Oct 16, 2023

@christianhaeubl I've addressed all comments. Can you have another look? As said above, let me know when you're happy before integrating it so that I can squash all commits. Doing this makes it easier for me to backport it to another branch if necessary.

@galderz galderz force-pushed the topic.0605.old-object-sample-event.mk4 branch 2 times, most recently from 7b4763a to fb6b4ce Compare October 30, 2023 11:56
@galderz
Copy link
Contributor Author

galderz commented Oct 30, 2023

I've fixed the conflicts due to graal compiler package refactoring and I used the opportunity to squash the commits.

@galderz galderz force-pushed the topic.0605.old-object-sample-event.mk4 branch from 8c20fe2 to 95c4e3e Compare November 1, 2023 08:49
@galderz
Copy link
Contributor Author

galderz commented Nov 1, 2023

Fixed another conflict and pushed.

* Sample objects and arrays that are possible leaks.
* Sampling done in allocation slow path.
* It uses a custom priority queue ordered by span.
* The sampler uses a fixed-sized array at runtime
to keep the samples in memory.
By default this is 256 elements but it's configurable
via `-XX:FlightRecorderOptions`.
* Samples are scavenged when the queue is full.
* Checking if objects are garbage collected is done
using WeakReferences.
* Added JFR repository for serializing and emitting
old object information.
* Added plain object and array tests.
* Added tests that the sample queue to be full.
* Added old object description tests.
@galderz galderz force-pushed the topic.0605.old-object-sample-event.mk4 branch from 95c4e3e to d8a5916 Compare November 17, 2023 08:53
Copy link
Member

@christianhaeubl christianhaeubl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galderz : I am currently cleaning up this PR so that we can integrated into master. As part of that, I came across a few code parts that were unclear, so I added a few questions.

OldObject current = list.head();
while (current != null) {
if (current.reference != null) {
final Object obj = effects.getWeakReferent(current.reference);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  • After emitting the data, do we need to clear the data structures?
  • Currently, we emit the data in the order from the oldest to the youngest allocation. Is it necessary to emit the data in this order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  • After emitting the data, do we need to clear the data structures?

Which data structure are you talking about exactly? The sample queue/list has to remain in place, the elements in it only get cleared when they're GC'd. So, they keep being emitted until then.

  • Currently, we emit the data in the order from the oldest to the youngest allocation. Is it necessary to emit the data in this order?

I've been wondering about this and I explicitly asked about this in the jfr-dev list (see discussion).

Specifically I asked:

I'm wondering now if the fact that the list view is ordered by insertion time is relevant here. Is it maybe trying to give more importance (by increasing its span) to surviving objects that are near in time to those that have been garbage collected.

Erik replied:

The intent of the code is to remove the sample from both the time-based list and the priority queue as the object is no longer alive.

So the time aspect of the list order is acknowledged.

Now, when it comes to emitting the events, the HotSpot code uses the list order to write the events. See here. object_sampler->first() calls into _list->first(). I've tried to replicate the same logic in svm.

@Uninterruptible(reason = "Access protected by lock.")
public boolean sample(WeakReference<Object> ref, long allocatedSize, int arrayLength) {
totalAllocated += allocatedSize;
long span = totalAllocated - queue.total();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this part: here, totalAllocated is increased no matter if the sample actually fits into the queue. So, totalAllocated will increase faster than OldObjectPriorityQueue.total. This means that span will increase over time, which makes it easier and easier for new samples to evict old samples.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also what Thomas Stuefe was concerned about. See mailing list message: https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2023-June/005069.html

As far as I can tell, this question wasn't given a clear answer.

// the weight of a sample that is removed should be redistributed.
// Here it gets redistributed to the sample that came just after in time.
queue.remove(prev);
prev.span += sample.span;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the final outcome of this discussion?

What if all sampled objects are dead at some point? Then, we don't redistribute the span and queue.total() will in fall to 0. This will result in a large difference between totalAllocated and queue.total(), so new samples can easily evict old samples for a long period of time.

@galderz
Copy link
Contributor Author

galderz commented Jan 24, 2024

What was the final outcome of this discussion?
What if all sampled objects are dead at some point? Then, we don't redistribute the span and queue.total() will in fall to 0. This will result in a large difference between totalAllocated and queue.total(), so new samples can easily evict old samples for a long period of time.

There has not been further replies to this discussion. I've replied in the thread to see if Erik can provide an answer.

In any case, if it's really a bug, then we wait for the hotspot fix and then backport it here. That way the two impls are relatively close in their logic.

@galderz
Copy link
Contributor Author

galderz commented Jan 24, 2024

@christianhaeubl I see there's a conflict in HeapImpl. Shall I address it or are you taking care of it?

@christianhaeubl
Copy link
Member

@galderz: I will handle the conflict.

}
// Trigger a GC so that last sweep gets updated,
// and the objects above are considered older than last GC sweep time.
System.gc();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One, probably last question: what is the intention of this line? I don't think that your code accesses the last GC sweep time anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think this is leftover from previous versions of the code when I was tracking the last GC sweep time, but that is only relevant when path to GC root is on, and you set a cut off time for the samples. I'll remove it and make sure everything works as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed not necessary. I've pushed a commit to remove System.gc() calls related to last sweep time.

@christianhaeubl
Copy link
Member

@galderz , @roberttoyonaga : for integration into master, I opened #8251 which adds a larger number of cleanups and fixes smaller bugs. Feel free to review this PR.

@galderz
Copy link
Contributor Author

galderz commented Feb 21, 2024

@christianhaeubl Now that #8251 is integrated, shall we close this?

@christianhaeubl
Copy link
Member

Yes, good point.

@fniephaus fniephaus changed the title Add JFR old object sample event [GR-48683] Add JFR old object sample event May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature native-image OCA Verified All contributors have signed the Oracle Contributor Agreement. redhat-interest

Projects

Status: Released

Development

Successfully merging this pull request may close these issues.

5 participants