[GR-48683] Add JFR old object sample event #6743
Conversation
Can @roberttoyonaga @christianhaeubl and @adinn review this?

Fixing some Eclipse format issues...

Added a few commits to address warning errors. Also made the full sample queue test more reliable.

I think I've fixed all style and format errors. However, gate testing is not working. It fails with: … @christianhaeubl Any ideas? @zakkak said it could be related to #6738.
roberttoyonaga
left a comment
Thanks Galder! It would be really awesome to add this support for JFR leak profiling. I've left a few comments/questions here.
public void lock() {
    @SuppressWarnings("deprecation")
    final long threadId = Thread.currentThread().getId();
    while (!lock.compareAndSet(NOT_HELD, threadId)) {
I think we might need a PauseNode.pause(); here. Would it be helpful to use JavaSpinLockUtils instead of this custom spinlock (it might require a bit of re-working, though)? It already has tryLock methods.
Good catch. I had completely missed that class. I've just tried it and it does the job.
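For readers unfamiliar with the pattern, here is a minimal plain-Java sketch of a spin lock that offers tryLock; it is not the SubstrateVM JavaSpinLockUtils API, just an illustration (names are made up, and Thread.onSpinWait() plays the role PauseNode.pause() plays inside the VM):

import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; the real code uses VM-internal primitives.
final class SimpleSpinLock {
    private static final long NOT_HELD = 0L;
    private final AtomicLong owner = new AtomicLong(NOT_HELD);

    /** Single attempt: returns false if another thread holds the lock, so sampling can simply be skipped. */
    boolean tryLock(long threadId) {
        return owner.compareAndSet(NOT_HELD, threadId);
    }

    /** Spin until acquired; the spin-wait hint reduces pressure on the contended cache line. */
    void lock(long threadId) {
        while (!owner.compareAndSet(NOT_HELD, threadId)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        owner.set(NOT_HELD);
    }
}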
int numDead = 0;
for (int i = 0; i < samples.getCapacity(); i++) {
    final OldObject sample = samples.getSample(i);
    if (sample.reference != null && sample.reference.get() == null) {
In this method the current thread holds the spinlock (which can lock without transition). sample.reference.get() steps into interruptible code and may result in the current thread waiting for a safepoint while holding the spinlock. If another thread calls JfrOldObjectSampler.emit(long, boolean) at this time, it will deadlock: it is blocked waiting for the spinlock, but cannot allow safepointing since it's executing uninterruptible code.
Your comment in the PR description says it's not possible to make WeakReference.get() uninterruptible using @AnnotateOriginal. @christianhaeubl do you know if there's a way to work around this?
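To make the hazard concrete, here is a schematic plain-Java sketch (hypothetical names; regular Java has no safepoints, so comments mark the VM-specific parts that cause the deadlock):

import java.util.concurrent.atomic.AtomicBoolean;

// Schematic only: illustrates the interleaving described above, not actual SVM code.
final class SafepointDeadlockSketch {
    private final AtomicBoolean spinLock = new AtomicBoolean(false);

    void scanningThread() {
        while (!spinLock.compareAndSet(false, true)) { /* spin */ }
        try {
            // Calls interruptible code (e.g. WeakReference.get()) while holding the lock:
            // the thread may end up waiting for a safepoint here.
        } finally {
            spinLock.set(false);
        }
    }

    void emittingThread() {
        // Runs uninterruptible code, so it can never reach the safepoint the other thread
        // waits for, yet it spins on the same lock: neither thread can make progress.
        while (!spinLock.compareAndSet(false, true)) { /* spin */ }
        try {
            // emit samples
        } finally {
            spinLock.set(false);
        }
    }
}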
I've found a solution. I've hacked AnnotationSubstitutionProcessor.handleAnnotatedMethodInSubstitutionClass to allow @KeepOriginal and @AnnotateOriginal to be used together, and then made Reference.get look like this:
@KeepOriginal
@AnnotateOriginal
@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
native T get();
Then it all works :)
Nice! I think that should solve the problem. Were only superficial changes to AnnotationSubstitutionProcessor.handleAnnotatedMethodInSubstitutionClass needed?
Yeah, small changes indeed :)
I've not pushed all the commits yet because I was trying to get clarification about #6743 (comment), but that might take some days, so I'll probably push updates shortly rather than wait.
 * changing epoch.
 */
this.flushCheckpointRepos = new JfrRepository[]{stackTraceRepo, methodRepo, typeRepo, symbolRepo};
this.flushCheckpointRepos = new JfrRepository[]{stackTraceRepo, methodRepo, typeRepo, symbolRepo, oldObjectRepo};
Maybe the oldObjectRepo should come earlier in the write order. It will reference type and symbol data. Otherwise there could be races (i.e., if entries are added to oldObjectRepo after symbolRepo is written to disk but before oldObjectRepo is written to disk, then there could be dangling references in the chunk).
Makes sense, I've reordered repos.
public void triggerUpdateOfLastKnownHeapUsage() {
    // Trigger GC before tests to get a reading of last known heap usage for the first executed
    // test.
    System.gc();
Why do we have to invoke gc twice here?
Yup, once is enough :)
return startRecording(events, getDefaultConfiguration(), settings);
}

static void blackhole(Object obj) {
Why do we need this method?
Good question. The leak field is assigned to a static variable but it's never used, so when I started working on this I was worried the compiler might optimize it away and remove all the code. The blackhole method is just a way to make sure the field is used in a trivial way, but one that cannot be optimized away because it can (although it's extremely unlikely) end up causing a side effect (printing a space character).
I've just removed it, along with the calls to it, and it all works fine, so I'll go ahead and remove it.
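For context, the trick described above usually looks something like the following; this body is hypothetical, not the removed code, but it shows the "rare side effect" idea:

// Hypothetical sketch: the branch is essentially never taken, but the compiler cannot
// prove that, so the argument is treated as used and the allocation is not removed.
static void blackhole(Object obj) {
    if (obj != null && obj.hashCode() == System.nanoTime()) {
        System.out.print(' '); // extremely unlikely side effect
    }
}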
Object[] right = new Object[3];
node[2] = right;
node = right;
}
Is adding objects to this array meant to make the 100-element array allocation get sampled along the allocation slow path? If so, is there a chance that it doesn't actually get sampled?
What if you directly called JfrOldObjectSampleEvents.sample(result, size.rawValue(), Integer.MIN_VALUE); from this testing code?
Is adding objects to this array meant to make the 100-element array allocation get sampled along the allocation slow path? If so, is there a chance that it doesn't actually get sampled?
All I'm doing is generating enough arrays so that some of them get sampled as old objects, and I can verify that the sampled arrays make it through as events with the right data populated. Having the right data here means that the sampled event is for an Object[] with length 100. If no arrays were sampled, the test would fail because it asserts that the list of events generated for the Object[] is not empty.
What if you directly called JfrOldObjectSampleEvents.sample(result, size.rawValue(), Integer.MIN_VALUE); from this testing code?
Hmmm, I'm not sure what that would cover that this test isn't already covering?
Ok I see. I was just concerned that there isn't a guarantee the length-100 array gets sampled (although it's probably likely because we try 1M times). Although unlikely, what if the length-3 array accidentally gets sampled because we happen to run out of space in the current TLAB? Would the test fail because it's only expecting the length-100 array?
I don't think that will happen, because the allocated size of an array of 100 elements is bigger than that of 3 elements, so you should see the priority queue logic kicking in: pushing the 3-element arrays out and sampling those with 100 elements.
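To illustrate the eviction rule described here, a small standalone sketch using a JDK PriorityQueue ordered by span (made-up names; the real sampler uses its own fixed-size, allocation-free structures):

import java.util.Comparator;
import java.util.PriorityQueue;

final class SpanQueueSketch {
    record Sample(Object referent, long span) { }

    private final int capacity;
    private final PriorityQueue<Sample> queue =
            new PriorityQueue<>(Comparator.comparingLong(Sample::span)); // smallest span at the head

    SpanQueueSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Returns false when the candidate's span is too small to displace the current minimum. */
    boolean offer(Sample candidate) {
        if (queue.size() >= capacity) {
            if (queue.peek().span() >= candidate.span()) {
                return false; // sampling is skipped, the existing samples win
            }
            queue.poll(); // evict the sample with the smallest span
        }
        return queue.add(candidate);
    }
}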
final long heapUsedAtLastGC = sample.heapUsedAtLastGC;
final int arrayLength = sample.arrayLength;
OldObjectSampleEvent.emit(timestamp, objectId, allocationTime, threadId, stackTraceId, heapUsedAtLastGC, arrayLength);
}
The flushing thread can write serialized data to the current chunk on disk while this method is executing. If the event data gets flushed without the constant pool data in the OldObjectRepository, there will be missing reference data. I think this scenario is possible because flushes are ignored by the OldObjectRepository. Fixing this might be as simple as allowing flushes in the OldObjectRepository, similar to the other repos. Most of the time flushes will be ignored anyway, because the OldObjectRepository will only have serialized data once a recording is actually stopped.
Makes sense, I've changed it to allow flushes.
final OldObject prev = samples.getSample(prevIndex);
if (prev.reference != null) {
    queue.remove(prev);
    prev.span += sample.span;
Why are we adding the span to the previous sample here? Is this an optimization?
 * obtained. This lock is non-reentrant so if the lock is already held by the current thread,
 * the method throws a {@link RuntimeException}.
 */
@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
I think maybe this shouldn't be allowed to be inlined, since we are effectively locking without transition here (if it gets inlined into interruptible code, it also becomes interruptible). You might also need callerMustBe = true, i.e. @Uninterruptible(reason = "The whole critical section must be uninterruptible.", callerMustBe = true).
This class will be removed; I'll use JavaSpinLockUtils instead.
Pushed a few updates that should address all of @roberttoyonaga's feedback: …

Formatting, style, etc. should be done now, but CI shows all the recording tests I've added failing when tested with …. I'm looking into this.

Hmmmm, it seems like in JDK 21 (or some other version post-17) the event has an extra field compared with JDK 17 (observed by parsing the JFR file with @jiekang's excellent jfr-reader tool). I'm unsure how we should handle this. I'll see if I can handle it using the JDK earlier/later boolean suppliers.

FYI …

Fixed tests and pushed commits to support JDK 18+. I've ended up removing the object description tests because in later JDKs generating …
roberttoyonaga
left a comment
Thanks for the updates @galderz! I think you've addressed all my earlier comments. I left a few more pretty minor notes.
profiler.sample(new WeakReference<>(String.valueOf(i)), i * 100, -1);
}

profiler.emit(0, Long.MAX_VALUE);
I'm not really sure OldObjectSampler.scavenge() gets called here. I think you need to sample once more after the queue is full in order to induce a scavenge operation. I think this test still passes because OldObjectEventEmitter.emitUnchained(long) checks if each sample is alive before emitting.
Hmmmm, I think I know what you mean, but let's refresh how scavenging has been ported to see if this makes sense:
Technically the way scavenge works in HotSpot is like this:
void ObjectSampler::add(HeapWord* obj, size_t allocated, traceid thread_id, bool virtual_thread, const JfrBlobHandle& bh, JavaThread* thread) {
...
if (Atomic::load(&_dead_samples)) {
// There's a small race where a GC scan might reset this to true, potentially
// causing a back-to-back scavenge.
Atomic::store(&_dead_samples, false);
scavenge();
}
Scavenging only gets triggered when adding a sample, and only if a sample has been GC'd since the last check. There's no such notification mechanism in SVM, so instead I went for the option of scavenging when the queue is full.
Now, the issue I think you're implying above is that if the queue is not full, no scavenging happens, and maybe it's a good idea to scavenge after emitting to see if anything is dead and can be removed. This sounds like a good idea to me. It seems better than the alternative of scavenging on every sample add call. Have I got this right?
and maybe it's a good idea to scavenge after emitting to see if anything is dead and can be removed.
You may not even need to run a full scavenge() operation. When emitting in OldObjectEventEmitter.emitUnchained(long) you already check if each sample is alive. If it is dead, you can directly remove it with OldObjectSampler.remove(OldObject). Admittedly, this is only really useful if you have multiple simultaneous recordings running, since we only emit the OldObjectSample event once per recording anyway (so usually there will be no point in doing the extra cleanup work during event emission).
Have I got this right?
hmm my original comment was actually concerning the test itself. I think that your current implementation of scavenging/emitting works perfectly fine. I think testScavengeMiddle, testScavengeYoungest, testScavengeOldest and testScavengeAll don't actually exercise the code in OldObjectSampler.scavenge() and instead rely on OldObjectEventEmitter.emitUnchained(long) to remove the dead samples.
You may not even need to run a full scavenge() operation. When emitting in OldObjectEventEmitter.emitUnchained(long) you already check if each sample is alive. If it is dead, you can directly remove it with OldObjectSampler.remove(OldObject). Admittedly, this is only really useful if you have multiple simultaneous recordings running, since we only emit the OldObjectSample event once per recording anyway (so usually there will be no point in doing the extra cleanup work during event emission).
Ok, I think I'll leave the logic as is for now.
hmm my original comment was actually concerning the test itself. I think that your current implementation of scavenging/emitting works perfectly fine. I think testScavengeMiddle, testScavengeYoungest, testScavengeOldest and testScavengeAll don't actually exercise the code in OldObjectSampler.scavenge() and instead rely on OldObjectEventEmitter.emitUnchained(long) to remove the dead samples.
That's right. I'll rework this to make sure the scavenging gets tested.
// the weight of a sample that is removed should be redistributed.
// Here it gets redistributed to the sample that came just after in time.
queue.remove(prev);
prev.span += sample.span;
This is the same as how it's done in Hotspot. However, do you know why the span is added to the younger neighbor rather than the older neighbor? To me, it makes more sense to add additional weight to the older neighbor since the goal is to keep track of long-lived objects. I'm not saying the behavior should be changed, I'm just bringing it up for potential discussion because I'm curious.
Another interesting observation which also needs some further explanation :)
Because the span is added to the younger neighbour, I was able to implement it as a singly-linked list. In HotSpot, this is implemented as a doubly-linked list so it can efficiently distribute the span to the older sample. I went for the simpler, more space-efficient option, taking into account that the default sample queue size is relatively small, so when removing things I can just loop around and fix things up.
Now, about how to distribute the span. Here's what Erik said recently when I asked him about the linked list of samples:
The weight of the object, the allocation span, in the time-based list, should be divided to its neighbour(s).
Then @tstuefe pointed out:
span is redistributed to its (in time) predecessor sample. Following your explanation, should the span not be evenly divided between predecessor and successor?
The first part is not entirely correct since prev is really the sample that was sampled after the current object, so it is indeed the younger one. But the point is that Erik suggests it should be divided, i.e. 50% to the younger and 50% to the older. Unfortunately Erik didn't reply to this, so I think it would make sense to follow up on hotspot-jfr-dev and see if HotSpot should change accordingly. Feel free to kick off a discussion there since I'll be away for a bit.
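As a concrete illustration of the singly-linked bookkeeping described above, a minimal sketch with made-up names (it walks the list in allocation order, unlinks dead samples and hands each dead sample's span to a live neighbour, preferring the younger one as the PR and HotSpot do):

final class SampleListSketch {
    static final class Node {
        Object referent; // null once the sampled object has been collected
        long span;
        Node next;       // next node in allocation order, i.e. the younger neighbour
    }

    Node head; // oldest sample

    void removeDead() {
        Node prev = null;
        Node cur = head;
        while (cur != null) {
            if (cur.referent == null) {
                Node heir = (cur.next != null) ? cur.next : prev; // prefer the younger neighbour
                if (heir != null) {
                    heir.span += cur.span; // redistribute the weight so the total span is preserved
                }
                if (prev == null) {
                    head = cur.next; // unlink the head
                } else {
                    prev.next = cur.next; // unlink an interior node
                }
            } else {
                prev = cur;
            }
            cur = cur.next;
        }
    }
}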
Thanks for the detailed response! Ok, that all makes sense to me. I agree that a singly-linked list is the best option to achieve the same behavior as in Hotspot (only redistribute to the younger neighbor).
That's a good idea. I'll follow up on your "hotspot-jfr-dev" email thread asking "why doesn't the Hotspot implementation actually distribute the span evenly to both neighbors? And if it must distribute unevenly, why not choose the older neighbor?".
What was the final outcome of this discussion?
What if all sampled objects are dead at some point? Then we don't redistribute the span, and queue.total() will fall to 0. This will result in a large difference between totalAllocated and queue.total(), so new samples can easily evict old samples for a long period of time.
I asked on the hotspot mailing list here: https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2023-June/005164.html
Unfortunately, there was no response.
}

@AnnotateOriginal
@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
I understand why you need it for Reference, but why do you need to add the uninterruptible annotation to this class? I have the same question about SoftReference.
Good question. I don't know the exact details of the limitations around @Uninterruptible, but unless you do this, the native image build fails, e.g.:
com.oracle.svm.core.util.VMError$HostedError: Found 1 violations of @Uninterruptible usage:
- method java.lang.ref.Reference.get():Object is annotated but java.lang.ref.PhantomReference.get():Object is not
I suspect that if classes override a method that is @Uninterruptible, e.g. PhantomReference, then they also need to mark it @Uninterruptible.
I've added a few more commits to address the latest feedback: …
roberttoyonaga
left a comment
Thanks for the new changes. It looks good to me.
The espresso and sulong failures seem unrelated.

@christianhaeubl Any thoughts on this PR?
Force-pushed 11344c0 to 57a0051
FYI I've just updated the PR to fix the conflicts.
Force-pushed 57a0051 to 48df685
Updated the code to fix conflicts. As a result of changes in master, I moved the code to using …. By the way, let me know when you're happy with the code and I can squash all these commits. This would greatly simplify potential backports down the line.
Force-pushed 302e66f to c903091
christianhaeubl
left a comment
Thanks for the PR, I reviewed most of it (except for test cases) and added questions and comments.
@Override
protected final Object callNewInstanceStub(Word hub) {
protected final Object callNewInstanceStub(Word hub, UnsignedWord sizeSp) {
Why sizeSp instead of size?
private void collectOperation(CollectionVMOperationData data) {
    assert VMOperation.isGCInProgress();
    assert getCollectionEpoch().equal(data.getRequestingEpoch());
    final HeapAccounting heapAccounting = HeapImpl.getAccounting();
We tend to avoid the final keyword for local variables because it primarily adds verbosity without much benefit.
Heap.getHeap().updateUsedAtGC();
JfrGCHeapSummaryEvent.emit(JfrGCWhen.AFTER_GC);

UnsignedWord sizeAfter = heapAccounting.getEdenUsedBytes().add(heapAccounting.getSurvivorUsedBytes()).add(heapAccounting.getOldUsedBytes());
can be simplified to heapAccounting.getUsedBytes()
Ah yes. Initially I didn't spot that survivor bytes are counted in youngUsedBytes, but I see now that after a collection, eden and survivor bytes are added together and update youngUsedBytes.
GenScavengeMemoryPoolMXBeans.notifyAfterCollection();

printGCAfter(cause);
Heap.getHeap().updateUsedAtGC();
Doesn't this store the same value as heapAccounting.getUsedBytes()? If so, then we should probably add a new field to HeapAccounting (e.g., usedAfterLastGC) that is updated in HeapAccounting.notifyAfterCollection().
The problem with this approach is that HeapAccounting is not accessible outside its module, and I need to get the heap used after the last GC from the svm.core module. This is why I added the methods directly to Heap. Any ideas around this? I doubt you want to open up HeapAccounting for external access? getUsedAtLastGC, or a similarly named method, could remain in Heap, and updateUsedAtGC could be defined in HeapImpl.
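A minimal sketch of the shape being suggested, with hypothetical names and long instead of UnsignedWord for brevity; the idea is to keep the bookkeeping inside the accounting code and expose only a getter through the Heap interface:

// Hypothetical sketch, not the actual SubstrateVM classes.
final class HeapAccountingSketch {
    private volatile long usedAfterLastGC;

    /** Called at the end of a collection, when the heap sizes are recomputed anyway. */
    void notifyAfterCollection(long edenUsed, long survivorUsed, long oldUsed) {
        usedAfterLastGC = edenUsed + survivorUsed + oldUsed;
    }

    /** Read by event-emitting code outside the GC module, e.g. via a getter on the Heap interface. */
    long getUsedAfterLastGC() {
        return usedAfterLastGC;
    }
}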
import java.lang.ref.WeakReference;

public class JfrOldObjectSampleEvents {
The class name confused me a bit. I don't think that this class should have Events in its name because it is only used for the sampling (and not for emitting the event).
allocatedSize should be of type UnsignedWord (if a conversion from UnsignedWord to long is necessary, we try to do it as late as possible).
The class name confused me a bit. I don't think that this class should have Events in its name because it is only used for the sampling (and not for emitting the event).
Makes sense. I can rename it to JfrOldObjectSampler.
allocatedSize should be of type UnsignedWord (if a conversion from UnsignedWord to long is necessary, we try to do it as late as possible).
Hmmmm, this clashes a bit with my attempts to test the profiler logic in a way that it can run in JVM mode, see #6743 (comment). Is it possible to run code that uses UnsignedWord in JVM mode? By doing an early switch to long I am then able to keep the OldObjectProfiler logic runnable in both JVM and native modes.
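A sketch of the trade-off being discussed, with hypothetical signatures: keep UnsignedWord at the native-image entry point and convert to long as late as possible, so the shared profiler logic stays runnable on HotSpot in unit tests:

import org.graalvm.word.UnsignedWord;

// Hypothetical facade, not the PR's actual classes.
final class SamplerFacadeSketch {
    /** Entry point used from allocation code keeps the word type. */
    void sample(Object obj, UnsignedWord allocatedSize, int arrayLength) {
        sampleImpl(obj, allocatedSize.rawValue(), arrayLength); // conversion happens as late as possible
    }

    /** The shared profiler logic only sees long, so it can also run as a plain JVM unit test. */
    private void sampleImpl(Object obj, long allocatedSize, int arrayLength) {
        // priority-queue bookkeeping elided
    }
}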
/*
 * Register proxies for event data assertion
 */
RuntimeProxyCreation.register(Unsigned.class);
Please add some more information to the comment why this is needed.
    register(methodSubstitutions, annotated, original, substitution);
} else if (keepOriginalAnnotation != null) {
    register(methodSubstitutions, annotated, original, original);
    AnnotateOriginal annotateOriginalAnnotation = lookupAnnotation(annotatedMethod, AnnotateOriginal.class);
Do you still need this change?
Indeed not needed.
if (queue.isFull()) {
    if (queue.peek().span > span) {
        // Sample will not fit, try to scavenge
        // HotSpot code scavenges when it gets notified that a sample was GC'd,
Theoretically, you could use a reference queue to get notified but I think that is too tricky to do from uninterruptible code. How does the notification work in HotSpot?
That notification is done via the oop storage that we discussed. We can implement it once #7503 is in place.
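For reference, the reference-queue idea mentioned above looks roughly like this in plain Java (the PR does not do this, partly because polling such a queue from uninterruptible code is tricky):

import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Plain-Java sketch of dead-sample notification via a ReferenceQueue.
final class DeadSampleNotifierSketch {
    private final ReferenceQueue<Object> deadQueue = new ReferenceQueue<>();

    WeakReference<Object> track(Object sampled) {
        // The reference is enqueued automatically once 'sampled' becomes unreachable.
        return new WeakReference<>(sampled, deadQueue);
    }

    boolean hasDeadSamples() {
        return deadQueue.poll() != null; // non-blocking check, e.g. before deciding whether to scavenge
    }
}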
// Moving locking to OldObjectProfiler class cannot easily be done.
// For starters Unsafe needs to be accessible there,
// but there might be other issues.
For multi-line comments, we are moving more and more towards /* */-style comments to avoid formatting issues.
final class OldObjectSampler {
    private final OldObjectArray samples;
    private final OldObjectPriorityQueue queue;
    private final OldObjectList list;
Please add some class-level JavaDoc why the 3 different views on the data are necessary (array, list, priority queue).
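Something along these lines could serve as the requested class-level documentation; the wording below is only a suggestion, based on how the three views are used elsewhere in this PR:

/**
 * Holds the sampled old objects. The samples are stored once in a pre-allocated array (so sampling
 * does not allocate), and are additionally viewed as a priority queue ordered by span (to decide
 * which sample to evict when the queue is full) and as a linked list in allocation order (to emit
 * events from the oldest to the youngest sample and to redistribute the span when a sample is removed).
 */
final class OldObjectSampler {
    // fields as above
}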
Force-pushed c903091 to 18a34e7
I pushed the commits too early; I still have some comments to handle. I'll post an update when they're ready for another review.
@christianhaeubl I've addressed all comments. Can you have another look? As said above, let me know when you're happy before integrating it so that I can squash all commits. Doing this makes it easier for me to backport it to another branch if necessary.
Force-pushed 7b4763a to fb6b4ce
I've fixed the conflicts due to the graal compiler package refactoring and I used the opportunity to squash the commits.
Force-pushed 8c20fe2 to 95c4e3e
Fixed another conflict and pushed.
* Sample objects and arrays that are possible leaks.
* Sampling is done in the allocation slow path.
* It uses a custom priority queue ordered by span.
* The sampler uses a fixed-size array at runtime to keep the samples in memory. By default this is 256 elements, but it's configurable via -XX:FlightRecorderOptions.
* Samples are scavenged when the queue is full.
* Checking whether objects have been garbage collected is done using WeakReferences.
* Added a JFR repository for serializing and emitting old object information.
* Added plain object and array tests.
* Added tests that fill the sample queue.
* Added old object description tests.
Force-pushed 95c4e3e to d8a5916
christianhaeubl
left a comment
@galderz: I am currently cleaning up this PR so that we can integrate it into master. As part of that, I came across a few code parts that were unclear, so I added a few questions.
OldObject current = list.head();
while (current != null) {
    if (current.reference != null) {
        final Object obj = effects.getWeakReferent(current.reference);
Two questions:
- After emitting the data, do we need to clear the data structures?
- Currently, we emit the data in the order from the oldest to the youngest allocation. Is it necessary to emit the data in this order?
Two questions:
- After emitting the data, do we need to clear the data structures?
Which data structure are you talking about exactly? The sample queue/list has to remain in place; the elements in it only get cleared when they're GC'd, so they keep being emitted until then.
- Currently, we emit the data in the order from the oldest to the youngest allocation. Is it necessary to emit the data in this order?
I've been wondering about this and I explicitly asked about this in the jfr-dev list (see discussion).
Specifically I asked:
I'm wondering now if the fact that the list view is ordered by insertion time is relevant here. Is it maybe trying to give more importance (by increasing its span) to surviving objects that are near in time to those that have been garbage collected.
Erik replied:
The intent of the code is to remove the sample from both the time-based list and the priority queue as the object is no longer alive.
So the time aspect of the list order is acknowledged.
Now, when it comes to emitting the events, the HotSpot code uses the list order to write the events (see here: object_sampler->first() calls into _list->first()). I've tried to replicate the same logic in SVM.
@Uninterruptible(reason = "Access protected by lock.")
public boolean sample(WeakReference<Object> ref, long allocatedSize, int arrayLength) {
    totalAllocated += allocatedSize;
    long span = totalAllocated - queue.total();
I don't fully understand this part: here, totalAllocated is increased no matter if the sample actually fits into the queue. So, totalAllocated will increase faster than OldObjectPriorityQueue.total. This means that span will increase over time, which makes it easier and easier for new samples to evict old samples.
This is also what Thomas Stuefe was concerned about. See mailing list message: https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2023-June/005069.html
As far as I can tell, this question wasn't given a clear answer.
// the weight of a sample that is removed should be redistributed.
// Here it gets redistributed to the sample that came just after in time.
queue.remove(prev);
prev.span += sample.span;
What was the final outcome of this discussion?
What if all sampled objects are dead at some point? Then we don't redistribute the span, and queue.total() will fall to 0. This will result in a large difference between totalAllocated and queue.total(), so new samples can easily evict old samples for a long period of time.
There have been no further replies to this discussion. I've replied in the thread to see if Erik can provide an answer. In any case, if it really is a bug, we can wait for the HotSpot fix and then backport it here. That way the two implementations stay relatively close in their logic.
@christianhaeubl I see there's a conflict in …

@galderz: I will handle the conflict.
}
// Trigger a GC so that last sweep gets updated,
// and the objects above are considered older than last GC sweep time.
System.gc();
One, probably last question: what is the intention of this line? I don't think that your code accesses the last GC sweep time anywhere.
Good catch. I think this is leftover from previous versions of the code, when I was tracking the last GC sweep time, but that is only relevant when path-to-GC-root is on and you set a cut-off time for the samples. I'll remove it and make sure everything works as expected.
Indeed not necessary. I've pushed a commit to remove System.gc() calls related to last sweep time.
@galderz, @roberttoyonaga: for integration into master, I opened #8251, which adds a large number of cleanups and fixes smaller bugs. Feel free to review that PR.

@christianhaeubl Now that #8251 is integrated, shall we close this?

Yes, good point.
Implements the JFR jdk.OldObjectSample event for #5145, without path-to-GC-root functionality. I have a prototype for the path to GC root functionality, but I've decided to split this work into a separate PR to make reviewing more manageable.

This PR includes:

* A sample queue whose size is configurable via the -XX:FlightRecorderOptions=old-object-queue-size=<size> option.
* A new SpinLock. This new lock implementation works similarly to a VMMutex but extends it to offer a tryLock() function. This function is useful for sampling because if another object is already being sampled and the lock cannot be acquired, the sampling is simply skipped. When emitting old object sample events, the sampler waits to acquire the lock.
* Object descriptions for Thread, ThreadGroup and Class. Ellipsis descriptions are also supported.

Caveats:

* Calling WeakReference.get() from uninterruptible code fails. Adding @AnnotateOriginal @Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true) to Target_java_lang_ref_Reference.get() does not seem to have any impact. Looking at AnnotationSubstitutionProcessor, classes that have @Substitute don't seem to get processed for methods with @AnnotateOriginal. Worked around by adding calleeMustBe = false to the Uninterruptible method within which WeakReference.get() is called. (This caveat has been addressed, see latest commits.)
* Descriptions based on a size field are not implemented since they're not feasible in a closed-world environment. In HotSpot this is achieved by using reflection to look up a size field, but that cannot be done in SubstrateVM.
* Descriptions are tested with Thread instances. Classes are not tested since that requires dynamically creating brand new Class instances at runtime, something not feasible in closed-world SubstrateVM.

Once this PR is integrated I'll work on adding path to GC root functionality.