Serialize JFR stack traces during flushpoints #6365
base: master
gate fixes style: 60123e0 to cf8c486
Hi @roberttoyonaga, thank you for opening this PR.
Thanks for your work, I added a few comments.
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/BufferNodeAccess.java (outdated, resolved)
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrBufferList.java (outdated, resolved)
 */
@Uninterruptible(reason = "Prevent JFR recording.")
private static void processSamplerBuffers(boolean flushpoint) {
    SamplerBuffersAccess.processActiveBuffers(flushpoint);
You removed the calls to JfrExecutionSampler.singleton().disallowThreadsInSamplerCode() and JfrExecutionSampler.singleton().allowThreadsInSamplerCode(). So, the SIGPROF-based sampler can interrupt processActiveBuffers and processFullBuffers at any time. When the signal handler modifies the stacks of available and full buffers, this can cause issues that look like races (even though everything happens in the same thread). See the JavaDoc on processSamplerBuffers().
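To illustrate the reviewer's point, here is a minimal Java sketch. It assumes a thread-local nesting counter stands in for disallowThreadsInSamplerCode()/allowThreadsInSamplerCode(), and an ordinary method call stands in for the SIGPROF handler, which (like the real handler) runs on the same thread it interrupts. All names are hypothetical, not the SubstrateVM API. Without the guard, the simulated interruption inside the loop would push onto the very stack that is being drained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: keep a same-thread "signal handler" out of a critical section. */
final class SamplerGuardSketch {
    // Nesting depth of "sampler code disallowed" regions for the current thread.
    private static final ThreadLocal<int[]> DISALLOW_DEPTH = ThreadLocal.withInitial(() -> new int[1]);

    // Stands in for the shared stack of full buffers.
    static final Deque<Integer> fullBuffers = new ArrayDeque<>();

    static void disallowThreadsInSamplerCode() { DISALLOW_DEPTH.get()[0]++; }

    static void allowThreadsInSamplerCode() { DISALLOW_DEPTH.get()[0]--; }

    /** Simulated sampler callback: returns false if the sample had to be dropped. */
    static boolean onSample(int bufferId) {
        if (DISALLOW_DEPTH.get()[0] > 0) {
            return false; // This thread is inside the critical section: do not touch the stack.
        }
        fullBuffers.push(bufferId);
        return true;
    }

    /** Drains the full-buffer stack while the sampler is kept out. */
    static int processFullBuffers() {
        disallowThreadsInSamplerCode();
        try {
            int processed = 0;
            while (!fullBuffers.isEmpty()) {
                onSample(-1); // Simulated interruption: ignored instead of mutating the stack.
                fullBuffers.pop();
                processed++;
            }
            return processed;
        } finally {
            allowThreadsInSamplerCode(); // Always re-enable sampling, even on exceptions.
        }
    }

    public static void main(String[] args) {
        fullBuffers.push(1);
        fullBuffers.push(2);
        System.out.println("processed " + processFullBuffers() + " buffers");
    }
}
```

The try/finally pairing mirrors why the disallow/allow calls must bracket the whole processing step: a counter rather than a boolean keeps nested critical sections correct.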
Oh right, the sampler could also affect the buffer stacks. I've added JfrExecutionSampler.singleton().disallowThreadsInSamplerCode() and JfrExecutionSampler.singleton().allowThreadsInSamplerCode() back in, and I've corrected the Javadoc as well.
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/sampler/SamplerBuffersAccess.java (outdated, resolved)
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/sampler/SamplerBufferList.java (outdated, resolved)
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/BufferNodeAccess.java (outdated, resolved)
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrStackTraceRepository.java (outdated, resolved)
substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrThreadLocal.java (outdated, resolved)
BufferNode oldNode = oldBuffer.getNode();

// Lock here to avoid race with flushing thread which might be processing
BufferNodeAccess.lockNoTransition(oldNode);
Actually, I don't think that we need any BufferNode-based locking for SamplerBuffers:
- the buffers have a fixed size and are never resized
- in-use/full buffers are only processed by threads that hold the JfrChunkWriter lock
- only the SamplerBufferPool frees buffers

So, the following should be safe:
- a flushing thread can access the data below the committed position without any locking
- SamplerSampleWriter.accommodate(...) may access the uncommitted data without any locking
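The safety argument above can be sketched in plain Java. This is a minimal sketch assuming a single writer thread per buffer, with an AtomicInteger modeling the committed position; the real code operates on native memory in uninterruptible code rather than relying on the Java memory model. Class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: lock-free reads below a published committed position. */
final class CommittedPositionSketch {
    private final byte[] data;                                    // Fixed size, never resized.
    private final AtomicInteger committed = new AtomicInteger(0); // Published by the writer.
    private int writePos;                                         // Only touched by the writer thread.

    CommittedPositionSketch(int capacity) {
        data = new byte[capacity];
    }

    /** Writer side: returns false if the record does not fit (the buffer would then be retired as full). */
    boolean write(byte[] record) {
        if (writePos + record.length > data.length) {
            return false;
        }
        // Bytes at or above the committed position are not visible to readers yet.
        System.arraycopy(record, 0, data, writePos, record.length);
        writePos += record.length;
        committed.set(writePos); // Volatile store: publishes the bytes written above.
        return true;
    }

    /** Reader side (e.g. a flushing thread): copies only committed bytes, without locking. */
    byte[] readCommitted() {
        int end = committed.get(); // Volatile load: everything below 'end' is fully written.
        byte[] copy = new byte[end];
        System.arraycopy(data, 0, copy, 0, end);
        return copy;
    }

    public static void main(String[] args) {
        CommittedPositionSketch buffer = new CommittedPositionSketch(8);
        buffer.write(new byte[] {1, 2, 3});
        System.out.println("committed bytes: " + buffer.readCommitted().length);
    }
}
```

Because the writer never touches bytes below the committed position again and the buffer is never resized, reader and writer never access the same bytes concurrently, which is why no per-node lock is needed in this model.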
OK, I agree. I've removed the node-level locking for the SamplerBuffers.
} | ||
|
||
@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true) | ||
public static SamplerBuffer getSamplerBuffer(BufferNode node) { |
Just as a note: in the long run, we should probably try to get rid of this special buffer. This should be possible if we improve the JfrBuffer/JfrBufferNode/JfrBufferList infrastructure a bit.
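As a rough illustration of what a shared infrastructure could look like, the following hypothetical sketch lets one generic node and list type serve two buffer kinds through a common interface. All names are invented for this example, and `synchronized` stands in for whatever lock the VM actually uses in uninterruptible code.

```java
/** Common view of any buffer kind that can be linked into a list. */
interface SketchBuffer {
    /** Number of bytes that are ready to be serialized. */
    int committedSize();
}

/** Stand-in for a JFR event buffer. */
final class EventBufferSketch implements SketchBuffer {
    private final int committed;
    EventBufferSketch(int committed) { this.committed = committed; }
    public int committedSize() { return committed; }
}

/** Stand-in for a sampler (stack trace) buffer. */
final class StackTraceBufferSketch implements SketchBuffer {
    private final int committed;
    StackTraceBufferSketch(int committed) { this.committed = committed; }
    public int committedSize() { return committed; }
}

/** A list node that can wrap either buffer kind. */
final class SketchBufferNode<B extends SketchBuffer> {
    final B buffer;
    SketchBufferNode<B> next;
    SketchBufferNode(B buffer) { this.buffer = buffer; }
}

/** One list implementation shared by both buffer kinds. */
final class SketchBufferList<B extends SketchBuffer> {
    private SketchBufferNode<B> head;

    /** Prepends a node for the given buffer and returns it. */
    synchronized SketchBufferNode<B> addNode(B buffer) {
        SketchBufferNode<B> node = new SketchBufferNode<>(buffer);
        node.next = head;
        head = node;
        return node;
    }

    /** Sums the committed bytes of all linked buffers. */
    synchronized int totalCommitted() {
        int total = 0;
        for (SketchBufferNode<B> n = head; n != null; n = n.next) {
            total += n.buffer.committedSize();
        }
        return total;
    }

    public static void main(String[] args) {
        SketchBufferList<StackTraceBufferSketch> list = new SketchBufferList<>();
        list.addNode(new StackTraceBufferSketch(7));
        System.out.println("committed: " + list.totalCommitted());
    }
}
```

With the list logic written once against the interface, a special-purpose list for one buffer kind becomes unnecessary; only the buffer-specific parts stay separate.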
@Override
@Uninterruptible(reason = "Locking without transition requires that the whole critical section is uninterruptible.")
public BufferNode addNode(com.oracle.svm.core.jfr.Buffer buffer) {
    assert !AbstractJfrExecutionSampler.isExecutionSamplingAllowedInCurrentThread();
Thanks for doing the changes.
In the last review, I missed that this is called from the signal handler. This is not allowed because in the signal handler, we can't allocate any memory (i.e., not even C memory).
Oh OK, I see. I've changed things now so that the BufferNode allocations are done by SamplerBufferPool at the same time SamplerBuffer allocations are done. Now, buffers acquired from the "available" stack already have corresponding nodes, which are ready to be linked onto SamplerBufferList when the buffers become active. When full buffers are released after being serialized, new BufferNodes are allocated for them before they go back into circulation on the available list.
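A minimal Java sketch of the allocation scheme described above. It assumes a ConcurrentLinkedQueue stands in for the VM's available-buffer stack, and it only models where allocation happens: buffers and their nodes are created together in pool maintenance code, the sampling path only takes an existing buffer, and a released buffer gets a fresh node (assuming, as described, that its old node was freed when the full buffer was unlinked). All names are hypothetical.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: buffers are created together with their list nodes. */
final class PreallocatedPoolSketch {
    /** Stand-in for a BufferNode: the link object used by the active buffer list. */
    static final class ListNode {
        final PooledBuffer buffer;
        ListNode next;
        ListNode(PooledBuffer buffer) { this.buffer = buffer; }
    }

    /** Stand-in for a SamplerBuffer. */
    static final class PooledBuffer {
        final byte[] data;
        ListNode node;
        PooledBuffer(int size) {
            data = new byte[size];
            node = new ListNode(this); // Allocated up front, together with the buffer.
        }
    }

    private final ConcurrentLinkedQueue<PooledBuffer> available = new ConcurrentLinkedQueue<>();

    /** Pool maintenance (not on the sampling path): allocates buffers and their nodes. */
    void fill(int count, int size) {
        for (int i = 0; i < count; i++) {
            available.offer(new PooledBuffer(size));
        }
    }

    /** Sampling path: only takes an existing buffer; returns null if the pool is empty. */
    PooledBuffer acquire() {
        return available.poll();
    }

    /** After serialization (not on the sampling path): attach a fresh node, then recycle. */
    void release(PooledBuffer buffer) {
        buffer.node = new ListNode(buffer);
        available.offer(buffer);
    }

    public static void main(String[] args) {
        PreallocatedPoolSketch pool = new PreallocatedPoolSketch();
        pool.fill(2, 16);
        PooledBuffer buffer = pool.acquire();
        System.out.println("node ready: " + (buffer.node != null));
    }
}
```

In this model every allocation happens in `fill` or `release`, so code that must not allocate (like a signal handler) only ever calls `acquire`.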
2a5d0d8 to 038668b
038668b to f94a07a
Hi @christianhaeubl, when you have time, can you please have another look at this?
Hi @christianhaeubl, just commenting here to keep this on your radar.
919a533 to 2f7610a
Hi @christianhaeubl, can you please have another look at this one when you get the chance?
Summary of Changes
This PR adds serialization of JFR stack traces at flushpoints. It uses an approach similar to how JFR local buffers are processed at flushpoints.

Since SamplerBuffers and JfrBuffers are handled similarly, I've also introduced Buffer, BufferNode, and BufferList classes and done some refactoring to reduce code duplication.

I've also moved most of the stack trace serialization code into JfrStackTraceRepository. This makes it simpler to make the processing atomic.

A serializedPos pointer was introduced in SamplerBuffer to avoid races between processing the active buffers and writing to them.

Safepoint checks were removed from SamplerBuffersAccess.processFullBuffers() because the JfrChunkWriter lock must now be held while doing the processing.

Please see the related issue: #6226
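One way to picture the serializedPos change, sketched under stated assumptions: instead of resetting a buffer's write position after serializing it (a position the sampling thread also updates), the processing side keeps its own serialized position and only emits the bytes between it and the committed position. This assumes a single writer per buffer and uses a plain Java lock as a stand-in for the JfrChunkWriter lock; the class and its methods are hypothetical, not the PR's code.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: separate committed and serialized positions in one buffer. */
final class SerializedPosSketch {
    private final byte[] data;
    private volatile int committedPos;  // Advanced only by the writer after each complete record.
    private int serializedPos;          // Advanced only while holding chunkWriterLock.
    private final Object chunkWriterLock = new Object(); // Stand-in for the chunk writer lock.

    SerializedPosSketch(int capacity) {
        data = new byte[capacity];
    }

    /** Writer: appends and commits a record. Assumes a single writer thread per buffer. */
    boolean append(byte[] record) {
        int pos = committedPos;
        if (pos + record.length > data.length) {
            return false;
        }
        System.arraycopy(record, 0, data, pos, record.length);
        committedPos = pos + record.length; // Publish the record.
        return true;
    }

    /** Flushpoint or full-buffer processing: emits only bytes not serialized before. */
    int serializeNewData(ByteArrayOutputStream out) {
        synchronized (chunkWriterLock) {
            int end = committedPos;
            int count = end - serializedPos;
            out.write(data, serializedPos, count);
            serializedPos = end; // The writer's position is never reset from this side.
            return count;
        }
    }

    public static void main(String[] args) {
        SerializedPosSketch buffer = new SerializedPosSketch(16);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        buffer.append(new byte[] {1, 2, 3});
        System.out.println("flushed " + buffer.serializeNewData(out) + " bytes");
    }
}
```

Because the processing side only ever moves its own position, a flushpoint can run while the owning thread keeps appending, and no byte is serialized twice when the buffer is later processed as full.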