Skip to content

Conversation

roberttoyonaga
Copy link
Collaborator

@roberttoyonaga roberttoyonaga commented Apr 4, 2023

Summary of Changes
This PR adds serialization of JFR stacktraces at flushpoints. It uses a similar approach to how JFR local buffers are processed at flushpoints.

  • Since the SamplerBuffers and JfrBuffers are handled similarly, I've also introduced Buffer, BufferNode, and BufferList classes and done some refactoring to try and reduce code duplication.

  • I've also moved most of the stacktrace serialization code into the JfrStacktraceRepository. This makes is simpler to make the processing atomic.

  • serializedPos pointer was introduced in SamplerBuffer to avoid races between processing the active buffers and writing to them.

  • Doing safepoint checks was removed from SamplerBuffersAccess.processFullBuffers() because now the JfrChunkwriter lock must be held while doing the processing.

Please see related issue here: #6226

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Apr 4, 2023
@roberttoyonaga roberttoyonaga force-pushed the event-streaming-follow-up branch from 60123e0 to cf8c486 Compare April 4, 2023 18:35
@oubidar-Abderrahim
Copy link
Member

Hi @roberttoyonaga, Thank you for opening this PR
@christianhaeubl could you please review this PR or assign someone to do it? Thank you.

Copy link
Member

@christianhaeubl christianhaeubl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work, I added a few comments.

*/
@Uninterruptible(reason = "Prevent JFR recording.")
private static void processSamplerBuffers(boolean flushpoint) {
SamplerBuffersAccess.processActiveBuffers(flushpoint);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed the calls to JfrExecutionSampler.singleton().disallowThreadsInSamplerCode() and JfrExecutionSampler.singleton().allowThreadsInSamplerCode(). So, the SIGPROF-based sampler can interrupt processActiveBuffers and processFullBuffers at any time. When the signal handler modifies the stacks of available and full buffers, this can cause issues that look like races (even though everything happens in the same thread).

See the JavaDoc on processSamplerBuffers().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, the sampler could also affect the buffer stacks. I've added JfrExecutionSampler.singleton().disallowThreadsInSamplerCode() and JfrExecutionSampler.singleton().allowThreadsInSamplerCode() back in, and I've corrected the Javadoc as well.

BufferNode oldNode = oldBuffer.getNode();

// Lock here to avoid race with flushing thread which might be processing
BufferNodeAccess.lockNoTransition(oldNode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't think that we need any BufferNode-based locking for SamplerBuffers:

  • the buffers have a fixed size and are never resized
  • in-use/full buffers are only processed by threads that hold the JfrChunkWriter lock
  • only the SamplerBufferPool frees buffers

So, the following should be safe:

  • a flushing thread can access the data below the committed position without any locking
  • SamplerSampleWriter.accommodate(...) may access the uncommitted data without any locking

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I agree. I've removed the node-level locking for the SamplerBuffers

}

@Uninterruptible(reason = "Called from uninterruptible code.", mayBeInlined = true)
public static SamplerBuffer getSamplerBuffer(BufferNode node) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a note: in the long run, we should probably try to get rid of this special buffer. This should be possible if we improve the JfrBuffer/JfrBufferNode/JfrBufferList infrastructure a bit.

@Override
@Uninterruptible(reason = "Locking without transition requires that the whole critical section is uninterruptible.")
public BufferNode addNode(com.oracle.svm.core.jfr.Buffer buffer) {
assert !AbstractJfrExecutionSampler.isExecutionSamplingAllowedInCurrentThread();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the changes.

In the last review, I missed that this is called from the signal handler. This is not allowed because in the signal handler, we can't allocate any memory (i.e., not even C memory).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok I see. I've changed things now so that the BufferNode allocations are done by SamplerBufferPool at the same time SamplerBuffer allocations are done. Now, buffers acquired from the "available" stack already have corresponding nodes, which are ready to be linked onto SamplerBufferList when the buffers become active. When full buffers are released after being serialized, new BufferNodes are allocated for them before they go back into circulation on the available list.

@roberttoyonaga roberttoyonaga force-pushed the event-streaming-follow-up branch from 2a5d0d8 to 038668b Compare June 22, 2023 18:00
@roberttoyonaga roberttoyonaga force-pushed the event-streaming-follow-up branch from 038668b to f94a07a Compare June 26, 2023 15:19
@roberttoyonaga
Copy link
Collaborator Author

Hi @christianhaeubl, when you have time, can you please have another look at this?

@roberttoyonaga
Copy link
Collaborator Author

Hi @christianhaeubl, just commenting here to keep this on your radar.

@roberttoyonaga roberttoyonaga force-pushed the event-streaming-follow-up branch from 919a533 to 2f7610a Compare November 1, 2023 20:37
@roberttoyonaga
Copy link
Collaborator Author

hi @christianhaeubl can you please have another look at this one, when you get the chance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants