8365306: Provide OS Process Size and Libc statistic metrics to JFR #26756

Open
wants to merge 21 commits into
base: master

tstuefe
Member

@tstuefe tstuefe commented Aug 13, 2025

This provides the following new metrics:

  • ProcessSize event (new, periodic)
    • vsize (for analyzing address-space fragmentation issues)
    • RSS including subtypes (subtypes are useful for excluding atypical issues, e.g. kernel problems that cause large file buffer bloat)
    • peak RSS
    • process swap (if we swap we cannot trust the RSS values, plus it indicates bad sizing)
    • pte size (to quickly see if we run with a super-large working set but an unsuitably small page size)
  • LibcStatistics (new, periodic)
    • outstanding malloc size (important counterpoint to whatever NMT tries to tell me, which alone is often misleading)
    • retained malloc size (super-important for the same reason)
    • number of libc trims HotSpot executed (needed to gauge the usefulness of the retain counter, and to see if a customer employs native heap auto trimming (-XX:TrimNativeHeapInterval))
  • NativeHeapTrim (new, event-driven) (for both manual and automatic trims)
    • RSS before and RSS after
    • RSS recovered by this trim
    • whether it was an automatic or manual trim
    • duration
  • JavaThreadStatistic
    • os thread counter (new field) (useful to understand the behavior of third-party code in our process if threads are created that bypass the JVM. E.g. some custom launchers do that.)
    • nonJava thread counter (new field) (needed to interpret the os thread counter)
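
To make the ProcessSize fields more concrete, here is a minimal Java sketch of where such numbers could come from on Linux. It assumes `/proc/self/status` as the data source, which may or may not be what the PR itself reads; the class and method names (`ProcStatusSketch`, `parseKilobyteFields`) are hypothetical and exist only for illustration. On Linux the relevant keys would be `VmSize` (vsize), `VmRSS` with its subtypes `RssAnon`, `RssFile` and `RssShmem`, `VmHWM` (peak RSS), `VmSwap` and `VmPTE`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: parses /proc/<pid>/status-style text.
final class ProcStatusSketch {

    /** Returns every "Key: <number> kB" entry of the given text, converted to bytes. */
    static Map<String, Long> parseKilobyteFields(String statusText) {
        Map<String, Long> bytesByKey = new LinkedHashMap<>();
        for (String line : statusText.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Key: value" line
            }
            String key = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            // Only memory fields carry a "kB" unit; skip everything else (Name, Pid, ...).
            if (parts.length == 2 && parts[1].equals("kB")) {
                try {
                    bytesByKey.put(key, Long.parseLong(parts[0]) * 1024L);
                } catch (NumberFormatException e) {
                    // Ignore malformed numbers instead of failing the whole parse.
                }
            }
        }
        return bytesByKey;
    }
}
```

On a Linux machine this could be fed with `Files.readString(Path.of("/proc/self/status"))`. A single read yields all fields at once, so the values describe the same point in time.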

Notes:

  • We already have the ResidentSetSize event, and the new ProcessSize event is a superset of it. I don't know how such cases are usually handled. I'd prefer to throw the old event out, but JMC has a hard-coded chart for RSS, so I kept it in unless someone tells me to remove it.

  • Obviously, the libc events are very platform-specific. Still, I argue that these metrics are highly useful. We want people to use JFR and JMC; people include developers that are dealing with performance problems that require platform-specific knowledge to understand. See my comment in the JBS issue.

I provided implementations, as far as possible, for Linux, macOS and Windows.

Testing:

  • ran the new tests manually and as part of GHAs

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8365306: Provide OS Process Size and Libc statistic metrics to JFR (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26756/head:pull/26756
$ git checkout pull/26756

Update a local copy of the PR:
$ git checkout pull/26756
$ git pull https://git.openjdk.org/jdk.git pull/26756/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26756

View PR using the GUI difftool:
$ git pr show -t 26756

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26756.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Aug 13, 2025

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 13, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk

openjdk bot commented Aug 13, 2025

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@tstuefe tstuefe force-pushed the JDK-8365306-Provide-OS-Process-Size-and-Libc-statistic-metrics-to-JFR branch 3 times, most recently from 4828a1b to b24c784 Compare August 15, 2025 12:14
@tstuefe tstuefe force-pushed the JDK-8365306-Provide-OS-Process-Size-and-Libc-statistic-metrics-to-JFR branch from b24c784 to 39e282a Compare August 15, 2025 15:06
@tstuefe tstuefe marked this pull request as ready for review August 16, 2025 04:16
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 16, 2025
@mlbridge

mlbridge bot commented Aug 16, 2025

Webrevs

@tstuefe
Member Author

tstuefe commented Aug 17, 2025

label /hotspot-jfr

@stefank
Member

stefank commented Aug 18, 2025

/label hotspot-jfr

@openjdk

openjdk bot commented Aug 18, 2025

@stefank
The hotspot-jfr label was successfully added.

@egahlin
Member

egahlin commented Aug 19, 2025

What is the problem with adding RSS metrics to the existing ResidentSetSize event?

@tstuefe
Member Author

tstuefe commented Aug 19, 2025

> What is the problem with adding RSS metrics to the existing ResidentSetSize event?

You mean adding the vsize, swap etc fields to ResidentSetSize?

I thought about that, but then it would be weirdly misnamed. RSS has a very specific meaning. So we would have ResidentSetSize.vsize, ResidentSetSize.swap, ResidentSetSize.rss (?)

I also thought about splitting them up and adding one event per value, following the "ResidentSetSize" pattern. So, one event for "VirtualSize", one for "Swap" etc. Apart from my dislike of the fine granularity, having these fields grouped in one event has multiple advantages. Mainly, I can plot all these fields in a single JMC graph and correlate the values. It is also cheaper to obtain (just one parsing operation for /proc/meminfo, for instance).

@egahlin
Member

egahlin commented Aug 19, 2025

> You mean adding the vsize, swap etc fields to ResidentSetSize?
>
> I thought about that, but then it would be weirdly misnamed. RSS has a very specific meaning. So we would have ResidentSetSize.vsize, ResidentSetSize.swap, ResidentSetSize.rss (?)

I was thinking something like this:

<Event name="ResidentSetSize" category="Java Virtual Machine, Memory" ... >
  <Field type="ulong" contentType="bytes" name="size" .../>
  <Field type="ulong" contentType="bytes" name="peak" ..  />
  <Field type="ulong" contentType="bytes" name="anonymous"   />
  <Field type="ulong" contentType="bytes" name="file" />
  <Field type="ulong" contentType="bytes" name="sharedMemory"  />
</Event>

When it comes to non-RSS metrics, there is a Swap event, but I am not sure it is appropriate. Regarding other metrics, perhaps they should go into other events, or perhaps new events should be created. I haven't had time to look into it.

> Mainly, I can build myself graphs in JMC for all these fields in one graph and correlate all the values.

Periodic events can be emitted with the same timestamp. That way they can be grouped together. This is what we do with NativeMemoryUsage and NativeMemoryTotal.
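
As a rough illustration of that grouping idea, the following Java sketch collects samples that share a start time into one bucket, so that values from separate events can be correlated. The `Sample` record is a hypothetical stand-in for a recorded event, not a JFR type; when reading a real recording, the timestamp would presumably come from `RecordedEvent.getStartTime()`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: groups samples from different periodic events by timestamp.
final class TimestampGroupingSketch {

    /** Stand-in for one recorded periodic event carrying a single value. */
    record Sample(Instant startTime, String eventName, long value) {}

    /** Buckets samples by identical start time, ordered chronologically. */
    static Map<Instant, List<Sample>> groupByTimestamp(List<Sample> samples) {
        Map<Instant, List<Sample>> grouped = new TreeMap<>();
        for (Sample s : samples) {
            grouped.computeIfAbsent(s.startTime(), t -> new ArrayList<>()).add(s);
        }
        return grouped;
    }
}
```

This only works if the emitter stamps all events of one period with exactly the same time; the sketch matches on equality and does no fuzzy matching of nearby timestamps.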

These events will likely be around for a long time. We shouldn't design them just to match the workflow of one tool as it currently works. New functionality can be added in the future.

> It is also cheaper to obtain (just one parsing operation for /proc/meminfo, for instance).

We should not sacrifice the design unless the overhead is significant.
