
@brettwooldridge (Contributor) commented Sep 26, 2025

Cleaner Revenge Tour 😄

OK, this refactor leaves the doubly linked list intact but tightens up the conditional logic around node linking/unlinking and reduces method invocation overhead in some code paths. All in all, an extremely minor refactor, but it seems to do well in the JMH harness.
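For context, the node linking/unlinking in question is the usual doubly-linked-list bookkeeping under the cleaner's lock. A rough illustrative sketch of that structure (class and field names here are invented, not JNA's actual code):

```java
// Illustrative doubly linked list of cleanup entries, of the kind the
// refactor tunes; names are stand-ins, not JNA's actual Cleaner code.
class CleanerList {
    static final class Node {
        final Runnable task;
        Node prev, next;

        Node(Runnable task) {
            this.task = task;
        }
    }

    private Node head;
    int size;

    // Push a node onto the front of the list.
    synchronized void link(Node node) {
        node.next = head;
        if (head != null) {
            head.prev = node;
        }
        head = node;
        size++;
    }

    // Remove a node from anywhere in the list in O(1).
    synchronized void unlink(Node node) {
        if (node.prev != null) {
            node.prev.next = node.next;
        } else {
            head = node.next;
        }
        if (node.next != null) {
            node.next.prev = node.prev;
        }
        node.prev = node.next = null;
        size--;
    }
}
```

The conditional branches in link/unlink (head vs. interior node, presence of a neighbor) are exactly the kind of logic the refactor aims to tighten.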

Tests run on an M1 Ultra Mac Studio.

High-memory high-threading (2x core count):

java -Xmx20g -jar target/benchmarks.jar -t 16 -i 5 -wi 6

5.18.0
Benchmark                Mode  Cnt       Score        Error  Units
MyBenchmark.testMethod  thrpt   25  550361.312 ± 452304.324  ops/s

5.19.0-SNAPSHOT
Benchmark                Mode  Cnt       Score        Error  Units
MyBenchmark.testMethod  thrpt   25  806701.763 ± 548298.436  ops/s

Lower-memory, threads matching core count:

java -Xmx16g -jar target/benchmarks.jar -t 8 -i 5 -wi 6

5.18.0
Benchmark                Mode  Cnt       Score        Error  Units
MyBenchmark.testMethod  thrpt   25  216233.155 ± 209346.033  ops/s

5.19.0-SNAPSHOT
Benchmark                Mode  Cnt       Score        Error  Units
MyBenchmark.testMethod  thrpt   25  313306.938 ± 324647.767  ops/s

Tests run on an EPYC 7402 Proxmox VM.

High-memory high-threading (2x core count):

java -Xmx20g -jar target/benchmarks.jar -t 16 -i 5 -wi 6

5.18.0
Benchmark                Mode  Cnt        Score       Error  Units
MyBenchmark.testMethod  thrpt   25  1003122.171 ± 88700.984  ops/s

5.19.0-SNAPSHOT
Benchmark                Mode  Cnt        Score        Error  Units
MyBenchmark.testMethod  thrpt   25  1057712.282 ± 105237.902  ops/s

Lower-memory, threads matching core count:

java -Xmx16g -jar target/benchmarks.jar -t 8 -i 5 -wi 6

5.18.0
Benchmark                Mode  Cnt       Score       Error  Units
MyBenchmark.testMethod  thrpt   25   948151.717 ± 50740.821  ops/s

5.19.0-SNAPSHOT
Benchmark                Mode  Cnt        Score       Error  Units
MyBenchmark.testMethod  thrpt   25  1020379.664 ± 82178.334  ops/s

@matthiasblaesing (Member) left a comment

I can reproduce the improved numbers with the provided JMH invocations (thanks for that). I left one inline comment. Could you please check whether you agree, and see if the numbers still hold up once a fix is applied?

cleanerRunning = true;
}

return ref;
@matthiasblaesing (Member) commented inline:

At this point we need the equivalent of Reference.reachabilityFence on obj. If the caller does not retain a strong reference, we need to ensure that the reference is kept alive at least until the cleaner reference is completely enqueued. As observed in the last iteration, early GC can happen: #1684 (comment).

In that comment I suggested using an empty synchronized block.
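The idiom under discussion can be sketched roughly as follows. The class and field names are illustrative stand-ins, not JNA's actual Cleaner internals; Reference.reachabilityFence requires Java 9+, while the empty synchronized block serves the same purpose on Java 8:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch, not JNA's actual Cleaner: shows where a
// reachability fence keeps `obj` strongly reachable until the
// PhantomReference is fully linked into the cleaner's bookkeeping.
class FenceSketch {
    public interface Cleanable {
        void clean();
    }

    private final ReferenceQueue<Object> referenceQueue = new ReferenceQueue<>();

    static final class CleanerRef extends PhantomReference<Object> implements Cleanable {
        private final Runnable task;
        private final AtomicBoolean cleaned = new AtomicBoolean();

        CleanerRef(Object referent, ReferenceQueue<Object> queue, Runnable task) {
            super(referent, queue);
            this.task = task;
        }

        @Override
        public void clean() {
            // Run the cleanup task at most once, whether triggered
            // manually or by the reference-processing thread.
            if (cleaned.compareAndSet(false, true)) {
                task.run();
            }
        }
    }

    public Cleanable register(Object obj, Runnable cleanupTask) {
        final CleanerRef ref = new CleanerRef(obj, referenceQueue, cleanupTask);
        synchronized (this) {
            // link ref into the cleaner's list, start cleaner thread, etc.
        }
        // Prevent the JIT from treating obj as dead before this point,
        // which could let GC enqueue ref while we are still linking it.
        Reference.reachabilityFence(obj);
        return ref;
    }
}
```

Manually invoking clean() is deterministic, which makes the at-most-once contract easy to unit-test; the GC-driven path through the ReferenceQueue is omitted here.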

@brettwooldridge (Contributor, Author) commented Oct 24, 2025

Ah, I don't know how I lost the change I had made, but I did somehow. The way I intended to solve it (and had at some point, but must have reverted) was to insert this before the return ref:

// Ensure that obj is referencable past the enqueue point.
if (obj == null) {
   throw new IllegalArgumentException("Cleaner object cannot be null");
}

I believe that should address it. Can you test it?

If that does not fix it, remove synchronized from the method signature, and do this:

public Cleanable register(final Object obj, final Runnable cleanupTask) {
    // The important side effect is the PhantomReference, which is yielded after the referent is GCed
    final CleanerRef ref = new CleanerRef(obj, referenceQueue, cleanupTask);

    synchronized (this) {
        // everything else
        ...
    }

    // Ensure that obj is referencable past the enqueue point.
    if (obj == null) {
        throw new IllegalArgumentException("Cleaner object cannot be null");
    }
    return ref;
}

EDIT: I updated the pull request with the latter fix.

@brettwooldridge (Contributor, Author) commented Oct 24, 2025

I believe this change should work. The compiler cannot reorder the conditional before the synchronized block, and it cannot eliminate the reference because it cannot know whether it is null or not.

By the way, really nice work on this class. Outside of this change, I simply don't see any way it could possibly be more efficient. I love seeing code like this.

@matthiasblaesing (Member) commented

Thanks for the update. I ran a few measurements based on your suggested invocation. These are the results:

measure.ods

For "Tabelle 1" I ran 5.18.1 as the baseline and your branch as the comparison (titled "modified"). Looking at the diff, I see mostly positive effects for the first run variant, but even there one run regressed. The second run variant always comes out slower. This also holds when averaging the values.

"Tabelle 2" is mostly identical to the first experiment, but I introduced the "check non null" in register onto master (that is master-synced) and ran that for comparison.

It is hard to draw a conclusion from this. My runtime environment is a poor testbed, though (a notebook).

Could you please have a look and maybe rerun your numbers?

@brettwooldridge (Contributor, Author) commented Nov 4, 2025

@matthiasblaesing Sorry for taking so long to get back to this. And apologies for this long post.

I ran lots of tests. Lots. And the more I ran, the more confusing things became. At a high level, with larger runs, the new code consistently benchmarks roughly 10% lower than the existing code. And therein lies a mystery.

Studying the code, from a logical perspective, it cannot possibly be slower:

  • It acquires fewer locks.
  • It contains fewer conditionals.
  • It contains fewer method dispatches.

Even at a bytecode level the new code "wins". And yet.

And yet, according to JMH, it turns in lower ops/s. This is even benchmarking against a master branch that includes the code to ensure the reference is maintained past the linking of the phantom reference:

   // Ensure that obj is referencable past the enqueue point.
   if (obj == null) {
      throw new IllegalArgumentException("Cleaner object cannot be null");
   }

First Revelation

On a whim, I cranked up the JMH iterations per fork and I noticed something interesting:

# Run progress: 40.00% complete, ETA 00:17:22
# Fork: 3 of 5
# Warmup Iteration   1: 1701444.457 ops/s
# Warmup Iteration   2: 2240284.714 ops/s
# Warmup Iteration   3: 2240826.902 ops/s
# Warmup Iteration   4: 2214448.058 ops/s
# Warmup Iteration   5: 2206987.306 ops/s
# Warmup Iteration   6: 2269375.607 ops/s
Iteration   1: 2242599.580 ops/s
Iteration   2: 397205.504 ops/s
Iteration   3: 1770953.479 ops/s
Iteration   4: 385594.600 ops/s
Iteration   5: 264045.054 ops/s
Iteration   6: 42915.212 ops/s
Iteration   7: 24755.365 ops/s
Iteration   8: 6410.848 ops/s
Iteration   9: 26682.966 ops/s
Iteration  10: 2249.819 ops/s
Iteration  11: 16611.778 ops/s

We're looking at a drop from over 2 million ops/s early in the run to as low as 6000 ops/s. This occurs in both the master branch and the proposed change.

First, it should be noted that if the test is started with a smaller heap, for example -Xmx8g instead of -Xmx20g, it very quickly falls over with an OutOfMemoryError. Even with a larger heap, if the iterations are cranked up, the result is the same. This was a clue.

Why is this occurring? Well, we have N number of threads creating and registering objects, and only one thread cleaning them up. In addition, registering objects is "cheap" while cleaning them up is "expensive" in relative terms.

Second Revelation

JMH is measuring one side of the system -- registering objects. But the other side, cleaning references, is unseen and unmeasured.

In the benchmark, if we look at the lock itself, what we have are N threads contending for the lock plus the cleaner thread also contending for that lock. Assuming roughly fair queuing by the scheduler, the cleaner thread is going to lose most of the time -- it's 1 thread vs. N threads in terms of who is going to win the lock acquisition.
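That 1-vs-N dynamic can be modeled in isolation with a toy monitor, outside of any cleaner machinery. The thread and iteration counts below are arbitrary, and the names are invented for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the contention in the benchmark: N "registration"
// threads and one "cleaner" thread all competing for the same monitor.
// Counts how often each side wins the lock.
class ContentionSketch {
    private final Object lock = new Object();
    final AtomicLong registrations = new AtomicLong();
    final AtomicLong cleanerWins = new AtomicLong();
    private volatile boolean done;

    void run(int producers, int opsPerProducer) {
        Thread cleaner = new Thread(() -> {
            while (!done) {
                synchronized (lock) {
                    cleanerWins.incrementAndGet(); // one "cleanup" pass
                }
            }
        });
        cleaner.start();

        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(() -> {
                for (int op = 0; op < opsPerProducer; op++) {
                    synchronized (lock) {
                        registrations.incrementAndGet(); // one "register" call
                    }
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        done = true;
        try {
            cleaner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The observed split between registrations and cleanerWins varies wildly with the scheduler and the JVM's lock implementation, which is exactly why a registration-only ops/s number is hard to interpret.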

Hypothesis

The reason that this pull request benchmarks lower (~10%) is that the cleaner thread is "winning" more, at the expense of the N threads that are trying to register objects. But again, the benchmark is only measuring the registering side.

If the predicates above...

  • It acquires fewer locks.
  • It contains fewer conditionals.
  • It contains fewer method dispatches.

... are accepted, this is the obvious conclusion. There is no other reason the master registration code would be faster.

It should be noted again that both the master branch and this pull request degenerate into four-digit ops/s if the memory is more constrained or the test iterations are increased.

Where are we?

I don't really see a simple method of measuring the throughput of the entire system -- the registration side and the cleaning side. Especially because this would require triggering GC deterministically in order to force the cleaner to run, while at the same time artificially constraining the N registration threads in such a way that the cleaner is allowed to "keep up", in order to balance object creation and retirement (and avoid OOM).
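For what it's worth, one way to sidestep deterministic GC entirely is to put explicit backpressure on registration, e.g. with a Semaphore, so the cleaner can never be overrun. This is only a sketch of the idea (all names are invented), not a proposal for the JNA benchmark:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a balanced whole-system harness: registration blocks once
// more than `capacity` objects are awaiting cleanup, so the cleaner is
// never overrun and both sides can be timed together.
class BalancedHarness {
    private final Semaphore inFlight;
    private final BlockingQueue<Runnable> pendingCleanups;
    final AtomicLong cleaned = new AtomicLong();

    BalancedHarness(int capacity) {
        this.inFlight = new Semaphore(capacity);
        this.pendingCleanups = new ArrayBlockingQueue<>(capacity);
    }

    // Stand-in for Cleaner.register: blocks if the cleaner is behind.
    void register(Runnable cleanupTask) throws InterruptedException {
        inFlight.acquire();
        pendingCleanups.put(cleanupTask);
    }

    // Stand-in for one pass of the cleaner thread's loop.
    void cleanOne() throws InterruptedException {
        Runnable task = pendingCleanups.take();
        task.run();
        cleaned.incrementAndGet();
        inFlight.release();
    }
}
```

Timing a run that drives both register and cleanOne would then measure whole-system throughput (creation plus retirement) rather than just the registration side, at the cost of no longer exercising the real GC-driven enqueue path.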

I would argue that given the closeness in performance of measuring even just one side of the system, the greater clarity and simplicity of the code in the pull request "wins" in a pragmatic sense.

And I would argue, without evidence, that the throughput of the entire system is obviously higher -- because the only explanation that makes sense for the master code to be faster in "registration" is due to increased crowding out of the cleaner thread in lock acquisition.

I'm not sure where to go from here. I really can't invest more time in doing something like constructing some kind of harness that measures the total throughput of the system while ensuring that the cleaner is not overrun by registration spamming.

I don't think JMH is meaningful here if we are only measuring the registration side of the equation.
