Skip to content

Conversation

@minborg
Copy link
Contributor

@minborg minborg commented Feb 25, 2025

As we advance, converting older JDK code to use the relatively new FFM API requires system calls that can provide errno and the likes to explicitly allocate a MemorySegment to capture potential error states. This can lead to negative performance implications if not designed carefully and also introduces unnecessary code complexity.

Hence, this PR proposes adding a JDK internal method handle adapter that can be used to handle system calls with errno, GetLastError, and WSAGetLastError.

It relies on an efficient carrier-thread-local cache of memory regions to allide allocations.

Here are some benchmarks that ran on a platform thread and virtual threads respectively (M1 Mac):

Benchmark                                                  Mode  Cnt   Score   Error  Units
CaptureStateUtilBench.OfVirtual.adaptedSysCallFail         avgt   30  24.330 ? 0.820  ns/op
CaptureStateUtilBench.OfVirtual.adaptedSysCallSuccess      avgt   30   8.257 ? 0.117  ns/op
CaptureStateUtilBench.OfVirtual.explicitAllocationFail     avgt   30  41.415 ? 1.013  ns/op
CaptureStateUtilBench.OfVirtual.explicitAllocationSuccess  avgt   30  21.720 ? 0.463  ns/op
CaptureStateUtilBench.OfVirtual.tlAllocationFail           avgt   30  23.636 ? 0.182  ns/op
CaptureStateUtilBench.OfVirtual.tlAllocationSuccess        avgt   30   8.234 ? 0.156  ns/op
CaptureStateUtilBench.adaptedSysCallFail                   avgt   30  23.918 ? 0.487  ns/op
CaptureStateUtilBench.adaptedSysCallSuccess                avgt   30   4.946 ? 0.089  ns/op
CaptureStateUtilBench.explicitAllocationFail               avgt   30  42.280 ? 1.128  ns/op
CaptureStateUtilBench.explicitAllocationSuccess            avgt   30  21.809 ? 0.413  ns/op
CaptureStateUtilBench.tlAllocationFail                     avgt   30  24.422 ? 0.673  ns/op
CaptureStateUtilBench.tlAllocationSuccess                  avgt   30   5.182 ? 0.152  ns/op

Adapted system call:

        return (int) ADAPTED_HANDLE.invoke(0, 0); // Uses a MH-internal pool

Explicit allocation:

        try (var arena = Arena.ofConfined()) {
            return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
        }

Thread Local allocation:

        try (var arena = POOLS.take()) {
            return (int) HANDLE.invoke(arena.allocate(4), 0, 0); // Uses a manually specified pool
        }

The adapted system call exhibits a ~4x performance improvement over the existing "explicit allocation" scheme for the happy path on platform threads. Because there needs to be sharing across threads for virtual-thread-capable carrier threads, these are a bit slower ("only" ~3x faster).

image

Here are some benchmarks for the underlying ArenaPool (M1 Mac):

Benchmark                                   (ELEM_SIZE)  Mode  Cnt   Score    Error  Units
ArenaPoolBench.OfVirtual.confined                     4  avgt   30  23.543 ?  0.168  ns/op
ArenaPoolBench.OfVirtual.confined                    64  avgt   30  27.384 ?  0.167  ns/op
ArenaPoolBench.OfVirtual.confined2                    4  avgt   30  47.811 ?  0.220  ns/op
ArenaPoolBench.OfVirtual.confined2                   64  avgt   30  55.404 ?  0.286  ns/op
ArenaPoolBench.OfVirtual.pooled                       4  avgt   30   8.210 ?  0.043  ns/op
ArenaPoolBench.OfVirtual.pooled                      64  avgt   30  45.525 ? 52.525  ns/op
ArenaPoolBench.OfVirtual.pooled2                      4  avgt   30  50.670 ?  0.778  ns/op
ArenaPoolBench.OfVirtual.pooled2                     64  avgt   30  85.846 ?  2.304  ns/op
ArenaPoolBench.confined                               4  avgt   30  23.286 ?  0.184  ns/op
ArenaPoolBench.confined                              64  avgt   30  27.026 ?  0.111  ns/op
ArenaPoolBench.confined2                              4  avgt   30  48.301 ?  0.942  ns/op
ArenaPoolBench.confined2                             64  avgt   30  57.512 ?  5.373  ns/op
ArenaPoolBench.pooled                                 4  avgt   30   5.085 ?  0.048  ns/op
ArenaPoolBench.pooled                                64  avgt   30  29.621 ?  0.440  ns/op
ArenaPoolBench.pooled2                                4  avgt   30  10.610 ?  0.339  ns/op
ArenaPoolBench.pooled2                               64  avgt   30  60.815 ?  1.046  ns/op
ArenaPoolFromBench.OfVirtual.confinedInt            N/A  avgt   30  21.944 ?  0.122  ns/op
ArenaPoolFromBench.OfVirtual.confinedSting          N/A  avgt   30  26.190 ?  0.193  ns/op
ArenaPoolFromBench.OfVirtual.pooledInt              N/A  avgt   30   8.217 ?  0.043  ns/op
ArenaPoolFromBench.OfVirtual.pooledString           N/A  avgt   30   9.271 ?  0.056  ns/op
ArenaPoolFromBench.confinedInt                      N/A  avgt   30  21.892 ?  0.139  ns/op
ArenaPoolFromBench.confinedSting                    N/A  avgt   30  26.012 ?  0.058  ns/op
ArenaPoolFromBench.pooledInt                        N/A  avgt   30   5.056 ?  0.034  ns/op
ArenaPoolFromBench.pooledString                     N/A  avgt   30   6.419 ?  0.037  ns/op

Note: The pool size for the above benchmarks was 32 bytes.

This PR relates to #23391 we had to back out. This PR attempts to ensure, that the problems encountered there do not surface in this PR.

The arena pool is able to share recyclable memory across several arenas, for platform threads.

This PR passes tier1, tier2, and tier3 testing.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 2 Reviewers)

Issue

  • JDK-8347408: Create an internal method handle adapter for system calls with errno (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23765/head:pull/23765
$ git checkout pull/23765

Update a local copy of the PR:
$ git checkout pull/23765
$ git pull https://git.openjdk.org/jdk.git pull/23765/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23765

View PR using the GUI difftool:
$ git pr show -t 23765

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23765.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 25, 2025

👋 Welcome back pminborg! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@minborg
Copy link
Contributor Author

minborg commented Feb 25, 2025

/reviewers 2 reviewer

@openjdk
Copy link

openjdk bot commented Feb 25, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Feb 25, 2025

@minborg
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

@openjdk
Copy link

openjdk bot commented Feb 25, 2025

@minborg The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

// Do not use a lambda in order to allow early use in the init sequence
// This is equivalent to:
// computeIfAbsent(basicKey, CaptureStateUtil::basicHandleFor);
.computeIfAbsent(basicKey, new Function<>() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend a local record and capture the record instance in a member static final field. This code creates a function on every call. Also might be of interest whether we should use get + putIfAbsent or computeIfAbsent, as CHM has some bug that makes cIA slower than get for certain access patterns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance is not critical to this method so I would prioritize maintainability over performance here.

@minborg minborg marked this pull request as ready for review February 26, 2025 12:54
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 26, 2025
@mlbridge
Copy link

mlbridge bot commented Feb 26, 2025

Webrevs

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 26, 2025

@minborg This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 23, 2025

@minborg This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Apr 23, 2025
@minborg
Copy link
Contributor Author

minborg commented Apr 23, 2025

/open

@openjdk openjdk bot reopened this Apr 23, 2025
@openjdk
Copy link

openjdk bot commented Apr 23, 2025

@minborg This pull request is now open

@liach
Copy link
Member

liach commented Apr 29, 2025

The code looks good to me; please wait for people more professional with FFM internals like @JornVernee or @mcimadamore.

Comment on lines +724 to 726
return this instanceof NoInitSegmentAllocator noInit ?
noInit.allocateNoInit(byteSize, 1) :
allocate(byteSize);
Copy link
Contributor

@wenshao wenshao May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return this instanceof NoInitSegmentAllocator noInit ?
noInit.allocateNoInit(byteSize, 1) :
allocate(byteSize);
return this instanceof NoInitSegmentAllocator noInit
? noInit.allocateNoInit(byteSize, 1)
: allocate(byteSize);

How about a different coding style?

@minborg
Copy link
Contributor Author

minborg commented May 5, 2025

I will close this PR and create a new one against the bug report using the newly integrated BufferStack and Stable Values.

@minborg minborg closed this May 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs [email protected] rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

3 participants