
Conversation

@jatin-bhateja (Member) commented Dec 15, 2024

This is a follow-up PR for #22754

The patch adds support for auto-vectorizing various Float16 scalar operations (add, subtract, divide, multiply, sqrt, and fma).

Summary of changes included with the patch:

  1. New C2 compiler vector IR creation.
  2. Auto-vectorization support.
  3. x86 backend implementation.
  4. New IR verification tests for each newly supported vector operation.
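
For context, the targeted loop shape is a scalar Float16 kernel like the one below. This is a minimal illustrative sketch, not code from the patch; it assumes the Float16 API in jdk.incubator.vector and, as in the benchmark, keeps values in short[] arrays as raw FP16 bit patterns (the method name addKernel is hypothetical):

import jdk.incubator.vector.Float16;

// Illustrative sketch: a scalar Float16 loop of the shape the
// auto-vectorizer can now turn into Float16 vector IR.
static void addKernel(short[] a, short[] b, short[] r) {
    for (int i = 0; i < r.length; i++) {
        Float16 fa = Float16.shortBitsToFloat16(a[i]);
        Float16 fb = Float16.shortBitsToFloat16(b[i]);
        // add() shown here; subtract/multiply/divide/sqrt/fma follow the same shape
        r[i] = Float16.float16ToRawShortBits(Float16.add(fa, fb));
    }
}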

The following are the performance numbers for Float16OperationsBenchmark:

System: Intel(R) Xeon(R) processor code-named Granite Rapids
Frequency fixed at 2.5 GHz

Baseline:
Benchmark                                                      (vectorDim)   Mode  Cnt     Score   Error   Units
Float16OperationsBenchmark.absBenchmark                               1024  thrpt    2  4191.787          ops/ms
Float16OperationsBenchmark.addBenchmark                               1024  thrpt    2  1211.978          ops/ms
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2   493.026          ops/ms
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2   612.430          ops/ms
Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2   616.012          ops/ms
Float16OperationsBenchmark.divBenchmark                               1024  thrpt    2   604.882          ops/ms
Float16OperationsBenchmark.dotProductFP16                             1024  thrpt    2   410.798          ops/ms
Float16OperationsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2   602.863          ops/ms
Float16OperationsBenchmark.euclideanDistanceFP16                      1024  thrpt    2   640.348          ops/ms
Float16OperationsBenchmark.fmaBenchmark                               1024  thrpt    2   809.175          ops/ms
Float16OperationsBenchmark.getExponentBenchmark                       1024  thrpt    2  2682.764          ops/ms
Float16OperationsBenchmark.isFiniteBenchmark                          1024  thrpt    2  3373.901          ops/ms
Float16OperationsBenchmark.isFiniteCMovBenchmark                      1024  thrpt    2  1881.652          ops/ms
Float16OperationsBenchmark.isFiniteStoreBenchmark                     1024  thrpt    2  2273.745          ops/ms
Float16OperationsBenchmark.isInfiniteBenchmark                        1024  thrpt    2  2147.913          ops/ms
Float16OperationsBenchmark.isInfiniteCMovBenchmark                    1024  thrpt    2  1962.579          ops/ms
Float16OperationsBenchmark.isInfiniteStoreBenchmark                   1024  thrpt    2  1696.494          ops/ms
Float16OperationsBenchmark.isNaNBenchmark                             1024  thrpt    2  2417.396          ops/ms
Float16OperationsBenchmark.isNaNCMovBenchmark                         1024  thrpt    2  1708.585          ops/ms
Float16OperationsBenchmark.isNaNStoreBenchmark                        1024  thrpt    2  2055.511          ops/ms
Float16OperationsBenchmark.maxBenchmark                               1024  thrpt    2  1211.940          ops/ms
Float16OperationsBenchmark.minBenchmark                               1024  thrpt    2  1212.063          ops/ms
Float16OperationsBenchmark.mulBenchmark                               1024  thrpt    2  1211.955          ops/ms
Float16OperationsBenchmark.negateBenchmark                            1024  thrpt    2  4215.922          ops/ms
Float16OperationsBenchmark.sqrtBenchmark                              1024  thrpt    2   337.606          ops/ms
Float16OperationsBenchmark.subBenchmark                               1024  thrpt    2  1212.467          ops/ms

With opt:
Benchmark                                                      (vectorDim)   Mode  Cnt      Score   Error   Units
Float16OperationsBenchmark.absBenchmark                               1024  thrpt    2  28481.336          ops/ms
Float16OperationsBenchmark.addBenchmark                               1024  thrpt    2  21311.633          ops/ms
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2    489.324          ops/ms
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2    592.947          ops/ms
Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2    616.415          ops/ms
Float16OperationsBenchmark.divBenchmark                               1024  thrpt    2   1991.958          ops/ms
Float16OperationsBenchmark.dotProductFP16                             1024  thrpt    2    586.924          ops/ms
Float16OperationsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2    747.626          ops/ms
Float16OperationsBenchmark.euclideanDistanceFP16                      1024  thrpt    2    635.823          ops/ms
Float16OperationsBenchmark.fmaBenchmark                               1024  thrpt    2  15722.304          ops/ms
Float16OperationsBenchmark.getExponentBenchmark                       1024  thrpt    2   2685.930          ops/ms
Float16OperationsBenchmark.isFiniteBenchmark                          1024  thrpt    2   3455.726          ops/ms
Float16OperationsBenchmark.isFiniteCMovBenchmark                      1024  thrpt    2   2026.590          ops/ms
Float16OperationsBenchmark.isFiniteStoreBenchmark                     1024  thrpt    2   2265.065          ops/ms
Float16OperationsBenchmark.isInfiniteBenchmark                        1024  thrpt    2   2140.280          ops/ms
Float16OperationsBenchmark.isInfiniteCMovBenchmark                    1024  thrpt    2   2026.135          ops/ms
Float16OperationsBenchmark.isInfiniteStoreBenchmark                   1024  thrpt    2   1340.694          ops/ms
Float16OperationsBenchmark.isNaNBenchmark                             1024  thrpt    2   2432.249          ops/ms
Float16OperationsBenchmark.isNaNCMovBenchmark                         1024  thrpt    2   1710.044          ops/ms
Float16OperationsBenchmark.isNaNStoreBenchmark                        1024  thrpt    2   2055.544          ops/ms
Float16OperationsBenchmark.maxBenchmark                               1024  thrpt    2  22170.178          ops/ms
Float16OperationsBenchmark.minBenchmark                               1024  thrpt    2  21735.692          ops/ms
Float16OperationsBenchmark.mulBenchmark                               1024  thrpt    2  22235.991          ops/ms
Float16OperationsBenchmark.negateBenchmark                            1024  thrpt    2  27733.529          ops/ms
Float16OperationsBenchmark.sqrtBenchmark                              1024  thrpt    2   1770.878          ops/ms
Float16OperationsBenchmark.subBenchmark                               1024  thrpt    2  21800.058          ops/ms

The Java implementation of the Float16.isNaN API is not auto-vectorizer friendly: its multiple conditional expressions prevent inferring conditional-compare IR. Vectorization of the Float16.isFinite and Float16.isInfinite APIs is possible by inferring a VectorBlend for a contiguous pack of CMoveI IR in the presence of the -XX:+UseVectorCmov and -XX:+UseCMoveUnconditionally runtime flags. We plan to optimize these APIs through scalar intrinsification, with auto-vectorization support, in a subsequent patch.
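
To illustrate the CMove-friendly shape (a hand-written sketch, not code from the patch; the method name isFiniteStore is hypothetical), the branch-free form below is what C2 can lower to CMoveI and then vectorize as VectorBlend under those flags, whereas an early-exit branchy loop is not:

// Illustrative sketch: storing the predicate result unconditionally keeps the
// loop branch-free, so -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally lets C2
// form CMoveI nodes that can be packed into a VectorBlend.
static void isFiniteStore(short[] a, int[] r) {
    for (int i = 0; i < a.length; i++) {
        r[i] = Float16.isFinite(Float16.shortBitsToFloat16(a[i])) ? 1 : 0;
    }
}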

Kindly review and share your feedback.

Best Regards,
Jatin

PS: PR #21414 replaced the vector lane type (Type*) with the BasicType in the ideal type (TypeVect) associated with vector IR; thus, even though we have an explicit scalar type (TypeH) for Float16 IR, we do not retain this information in TypeVect. Purely from an ideal-type perspective, ShortVector and Float16Vector are similar; we therefore rely on IR specialization to differentiate a short vector operation from a Float16 operation.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8346236: Auto vectorization support for various Float16 operations (Sub-task - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755
$ git checkout pull/22755

Update a local copy of the PR:
$ git checkout pull/22755
$ git pull https://git.openjdk.org/jdk.git pull/22755/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22755

View PR using the GUI difftool:
$ git pr show -t 22755

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22755.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Dec 15, 2024

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Dec 15, 2024

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8346236: Auto vectorization support for various Float16 operations

Reviewed-by: epeter, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time this comment was updated, 124 new commits had been pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk bot commented Dec 15, 2024

@jatin-bhateja The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

bridgekeeper bot commented Feb 10, 2025

@jatin-bhateja This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

openjdk bot commented Feb 12, 2025

@jatin-bhateja this pull request cannot be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request, you can run the following commands in the local repository for your personal fork:

git checkout JDK-8346236
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk bot added the merge-conflict label Feb 12, 2025
openjdk bot removed the merge-conflict label Feb 13, 2025
jatin-bhateja marked this pull request as ready for review February 26, 2025 20:45
openjdk bot added the rfr label Feb 26, 2025
mlbridge bot commented Feb 26, 2025

bridgekeeper bot commented Mar 10, 2025

@jatin-bhateja This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper bot closed this Mar 10, 2025
@jatin-bhateja (Member Author) commented:
/open

openjdk bot reopened this Mar 10, 2025
openjdk bot commented Mar 10, 2025

@jatin-bhateja This pull request is now open

@sviswa7 commented Mar 19, 2025

There is a test failure in GHA. A merge with master would be good.

openjdk bot commented Mar 20, 2025

@jatin-bhateja Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@jatin-bhateja (Member Author) commented:

> I looked at the changes in Generators.java, thanks for adding some code there 😊
>
> Some comments on it:
>
>   • You should add some Float16 tests to test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java.
>   • I am missing the "mixed distribution" function float16s(). As a reference, take public Generator<Double> doubles(). The idea is that we have a set of distributions, and we pick a random distribution every time in the tests.
>   • I'm also missing an "any bits" version, where you would take a random short value and reinterpret it as Float16. This ensures that we are getting all possible encodings, including multiple NaN encodings.
>   • All of this is probably enough code to make a separate PR.

Hi @eme64, your comments have been addressed.
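
For reference, the "any bits" idea from the quoted feedback could be sketched as follows. This is a hypothetical helper, not the actual Generators.java code; it assumes only java.util.random.RandomGenerator and the Float16 bit-reinterpretation API:

import java.util.random.RandomGenerator;
import jdk.incubator.vector.Float16;

// Hypothetical "any bits" generator: drawing a random short and reinterpreting
// its bits reaches every FP16 encoding, including the many NaN bit patterns.
static Float16 anyBitsFloat16(RandomGenerator rng) {
    short bits = (short) rng.nextInt();      // any 16-bit pattern is valid
    return Float16.shortBitsToFloat16(bits); // reinterpret as Float16
}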

@sviswa7 left a comment

Looks good to me.

openjdk bot added the ready label Apr 8, 2025
@jatin-bhateja (Member Author) commented:

Hi @eme64, please let us know if there are further comments.

openjdk bot removed the ready label Apr 9, 2025
@eme64 (Contributor) commented Apr 9, 2025

@jatin-bhateja I think it looks good now, but let me run some more tests :)

@jatin-bhateja (Member Author) commented:

> @jatin-bhateja I think it looks good now, but let me run some more tests :)

Hi @eme64, please let us know if this is good to land :-)

@eme64 (Contributor) left a comment

No related test failures :) 🟢

Approved, thanks for the work @jatin-bhateja 😊

openjdk bot added the ready label Apr 10, 2025
@jatin-bhateja (Member Author) commented:

/integrate

openjdk bot commented Apr 10, 2025

Going to push as commit 9a3f999.
Since your change was applied, there have been 130 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

openjdk bot added the integrated label Apr 10, 2025
openjdk bot closed this Apr 10, 2025
openjdk bot removed the ready and rfr labels Apr 10, 2025
openjdk bot commented Apr 10, 2025

@jatin-bhateja Pushed as commit 9a3f999.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jatin-bhateja (Member Author) commented:

Thanks @sviswa7 and @eme64 for review and approvals :-)
