8342103: C2 compiler support for Float16 type and associated scalar operations #22754

jatin-bhateja · 2024-12-15T18:05:02Z

Hi All,

This patch adds C2 compiler support for the various Float16 operations added by PR#22128.

The following is a summary of the changes included in this patch:

Detection of various Float16 operations through inline expansion or pattern folding idealizations.
Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class.
- These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
New Ideal type for constant and non-constant Float16 IR nodes. Please refer to the FAQs for more details.
Float16 uses short as its storage type, so raw FP16 values are always loaded into general-purpose registers, whereas FP16 ISA instructions generally operate on floating-point registers. The compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and back.
New idealization routines to optimize redundant reinterpretation chains, e.g. HF2S + S2HF = HF.
X86 backend implementation for all supported intrinsics.
Functional and performance validation tests.
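
For readers unfamiliar with the source-level shape behind the pattern folding mentioned above, here is a minimal sketch in plain Java. It assumes JDK 20 or later for `Float.float16ToFloat` and `Float.floatToFloat16`; the names `HalfAddSketch` and `addHalf` are hypothetical and are not part of the patch. It only shows a binary16 add expressed through FP32 (widen, add, narrow) and says nothing about what C2 emits for it.

```java
// Illustrative sketch only; HalfAddSketch and addHalf are hypothetical names.
// Requires JDK 20+ for Float.float16ToFloat / Float.floatToFloat16.
final class HalfAddSketch {
    // binary16 add expressed via float: widen both operands, add in FP32,
    // then narrow the result back to its 16-bit encoding held in a short.
    static short addHalf(short x, short y) {
        return Float.floatToFloat16(Float.float16ToFloat(x) + Float.float16ToFloat(y));
    }

    public static void main(String[] args) {
        short a = Float.floatToFloat16(1.5f);   // encoded as 0x3E00
        short b = Float.floatToFloat16(2.25f);  // encoded as 0x4080
        short sum = addHalf(a, b);
        // prints "0x4380 -> 3.75"
        System.out.printf("0x%04X -> %s%n", sum & 0xFFFF, Float.float16ToFloat(sum));
    }
}
```

The operands and the result stay in `short` storage throughout, which is why the summary talks about moving values between general-purpose and floating-point registers.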

Kindly review the patch and share your feedback.

Best Regards,
Jatin

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8342103: C2 compiler support for Float16 type and associated scalar operations (Enhancement - P4)

Reviewers

Emanuel Peter (@eme64 - Reviewer) Review applies to 82a42213
Sandhya Viswanathan (@sviswa7 - Reviewer) Review applies to 19fc6c2d
Paul Sandoz (@PaulSandoz - Reviewer)

Contributors

Paul Sandoz <[email protected]>
Bhavana Kilambi <[email protected]>
Joe Darcy <[email protected]>
Raffaello Giulietti <[email protected]>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754
$ git checkout pull/22754

Update a local copy of the PR:
$ git checkout pull/22754
$ git pull https://git.openjdk.org/jdk.git pull/22754/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22754

View PR using the GUI difftool:
$ git pr show -t 22754

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22754.diff

Using Webrev

Link to Webrev Comment

jatin-bhateja · 2024-12-15T18:05:22Z

Some FAQs on the newly added ideal type for half-float IR nodes:-

Q. Why not use the existing TypeInt::SHORT instead of creating a new TypeH type?
A. The newly defined half-float type TypeH is special because its basic type is T_SHORT while its ideal type is RegF. The C2 type system therefore views its associated IR node as a 16-bit short value, while the register allocator assigns it a floating-point register.

Q. What is the problem with ConF?
A. During auto-vectorization, ConF replication constrains the operational vector lane count to half of what could otherwise be used for regular Float16 operations, i.e. only 16 floats fit into a 512-bit vector, which limits the lane count of the vectors in its use-def chain. One possible way to address this is a kludge in the auto-vectorizer that casts such constants to 16-bit constants by analyzing their context. The newly defined Float16 constant nodes 'ConH' are inherently 16-bit encoded IEEE 754 FP16 values and can be packed efficiently to leverage the full target vector width.

All Float16 IR nodes now carry the newly defined Type::HALF_FLOAT type instead of Type::FLOAT, so the auto-vectorizer no longer needs special handling to prune their container type to short.
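
The lane-count argument in the ConF answer is simple arithmetic, sketched below. This is only an illustration; `LaneCountSketch` and `lanes` are hypothetical names, and the 512-bit width is the example figure used in the answer above.

```java
// Illustrative arithmetic only; LaneCountSketch and lanes are hypothetical names.
final class LaneCountSketch {
    // Number of lanes of a given element width that fit in one vector register.
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        int vectorBits = 512; // vector width used as the example in the FAQ
        System.out.println("FP32 lanes: " + lanes(vectorBits, Float.SIZE)); // prints "FP32 lanes: 16"
        System.out.println("FP16 lanes: " + lanes(vectorBits, Short.SIZE)); // prints "FP16 lanes: 32"
    }
}
```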

bridgekeeper · 2024-12-15T18:05:58Z

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

jatin-bhateja · 2024-12-15T18:06:11Z

/contributor add @PaulSandoz

jatin-bhateja · 2024-12-15T18:06:25Z

/contributor add @Bhavana-Kilambi

openjdk · 2024-12-15T18:06:39Z

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8342103: C2 compiler support for Float16 type and associated scalar operations

Co-authored-by: Paul Sandoz <[email protected]>
Co-authored-by: Bhavana Kilambi <[email protected]>
Co-authored-by: Joe Darcy <[email protected]>
Co-authored-by: Raffaello Giulietti <[email protected]>
Reviewed-by: psandoz, epeter, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 218 new commits pushed to the master branch:

ed17c55: 8349145: Make Class.getProtectionDomain() non-native
e700460: 8349813: Test behavior of limiting() on RS operators throwing exceptions
08f4c1c: 8349781: make test TEST=gtest fails on WSL
bb41df4: 8349723: Problemlist jdp tests for macosx-x64
adda12b: 8349874: Missing comma in copyright from JDK-8349689
342dec9: 8347019: Test javax/swing/JRadioButton/8033699/bug8033699.java still fails: Focus is not on Radio Button Single as Expected
88b4a90: 8349689: Several virtual thread tests missing /native keyword
d558d9d: 8349702: jdk.internal.net.http.Http2Connection::putStream needs to provide cause while cancelling stream
8c09d40: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21
e7157d1: 8150442: Enforce Supported Platforms in Packager for MSI bundles
... and 208 more: https://git.openjdk.org/jdk/compare/4a375e5b8899aa684b8a921e198203e76794f709...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2024-12-15T18:07:08Z

@jatin-bhateja
Contributor Paul Sandoz <[email protected]> successfully added.

jatin-bhateja · 2024-12-15T18:07:08Z

/contributor add @jddarcy

jatin-bhateja · 2024-12-15T18:07:29Z

/contributor add @rgiulietti

openjdk · 2024-12-15T18:07:32Z

@jatin-bhateja
Contributor Bhavana Kilambi <[email protected]> successfully added.

openjdk · 2024-12-15T18:08:02Z

@jatin-bhateja
Contributor Joe Darcy <[email protected]> successfully added.

openjdk · 2024-12-15T18:08:28Z

@jatin-bhateja
Contributor Raffaello Giulietti <[email protected]> successfully added.

openjdk · 2024-12-15T18:08:56Z

@jatin-bhateja The following labels will be automatically applied to this pull request:

core-libs
graal
hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

jatin-bhateja · 2024-12-15T18:09:18Z

/label add hotspot-compiler-dev

openjdk · 2024-12-15T18:09:26Z

@jatin-bhateja
The hotspot-compiler label was successfully added.

mlbridge · 2024-12-15T18:19:32Z

Webrevs

eme64

Can you quickly summarize what tests you have, and what they test?

eme64 · 2024-12-16T07:19:43Z

test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java

-    @IR(applyIfCPUFeatureOr = {"f16c", "true", "avx512vl", "true", "zvfh", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
+    @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"},
+        counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
+    @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"},
+        counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
+    @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "zvfh", "true"},
+        counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})


Looks like this is having vector changes?
And this is pre-existing: but why are we using VECTOR_SIZE_ANY here? Can we not know the vector size? Maybe we can introduce a new tag max_float16 or max_hf. And do something like this:
IRNode.VECTOR_SIZE + "min(max_float, max_hf)", "> 0"

The downside with using ANY is that the exact size is not tested, and that might mean that the size is much smaller than ideal.

Hi @eme64, the test modification looks OK to me; we intend to trigger these IR rules only on non-AVX512-FP16 targets.
On AVX512-FP16 targets the compiler will infer a scalar float16 add operation, which will not get auto-vectorized.

jatin-bhateja · 2024-12-16T08:32:32Z

The patch includes functional and performance tests. As per your suggestions, the IR framework-based tests now cover various special cases for the constant folding transformations. Let me know if you see any gaps.

eme64 · 2024-12-16T09:03:38Z

I was hoping that you could make a list of all optimizations that are included here, and tell me where the tests are for it. That would significantly reduce the review time on my end. Otherwise I have to correlate everything myself, and that will take me hours.

jatin-bhateja · 2024-12-16T14:19:49Z

Validation details:

A) x86 backend changes
   - New assembler instruction
   - Macro assembly routines
   Test point: test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java
      - This test is based on the TestNG framework and includes new DataProviders to generate test vectors.
      - Test vectors cover the entire float16 value range and also special floating point values (NaN, +Inf, -Inf, 0.0 and -0.0).
B) GVN transformations
   - Value transforms
     Test point: test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
      - Covers all the constant folding scenarios for add, sub, mul, div, sqrt, fma, min, and max operations addressed by this patch.
      - It also tests special case scenarios for each operation as specified by the Java Language Specification.
   - Identity transforms
     Test point: test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
      - Covers identity transformation for ReinterpretS2HFNode and DivHFNode.
   - Idealization transforms
     Test points: test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java
                  test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
      - Contains a test point for the following transform:
           MulHF idealization, i.e. MulHF * 2 => AddHF
      - Contains a test point for the following transform:
           DivHF SRC, PoT(constant) => MulHF SRC * reciprocal(constant)
      - Contains idealization test points for the following transform:
           ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y))) =>
               ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y)))
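
The two algebraic idealizations listed above (multiply by 2 becomes an add, divide by a power of two becomes a multiply by the reciprocal) can be sanity-checked at the Java level over every binary16 bit pattern. The sketch below is not one of the PR's tests. It assumes JDK 20 or later, models each half-precision operation as widen, compute in float, narrow, and uses hypothetical names (`HalfIdentitySketch`, `countMismatches`). NaN results are compared only for NaN-ness, since payload bits are not of interest here.

```java
// Illustrative check only; HalfIdentitySketch and its helpers are hypothetical names.
// Requires JDK 20+ for Float.float16ToFloat / Float.floatToFloat16.
final class HalfIdentitySketch {
    static short mul(short x, float c) {
        return Float.floatToFloat16(Float.float16ToFloat(x) * c);
    }

    static short div(short x, float c) {
        return Float.floatToFloat16(Float.float16ToFloat(x) / c);
    }

    static short add(short x, short y) {
        return Float.floatToFloat16(Float.float16ToFloat(x) + Float.float16ToFloat(y));
    }

    // Two encodings agree if both decode to NaN or if their bits are identical.
    static boolean same(short a, short b) {
        boolean aNaN = Float.isNaN(Float.float16ToFloat(a));
        boolean bNaN = Float.isNaN(Float.float16ToFloat(b));
        return (aNaN && bNaN) || a == b;
    }

    // Walks all 65536 binary16 encodings, including NaN, infinities and signed zeros.
    static int countMismatches() {
        int mismatches = 0;
        for (int bits = 0; bits <= 0xFFFF; bits++) {
            short x = (short) bits;
            if (!same(mul(x, 2.0f), add(x, x))) mismatches++;      // x * 2 vs x + x
            if (!same(div(x, 4.0f), mul(x, 0.25f))) mismatches++;  // x / 4 vs x * (1/4)
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches()); // prints "mismatches: 0"
    }
}
```

The power-of-two restriction matters: the reciprocal of a power of two is exactly representable, so the division and the multiplication produce the same value before rounding to binary16.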

src/hotspot/share/opto/convertnode.hpp

eme64

Ooops, I found a few more details. But the C++ VM changes look really good now.

The Java changes I leave to @PaulSandoz

src/hotspot/share/opto/convertnode.hpp

src/hotspot/share/opto/convertnode.cpp

src/hotspot/share/opto/divnode.cpp

src/hotspot/share/opto/type.cpp

jatin-bhateja · 2025-02-04T10:00:06Z

@jatin-bhateja Testing is all green 🟢 Doing a last pass over the code.

Thanks @eme64, looking forward to your approval :-)

eme64

Thanks @jatin-bhateja for all your patience, this really took a while 🙈

It looks good to me - again I'm only reviewing the C++ VM changes, so someone else has to review the Java changes.

src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java

jatin-bhateja · 2025-02-10T05:31:08Z

Hi @PaulSandoz , Kindly let us know if this is good for integration.

PaulSandoz

An impressive and substantial change. I focused on the Java code. There are some small tweaks, presented in comments, that we can make to the intrinsics to improve the expression of the code; they have no impact on the intrinsic implementation.

src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java

test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java

src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java

jatin-bhateja · 2025-02-12T09:10:48Z

Hi @PaulSandoz , Your comments have been addressed.

PaulSandoz

Looks good. I merged this PR with master, successfully (at the time) with no conflicts, and ran it through tier 1 to 3 testing and there were no failures.

jatin-bhateja · 2025-02-12T17:02:06Z

/integrate

openjdk · 2025-02-12T17:02:52Z

Going to push as commit 4b463ee.
Since your change was applied there have been 220 commits pushed to the master branch:

332d87c: 8349859: Support static JDK in libfontmanager/freetypeScaler.c
73e1780: 8349836: G1: Improve group prediction log message
ed17c55: 8349145: Make Class.getProtectionDomain() non-native
e700460: 8349813: Test behavior of limiting() on RS operators throwing exceptions
08f4c1c: 8349781: make test TEST=gtest fails on WSL
bb41df4: 8349723: Problemlist jdp tests for macosx-x64
adda12b: 8349874: Missing comma in copyright from JDK-8349689
342dec9: 8347019: Test javax/swing/JRadioButton/8033699/bug8033699.java still fails: Focus is not on Radio Button Single as Expected
88b4a90: 8349689: Several virtual thread tests missing /native keyword
d558d9d: 8349702: jdk.internal.net.http.Http2Connection::putStream needs to provide cause while cancelling stream
... and 210 more: https://git.openjdk.org/jdk/compare/4a375e5b8899aa684b8a921e198203e76794f709...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-02-12T17:03:05Z

@jatin-bhateja Pushed as commit 4b463ee.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

jatin-bhateja · 2025-02-12T17:03:09Z

Thanks @PaulSandoz , @eme64 and @sviswa7 for your valuable feedback.

TheShermanTanker · 2025-02-18T02:36:13Z

Is anyone else getting compile failures after this was integrated? This weirdly seems to only happen on Linux

* For target hotspot_variant-server_libjvm_objs_mulnode.o:
/home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp: In member function ‘virtual const Type* FmaHFNode::Value(PhaseGVN*) const’:
/home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:1944:37: error: call of overloaded ‘make(double)’ is ambiguous
 1944 |   return TypeH::make(fma(f1, f2, f3));
      |                                     ^
In file included from /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.hpp:31,
                 from /home/runner/work/jdk/jdk/src/hotspot/share/opto/addnode.hpp:28,
                 from /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:26:
/home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:544:23: note: candidate: ‘static const TypeH* TypeH::make(float)’
  544 |   static const TypeH* make(float f);
      |                       ^~~~
/home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:545:23: note: candidate: ‘static const TypeH* TypeH::make(short int)’
  545 |   static const TypeH* make(short f);
      |                       ^~~~

sviswa7 · 2025-02-19T23:18:19Z

@TheShermanTanker I don't see any compile failures on Linux. Both the fastdebug and release builds complete successfully.

jatin-bhateja · 2025-02-20T11:32:27Z

Hi @TheShermanTanker ,

Please file a separate JBS issue for the errors you are observing with non-standard build options.
I am also seeing some other build issues with the following configuration:
--with-extra-cxxflags=-D__CORRECT_ISO_CPP11_MATH_H_PROTO_FP

Best Regards,
Jatin

C2 compiler support for float16 scalar operations.

c215eac

openjdk bot added the graal, hotspot, and core-libs labels Dec 15, 2024

openjdk bot added the hotspot-compiler label Dec 15, 2024

jatin-bhateja marked this pull request as ready for review December 15, 2024 18:14

openjdk bot added the rfr label Dec 15, 2024

jatin-bhateja mentioned this pull request Dec 15, 2024

8346236: Auto vectorization support for various Float16 operations #22755

Closed

3 tasks

eme64 reviewed Dec 16, 2024

Adding missed check in container type detection.

7cb694f

Adding more test points

3a6697e

mur47x111 mentioned this pull request Dec 16, 2024

[JDK-8344599] Adapt JDK-8342103: C2 compiler support for Float16 type and associated operations oracle/graal#10117

Merged

eme64 reviewed Feb 4, 2025

src/hotspot/share/opto/convertnode.hpp (outdated, resolved)

eme64 reviewed Feb 4, 2025

Fixing typos

82a4221

eme64 approved these changes Feb 4, 2025

openjdk bot added the ready label Feb 4, 2025

liach reviewed Feb 4, 2025

src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java (outdated, resolved)

PaulSandoz reviewed Feb 10, 2025

Review comments resolutions

111c808

openjdk bot removed the ready label Feb 11, 2025

PaulSandoz approved these changes Feb 12, 2025

openjdk bot added the ready label Feb 12, 2025

openjdk bot added the integrated label Feb 12, 2025

openjdk bot closed this Feb 12, 2025

openjdk bot removed the ready and rfr labels Feb 12, 2025

graalvmbot mentioned this pull request Feb 14, 2025

[GR-62167] Update labsjdk to 25+10-jvmci-b01 oracle/graal#10691

Merged

jatin-bhateja deleted the JDK-8342103 branch February 20, 2025 11:33

Hamlin-Li mentioned this pull request Feb 28, 2025

8345298: RISC-V: Add riscv backend for Float16 operations - scalar #23844

Closed

3 tasks

This was referenced Aug 20, 2025

Merge vectorIntrinsics openjdk/panama-vector#229

Closed

Merge vectorIntrinsics openjdk/panama-vector#230

Closed
