
@jatin-bhateja
Member

@jatin-bhateja jatin-bhateja commented Dec 15, 2024

Hi All,

This patch adds C2 compiler support for various Float16 operations added by PR #22128.

The following is a summary of the changes included in this patch:

  1. Detection of various Float16 operations through inline expansion or pattern-folding idealizations.
  2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern-folding idealization.
  3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class.
    • These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
  4. New specialized IR nodes for Float16 operations, with associated idealizations and constant-folding routines.
  5. A new ideal type for constant and non-constant Float16 IR nodes. Please refer to the FAQs for more details.
  6. Since Float16 uses short as its storage type, raw FP16 values are always loaded into a general-purpose register, but FP16 ISA instructions generally operate on floating-point registers. The compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value into a floating-point register and back.
  7. New idealization routines to optimize redundant reinterpretation chains, e.g. HF2S + S2HF = HF.
  8. x86 backend implementation for all supported intrinsics.
  9. Functional and performance validation tests.
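For readers new to the pattern-folding route in item 2, here is a minimal Java sketch of the source-level shape involved, assuming only the standard `Float.float16ToFloat(short)` and `Float.floatToFloat16(float)` conversions (available since JDK 20). Each operation widens binary16 bits to float, computes in FP32, and narrows back. When those conversions are intrinsified, this gives C2 a `ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y)))` subgraph; whether it is then folded into a single FP16 node depends on target support such as AVX512-FP16. The class `Half16Ops` is hypothetical and is not the incubator `Float16` class.

```java
// Hypothetical helper class (not the jdk.incubator.vector Float16 implementation).
// Each method widens binary16 bits to float, computes in FP32, and narrows back,
// i.e. the ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y))) shape.
public final class Half16Ops {
    private Half16Ops() {}

    public static short add(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) + Float.float16ToFloat(b));
    }

    public static short mul(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) * Float.float16ToFloat(b));
    }

    public static short max(short a, short b) {
        // Math.max follows Java semantics: NaN wins, and +0.0 is greater than -0.0.
        return Float.floatToFloat16(Math.max(Float.float16ToFloat(a), Float.float16ToFloat(b)));
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);   // binary16 bits 0x3C00
        short half = Float.floatToFloat16(0.5f);  // binary16 bits 0x3800
        short sum = add(one, half);               // 1.5 -> 0x3E00
        System.out.printf("0x%04X%n", sum & 0xFFFF);
    }
}
```

Running `main` prints `0x3E00`, the binary16 encoding of 1.5.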

Kindly review the patch and share your feedback.

Best Regards,
Jatin


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8342103: C2 compiler support for Float16 type and associated scalar operations (Enhancement - P4)

Reviewers

Contributors

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754
$ git checkout pull/22754

Update a local copy of the PR:
$ git checkout pull/22754
$ git pull https://git.openjdk.org/jdk.git pull/22754/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22754

View PR using the GUI difftool:
$ git pr show -t 22754

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22754.diff

Using Webrev

Link to Webrev Comment

@jatin-bhateja
Member Author

Some FAQs on the newly added ideal type for half-float IR nodes:

Q. Why not use the existing TypeInt::SHORT instead of creating a new TypeH type?
A. The newly defined half-float type, TypeH, is special: its basic type is T_SHORT, while its ideal type is RegF. Thus, the C2 type system views the associated IR node as a 16-bit short value, while the register allocator assigns it a floating-point register.

Q. What is the problem with ConF?
A. During auto-vectorization, replicating a ConF constrains the vector lane count to half of what regular Float16 operations could otherwise use: only 16 floats fit into a 512-bit vector, which limits the lane count of every vector in its use-def chain. One possible workaround is a kludge in the auto-vectorizer that casts such constants to 16-bit constants by analyzing their context. The newly defined Float16 constant nodes, 'ConH', are inherently 16-bit encoded IEEE 754 FP16 values and can be packed efficiently to use the full target vector width.

All Float16 IR nodes now carry the newly defined Type::HALF_FLOAT type instead of Type::FLOAT, so the auto-vectorizer no longer needs special handling to prune their container type to short.
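The lane-count argument in the ConF answer is simple arithmetic on element widths. The sketch below spells it out and shows the raw 16-bit pattern that a binary16 constant carries. The 512-bit width matches the example in the answer; the class and method names are illustrative assumptions, not code from the patch.

```java
// Illustrative only: lane-count arithmetic and the raw bits a 16-bit FP16 constant carries.
public final class HalfLaneMath {
    private HalfLaneMath() {}

    // Number of elements of elementBits width that fit into a vector of vectorBits.
    public static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        int vectorBits = 512;                                 // e.g. one 512-bit vector
        System.out.println(lanes(vectorBits, Float.SIZE));    // 16: replicated 32-bit ConF
        System.out.println(lanes(vectorBits, Short.SIZE));    // 32: replicated 16-bit ConH
        short bits = Float.floatToFloat16(1.5f);              // binary16 encoding of 1.5
        System.out.printf("0x%04X%n", bits & 0xFFFF);         // 0x3E00
    }
}
```

Running `main` prints `16`, `32`, and `0x3E00`: the same 512 bits hold twice as many lanes once the replicated constant is 16 bits wide.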

@bridgekeeper

bridgekeeper bot commented Dec 15, 2024

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@jatin-bhateja
Member Author

/contributor add @PaulSandoz

@jatin-bhateja
Member Author

/contributor add @Bhavana-Kilambi

@openjdk

openjdk bot commented Dec 15, 2024

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8342103: C2 compiler support for Float16 type and associated scalar operations

Co-authored-by: Paul Sandoz <[email protected]>
Co-authored-by: Bhavana Kilambi <[email protected]>
Co-authored-by: Joe Darcy <[email protected]>
Co-authored-by: Raffaello Giulietti <[email protected]>
Reviewed-by: psandoz, epeter, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 218 new commits pushed to the master branch:

  • ed17c55: 8349145: Make Class.getProtectionDomain() non-native
  • e700460: 8349813: Test behavior of limiting() on RS operators throwing exceptions
  • 08f4c1c: 8349781: make test TEST=gtest fails on WSL
  • bb41df4: 8349723: Problemlist jdp tests for macosx-x64
  • adda12b: 8349874: Missing comma in copyright from JDK-8349689
  • 342dec9: 8347019: Test javax/swing/JRadioButton/8033699/bug8033699.java still fails: Focus is not on Radio Button Single as Expected
  • 88b4a90: 8349689: Several virtual thread tests missing /native keyword
  • d558d9d: 8349702: jdk.internal.net.http.Http2Connection::putStream needs to provide cause while cancelling stream
  • 8c09d40: 8348268: Test gc/shenandoah/TestResizeTLAB.java#compact: fatal error: Before Updating References: Thread C2 CompilerThread1: expected gc-state 9, actual 21
  • e7157d1: 8150442: Enforce Supported Platforms in Packager for MSI bundles
  • ... and 208 more: https://git.openjdk.org/jdk/compare/4a375e5b8899aa684b8a921e198203e76794f709...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Paul Sandoz <[email protected]> successfully added.

@jatin-bhateja
Member Author

/contributor add @jddarcy

@jatin-bhateja
Member Author

/contributor add @rgiulietti

@openjdk

openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Bhavana Kilambi <[email protected]> successfully added.

@openjdk

openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Joe Darcy <[email protected]> successfully added.

@openjdk

openjdk bot commented Dec 15, 2024

@jatin-bhateja
Contributor Raffaello Giulietti <[email protected]> successfully added.

@openjdk

openjdk bot commented Dec 15, 2024

@jatin-bhateja The following labels will be automatically applied to this pull request:

  • core-libs
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@jatin-bhateja
Member Author

/label add hotspot-compiler-dev

@openjdk

openjdk bot commented Dec 15, 2024

@jatin-bhateja
The hotspot-compiler label was successfully added.

@jatin-bhateja jatin-bhateja marked this pull request as ready for review December 15, 2024 18:14
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 15, 2024
@mlbridge

mlbridge bot commented Dec 15, 2024

Contributor

@eme64 eme64 left a comment


Can you quickly summarize what tests you have, and what they test?

Comment on lines 44 to 49
@IR(applyIfCPUFeatureOr = {"f16c", "true", "avx512vl", "true", "zvfh", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
@IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "zvfh", "true"},
counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
Contributor


Looks like this includes vector changes?
And this is pre-existing: but why are we using VECTOR_SIZE_ANY here? Can we not know the vector size? Maybe we can introduce a new tag max_float16 or max_hf. And do something like this:
IRNode.VECTOR_SIZE + "min(max_float, max_hf)", "> 0"

The downside with using ANY is that the exact size is not tested, and that might mean that the size is much smaller than ideal.

Member Author


Hi @eme64, the test modification looks OK to me; we intend to trigger these IR rules on non-AVX512-FP16 targets.
On AVX512-FP16 targets, the compiler will infer a scalar Float16 add operation, which will not get auto-vectorized.

@jatin-bhateja
Member Author

> Can you quickly summarize what tests you have, and what they test?

The patch includes functional and performance tests; as per your suggestions, IR framework-based tests now cover various special cases for constant-folding transformations. Let me know if you see any gaps.

@eme64
Contributor

eme64 commented Dec 16, 2024

> > Can you quickly summarize what tests you have, and what they test?
>
> Patch includes functional and performance tests, as per your suggestions IR framework-based tests now cover various special cases for constant folding transformation. Let me know if you see any gaps.

I was hoping that you could make a list of all optimizations that are included here, and tell me where the tests are for it. That would significantly reduce the review time on my end. Otherwise I have to correlate everything myself, and that will take me hours.

@jatin-bhateja
Member Author

Validation details:

A) x86 backend changes
   - New assembler instructions
   - Macro-assembly routines
   Test point: test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java
     - This test is based on the TestNG framework and includes new DataProviders to generate test vectors.
     - Test vectors cover the entire Float16 value range as well as special floating-point values (NaN, +Inf, -Inf, 0.0 and -0.0).
B) GVN transformations
   - Value transforms
     Test point: test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - Covers all the constant-folding scenarios for the add, sub, mul, div, sqrt, fma, min, and max operations addressed by this patch.
       - Also tests the special-case scenarios for each operation as specified by the Java Language Specification.
   - Identity transforms
     Test point: test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - Covers identity transformations for ReinterpretS2HFNode and DivHFNode.
   - Idealization transforms
     Test points: test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java
                  test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java
       - Contains a test point for the following transform:
           MulHF idealization, i.e. MulHF(SRC, 2) => AddHF(SRC, SRC)
       - Contains a test point for the following transform:
           DivHF(SRC, PoT constant) => MulHF(SRC, reciprocal constant)
       - Contains idealization test points for the following transform:
           ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y))) =>
               ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y)))
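The MulHF and DivHF idealizations listed above rest on exact identities: `x * 2` and `x + x` denote the same real value, and dividing by a power of two `2^k` gives the same real value as multiplying by `2^-k` whenever that reciprocal is exactly representable in binary16 (true for `k` from 1 to 15). Since each FP16 operation rounds its exact real result once, the rewritten forms produce identical results. Below is a brute-force sketch in the spirit of the exhaustive test vectors under A): it walks all 65536 binary16 bit patterns. It assumes FP16 arithmetic can be emulated by widening to float and narrowing with `Float.floatToFloat16`, a stand-in for the hardware FP16 instructions; it does not exercise C2 itself, which is what the IR tests above do. The class name is hypothetical.

```java
// Hypothetical brute-force check; FP16 arithmetic is emulated by widening to float
// and narrowing with Float.floatToFloat16 (round to nearest even).
public final class HalfIdentityCheck {
    private HalfIdentityCheck() {}

    static short add(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) + Float.float16ToFloat(b));
    }

    static short mul(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) * Float.float16ToFloat(b));
    }

    static short div(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) / Float.float16ToFloat(b));
    }

    // Value equality that treats every NaN as equal and keeps -0.0 distinct from +0.0.
    static boolean sameValue(short a, short b) {
        return Float.compare(Float.float16ToFloat(a), Float.float16ToFloat(b)) == 0;
    }

    // Counts inputs where a rewrite would change the result; 0 means both rewrites are safe.
    public static int countMismatches() {
        short two = Float.floatToFloat16(2.0f);
        int mismatches = 0;
        for (int bits = 0; bits <= 0xFFFF; bits++) {
            short x = (short) bits;
            // MulHF(x, 2) => AddHF(x, x)
            if (!sameValue(mul(x, two), add(x, x))) {
                mismatches++;
            }
            // DivHF(x, 2^k) => MulHF(x, 2^-k); 2^15 is the largest binary16 power of two.
            for (int k = 1; k <= 15; k++) {
                short pot = Float.floatToFloat16((float) (1 << k));
                short recip = Float.floatToFloat16(1.0f / (1 << k));
                if (!sameValue(div(x, pot), mul(x, recip))) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches = " + countMismatches());
    }
}
```

Running `main` prints `mismatches = 0`, including for NaN, the infinities, signed zeros, and subnormals.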

Contributor

@eme64 eme64 left a comment


Ooops, I found a few more details. But the C++ VM changes look really good now.

The Java changes I leave to @PaulSandoz

@jatin-bhateja
Member Author

> @jatin-bhateja Testing is all green 🟢 Doing a last pass over the code.

Thanks @eme64, looking forward to your approval :-)

Contributor

@eme64 eme64 left a comment


Thanks @jatin-bhateja for all your patience, this really took a while 🙈

It looks good to me - again I'm only reviewing the C++ VM changes, so someone else has to review the Java changes.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 4, 2025
@jatin-bhateja
Member Author

Hi @PaulSandoz, kindly let us know if this is good for integration.

Member

@PaulSandoz PaulSandoz left a comment


An impressive and substantial change. I focused on the Java code. There are some small tweaks, presented in comments, that we can make to the intrinsics to improve the expression of the code; they have no impact on the intrinsic implementation.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Feb 11, 2025
@jatin-bhateja
Member Author

Hi @PaulSandoz, your comments have been addressed.

Member

@PaulSandoz PaulSandoz left a comment


Looks good. I merged this PR with master, successfully (at the time) with no conflicts, and ran it through tier 1 to 3 testing and there were no failures.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 12, 2025
@jatin-bhateja
Member Author

/integrate

@openjdk

openjdk bot commented Feb 12, 2025

Going to push as commit 4b463ee.
Since your change was applied there have been 220 commits pushed to the master branch:

  • 332d87c: 8349859: Support static JDK in libfontmanager/freetypeScaler.c
  • 73e1780: 8349836: G1: Improve group prediction log message
  • ed17c55: 8349145: Make Class.getProtectionDomain() non-native
  • e700460: 8349813: Test behavior of limiting() on RS operators throwing exceptions
  • 08f4c1c: 8349781: make test TEST=gtest fails on WSL
  • bb41df4: 8349723: Problemlist jdp tests for macosx-x64
  • adda12b: 8349874: Missing comma in copyright from JDK-8349689
  • 342dec9: 8347019: Test javax/swing/JRadioButton/8033699/bug8033699.java still fails: Focus is not on Radio Button Single as Expected
  • 88b4a90: 8349689: Several virtual thread tests missing /native keyword
  • d558d9d: 8349702: jdk.internal.net.http.Http2Connection::putStream needs to provide cause while cancelling stream
  • ... and 210 more: https://git.openjdk.org/jdk/compare/4a375e5b8899aa684b8a921e198203e76794f709...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 12, 2025
@openjdk openjdk bot closed this Feb 12, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 12, 2025
@openjdk

openjdk bot commented Feb 12, 2025

@jatin-bhateja Pushed as commit 4b463ee.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jatin-bhateja
Member Author

jatin-bhateja commented Feb 12, 2025

Thanks @PaulSandoz , @eme64 and @sviswa7 for your valuable feedback.

@TheShermanTanker
Contributor

Is anyone else getting compile failures after this was integrated? Weirdly, this seems to happen only on Linux.

* For target hotspot_variant-server_libjvm_objs_mulnode.o:
/home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp: In member function ‘virtual const Type* FmaHFNode::Value(PhaseGVN*) const’:
/home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:1944:37: error: call of overloaded ‘make(double)’ is ambiguous
 1944 |   return TypeH::make(fma(f1, f2, f3));
      |                                     ^
In file included from /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.hpp:31,
                 from /home/runner/work/jdk/jdk/src/hotspot/share/opto/addnode.hpp:28,
                 from /home/runner/work/jdk/jdk/src/hotspot/share/opto/mulnode.cpp:26:
/home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:544:23: note: candidate: ‘static const TypeH* TypeH::make(float)’
  544 |   static const TypeH* make(float f);
      |                       ^~~~
/home/runner/work/jdk/jdk/src/hotspot/share/opto/type.hpp:545:23: note: candidate: ‘static const TypeH* TypeH::make(short int)’
  545 |   static const TypeH* make(short f);
      |                       ^~~~

@sviswa7

sviswa7 commented Feb 19, 2025

@TheShermanTanker I don't see any compile failures on Linux. Both the fastdebug and release builds succeed.

@jatin-bhateja
Member Author

jatin-bhateja commented Feb 20, 2025

Hi @TheShermanTanker,

Please file a separate JBS issue for the errors you are observing with non-standard build options.
I am also seeing some other build issues with the following configuration
--with-extra-cxxflags=-D__CORRECT_ISO_CPP11_MATH_H_PROTO_FP

Best Regards,
Jatin
