
Conversation

@jatin-bhateja (Member) commented Dec 15, 2024

This is a follow-up PR for #22754

The patch adds support for auto-vectorizing various Float16 scalar operations (add, subtract, divide, multiply, sqrt, and fma).

Summary of changes included with the patch:

  1. New C2 compiler vector IR creation.
  2. Auto-vectorization support.
  3. x86 backend implementation.
  4. New IR verification tests for each newly supported vector operation.
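
For context, the targeted loop shape is a scalar Float16 kernel like the one below. This is a minimal illustrative sketch, not code from the patch; it assumes the Float16 API in jdk.incubator.vector and, as in the benchmark, keeps values in short[] arrays as raw FP16 bit patterns (the method name addKernel is hypothetical):

import jdk.incubator.vector.Float16;

// Illustrative sketch: a scalar Float16 loop of the shape the
// auto-vectorizer can now turn into Float16 vector IR.
static void addKernel(short[] a, short[] b, short[] r) {
    for (int i = 0; i < r.length; i++) {
        Float16 fa = Float16.shortBitsToFloat16(a[i]);
        Float16 fb = Float16.shortBitsToFloat16(b[i]);
        // add() shown here; subtract/multiply/divide/sqrt/fma follow the same shape
        r[i] = Float16.float16ToRawShortBits(Float16.add(fa, fb));
    }
}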

The following are the performance numbers for Float16OperationsBenchmark:

System: Intel(R) Xeon(R) processor code-named Granite Rapids
Frequency fixed at 2.5 GHz

Baseline:
Benchmark                                                      (vectorDim)   Mode  Cnt     Score   Error   Units
Float16OperationsBenchmark.absBenchmark                               1024  thrpt    2  4191.787          ops/ms
Float16OperationsBenchmark.addBenchmark                               1024  thrpt    2  1211.978          ops/ms
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2   493.026          ops/ms
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2   612.430          ops/ms
Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2   616.012          ops/ms
Float16OperationsBenchmark.divBenchmark                               1024  thrpt    2   604.882          ops/ms
Float16OperationsBenchmark.dotProductFP16                             1024  thrpt    2   410.798          ops/ms
Float16OperationsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2   602.863          ops/ms
Float16OperationsBenchmark.euclideanDistanceFP16                      1024  thrpt    2   640.348          ops/ms
Float16OperationsBenchmark.fmaBenchmark                               1024  thrpt    2   809.175          ops/ms
Float16OperationsBenchmark.getExponentBenchmark                       1024  thrpt    2  2682.764          ops/ms
Float16OperationsBenchmark.isFiniteBenchmark                          1024  thrpt    2  3373.901          ops/ms
Float16OperationsBenchmark.isFiniteCMovBenchmark                      1024  thrpt    2  1881.652          ops/ms
Float16OperationsBenchmark.isFiniteStoreBenchmark                     1024  thrpt    2  2273.745          ops/ms
Float16OperationsBenchmark.isInfiniteBenchmark                        1024  thrpt    2  2147.913          ops/ms
Float16OperationsBenchmark.isInfiniteCMovBenchmark                    1024  thrpt    2  1962.579          ops/ms
Float16OperationsBenchmark.isInfiniteStoreBenchmark                   1024  thrpt    2  1696.494          ops/ms
Float16OperationsBenchmark.isNaNBenchmark                             1024  thrpt    2  2417.396          ops/ms
Float16OperationsBenchmark.isNaNCMovBenchmark                         1024  thrpt    2  1708.585          ops/ms
Float16OperationsBenchmark.isNaNStoreBenchmark                        1024  thrpt    2  2055.511          ops/ms
Float16OperationsBenchmark.maxBenchmark                               1024  thrpt    2  1211.940          ops/ms
Float16OperationsBenchmark.minBenchmark                               1024  thrpt    2  1212.063          ops/ms
Float16OperationsBenchmark.mulBenchmark                               1024  thrpt    2  1211.955          ops/ms
Float16OperationsBenchmark.negateBenchmark                            1024  thrpt    2  4215.922          ops/ms
Float16OperationsBenchmark.sqrtBenchmark                              1024  thrpt    2   337.606          ops/ms
Float16OperationsBenchmark.subBenchmark                               1024  thrpt    2  1212.467          ops/ms

With opt:
Benchmark                                                      (vectorDim)   Mode  Cnt      Score   Error   Units
Float16OperationsBenchmark.absBenchmark                               1024  thrpt    2  28481.336          ops/ms
Float16OperationsBenchmark.addBenchmark                               1024  thrpt    2  21311.633          ops/ms
Float16OperationsBenchmark.cosineSimilarityDequantizedFP16            1024  thrpt    2    489.324          ops/ms
Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16         1024  thrpt    2    592.947          ops/ms
Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16         1024  thrpt    2    616.415          ops/ms
Float16OperationsBenchmark.divBenchmark                               1024  thrpt    2   1991.958          ops/ms
Float16OperationsBenchmark.dotProductFP16                             1024  thrpt    2    586.924          ops/ms
Float16OperationsBenchmark.euclideanDistanceDequantizedFP16           1024  thrpt    2    747.626          ops/ms
Float16OperationsBenchmark.euclideanDistanceFP16                      1024  thrpt    2    635.823          ops/ms
Float16OperationsBenchmark.fmaBenchmark                               1024  thrpt    2  15722.304          ops/ms
Float16OperationsBenchmark.getExponentBenchmark                       1024  thrpt    2   2685.930          ops/ms
Float16OperationsBenchmark.isFiniteBenchmark                          1024  thrpt    2   3455.726          ops/ms
Float16OperationsBenchmark.isFiniteCMovBenchmark                      1024  thrpt    2   2026.590          ops/ms
Float16OperationsBenchmark.isFiniteStoreBenchmark                     1024  thrpt    2   2265.065          ops/ms
Float16OperationsBenchmark.isInfiniteBenchmark                        1024  thrpt    2   2140.280          ops/ms
Float16OperationsBenchmark.isInfiniteCMovBenchmark                    1024  thrpt    2   2026.135          ops/ms
Float16OperationsBenchmark.isInfiniteStoreBenchmark                   1024  thrpt    2   1340.694          ops/ms
Float16OperationsBenchmark.isNaNBenchmark                             1024  thrpt    2   2432.249          ops/ms
Float16OperationsBenchmark.isNaNCMovBenchmark                         1024  thrpt    2   1710.044          ops/ms
Float16OperationsBenchmark.isNaNStoreBenchmark                        1024  thrpt    2   2055.544          ops/ms
Float16OperationsBenchmark.maxBenchmark                               1024  thrpt    2  22170.178          ops/ms
Float16OperationsBenchmark.minBenchmark                               1024  thrpt    2  21735.692          ops/ms
Float16OperationsBenchmark.mulBenchmark                               1024  thrpt    2  22235.991          ops/ms
Float16OperationsBenchmark.negateBenchmark                            1024  thrpt    2  27733.529          ops/ms
Float16OperationsBenchmark.sqrtBenchmark                              1024  thrpt    2   1770.878          ops/ms
Float16OperationsBenchmark.subBenchmark                               1024  thrpt    2  21800.058          ops/ms

The Java implementation of the Float16.isNaN API is not auto-vectorizer friendly: its multiple conditional expressions prevent inferring conditional-compare IR. Vectorization of the Float16.isFinite and Float16.isInfinite APIs is possible by inferring a VectorBlend for a contiguous pack of CMoveI IR in the presence of the -XX:+UseVectorCmov and -XX:+UseCMoveUnconditionally runtime flags. We plan to optimize these APIs through scalar intrinsification, with auto-vectorization support, in a subsequent patch.
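
To illustrate the CMove-friendly shape (a hand-written sketch, not code from the patch; the method name isFiniteStore is hypothetical), the branch-free form below is what C2 can lower to CMoveI and then vectorize as VectorBlend under those flags, whereas an early-exit branchy loop is not:

// Illustrative sketch: storing the predicate result unconditionally keeps the
// loop branch-free, so -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally lets C2
// form CMoveI nodes that can be packed into a VectorBlend.
static void isFiniteStore(short[] a, int[] r) {
    for (int i = 0; i < a.length; i++) {
        r[i] = Float16.isFinite(Float16.shortBitsToFloat16(a[i])) ? 1 : 0;
    }
}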

Kindly review and share your feedback.

Best Regards,
Jatin

PS: PR #21414 replaced the vector lane type (Type*) with the BasicType in the ideal type (TypeVect) associated with vector IR; thus, even though we have an explicit scalar type (TypeH) for Float16 IR, we do not retain this information in TypeVect. Purely from an ideal-type perspective, ShortVector and Float16Vector are similar; we therefore rely on IR specialization to differentiate a short vector operation from a Float16 operation.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8346236: Auto vectorization support for various Float16 operations (Sub-task - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755
$ git checkout pull/22755

Update a local copy of the PR:
$ git checkout pull/22755
$ git pull https://git.openjdk.org/jdk.git pull/22755/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22755

View PR using the GUI difftool:
$ git pr show -t 22755

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22755.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Dec 15, 2024

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Dec 15, 2024

@jatin-bhateja This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8346236: Auto vectorization support for various Float16 operations

Reviewed-by: epeter, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time this comment was updated, 124 new commits had been pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk bot commented Dec 15, 2024

@jatin-bhateja The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

bridgekeeper bot commented Feb 10, 2025

@jatin-bhateja This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

openjdk bot commented Feb 12, 2025

@jatin-bhateja this pull request cannot be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request, you can run the following commands in the local repository for your personal fork:

git checkout JDK-8346236
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk bot added the merge-conflict label Feb 12, 2025
openjdk bot removed the merge-conflict label Feb 13, 2025
jatin-bhateja marked this pull request as ready for review February 26, 2025 20:45
openjdk bot added the rfr label Feb 26, 2025
mlbridge bot commented Feb 26, 2025

bridgekeeper bot commented Mar 10, 2025

@jatin-bhateja This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper bot closed this Mar 10, 2025
@jatin-bhateja (Member Author) commented:
/open

openjdk bot reopened this Mar 10, 2025
openjdk bot commented Mar 10, 2025

@jatin-bhateja This pull request is now open

@sviswa7 commented Mar 19, 2025

There is a test failure in GHA. A merge with master would be good.

openjdk bot commented Mar 20, 2025

@jatin-bhateja Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@jatin-bhateja (Member Author) commented:

> I looked at the changes in Generators.java, thanks for adding some code there 😊
>
> Some comments on it:
>
>   • You should add some Float16 tests to test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java.
>   • I am missing the "mixed distribution" function float16s(). As a reference, take public Generator<Double> doubles(). The idea is that we have a set of distributions, and we pick a random distribution every time in the tests.
>   • I'm also missing an "any bits" version, where you would take a random short value and reinterpret it as Float16. This ensures that we are getting all possible encodings, including multiple NaN encodings.
>   • All of this is probably enough code to make a separate PR.

Hi @eme64, your comments have been addressed.
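
For reference, the "any bits" idea from the quoted feedback could be sketched as follows. This is a hypothetical helper, not the actual Generators.java code; it assumes only java.util.random.RandomGenerator and the Float16 bit-reinterpretation API:

import java.util.random.RandomGenerator;
import jdk.incubator.vector.Float16;

// Hypothetical "any bits" generator: drawing a random short and reinterpreting
// its bits reaches every FP16 encoding, including the many NaN bit patterns.
static Float16 anyBitsFloat16(RandomGenerator rng) {
    short bits = (short) rng.nextInt();      // any 16-bit pattern is valid
    return Float16.shortBitsToFloat16(bits); // reinterpret as Float16
}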

@sviswa7 left a comment

Looks good to me.

openjdk bot added the ready label Apr 8, 2025
@jatin-bhateja (Member Author) commented:

Hi @eme64, please let us know if there are further comments.

openjdk bot removed the ready label Apr 9, 2025
@eme64 (Contributor) commented Apr 9, 2025

@jatin-bhateja I think it looks good now, but let me run some more tests :)

@jatin-bhateja (Member Author) commented:

> @jatin-bhateja I think it looks good now, but let me run some more tests :)

Hi @eme64, please let us know if this is good to land :-)

@eme64 (Contributor) left a comment

No related test failures :) 🟢

Approved, thanks for the work @jatin-bhateja 😊

openjdk bot added the ready label Apr 10, 2025
@jatin-bhateja (Member Author) commented:

/integrate

openjdk bot commented Apr 10, 2025

Going to push as commit 9a3f999.
Since your change was applied, there have been 130 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

openjdk bot added the integrated label Apr 10, 2025
openjdk bot closed this Apr 10, 2025
openjdk bot removed the ready and rfr labels Apr 10, 2025
openjdk bot commented Apr 10, 2025

@jatin-bhateja Pushed as commit 9a3f999.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jatin-bhateja (Member Author) commented:

Thanks @sviswa7 and @eme64 for review and approvals :-)
