@jcaip jcaip commented Mar 4, 2025

Built on top of #1201. This pull request adds ROCm (Radeon Open Compute) support for the sparse Marlin kernel alongside the existing CUDA path, so the code can run on AMD GPUs.

The main changes involve conditional compilation to handle differences between CUDA and ROCm, as well as adding ROCm-specific intrinsics for MI300x.

Co-author: @lcskrishna


Key changes include:

ROCm Support in setup.py:

  • HIP kernel generation

Conditional Compilation in CUDA Source Files:

  • Added conditional compilation directives to exclude certain code for ROCm and include ROCm-specific implementations.

ROCm-specific Implementations:

  • Implemented ROCm-specific versions of functions and macros that are different from their CUDA counterparts, ensuring compatibility and performance on AMD GPUs.
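As a rough illustration of the conditional-compilation pattern described above, here is a minimal host-side sketch. It is not code from this PR: the `USE_ROCM` macro name is an assumption about how the build flags the ROCm path, and `kWarpSize`, `kBackendName`, and `warps_needed` are hypothetical names used only for this example. It shows one helper keeping a single signature while the backend-specific constant behind it is chosen at compile time.

```cpp
// Hypothetical sketch of guarding backend-specific code behind a build macro.
// Assumption: the build defines USE_ROCM when targeting AMD GPUs.
#if defined(USE_ROCM)
// ROCm path: MI300X executes 64-wide wavefronts.
constexpr int kWarpSize = 64;
constexpr const char* kBackendName = "rocm";
#else
// CUDA path: NVIDIA GPUs execute 32-wide warps.
constexpr int kWarpSize = 32;
constexpr const char* kBackendName = "cuda";
#endif

// Same signature on both backends; only the constant behind it differs.
// Returns the number of warps (or wavefronts) needed to cover `threads`.
constexpr int warps_needed(int threads) {
  return (threads + kWarpSize - 1) / kWarpSize;
}

// Compile-time sanity check that holds on either backend.
static_assert(warps_needed(kWarpSize) == 1,
              "one full warp should map to exactly one warp");
```

Callers use `warps_needed(...)` without knowing which backend was selected; the real kernels apply the same idea to intrinsics and inline assembly rather than to a simple constant.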

Next:

  • Validation and benchmarking across workloads on MIxxx GPUs

pytorch-bot bot commented Mar 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1834

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending, 1 Unrelated Failure

As of commit 15e29f1 with merge base 883dc65:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 4, 2025
@jcaip changed the title from "Jcaip/update" to "ROCm Sparse Marlin Kernels #1206" Mar 4, 2025
jcaip commented Mar 4, 2025

This is a dupe of #1206. I just needed to fix some setup.py issues and didn't have write access to the fork.

@jcaip added the topic: improvement label Mar 4, 2025
@jcaip jcaip merged commit 9b18955 into main Mar 5, 2025
50 of 56 checks passed
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025
* enable build for rocm for fp6_llm

* enable tiled layout extension

* fix build error related to option

* require rocm 6.2

* enable tensor tiled layout extension with successful compilation

* clean-up

* fix potential memory access issue

* fix __nv_bfloat162 init

* add comment for MI300x isa

* fix build for non-rocm

* add sparse_marlin kernel to the build

* drop .h from conversion

* cp_async4_pred_zfill() AMD implementation

* implement matching mem utility with amd GCN isa

* implement mma util with amd gcn isa

* enable rocm path

* update copy from global to lds

* implement cvta_to_shared()

* consolidate code with cvta_to_shared()

* lint

* add GPU arch check for MI300x

* revert change in tensor_core_tile_layout.cu

* lint

refactor for better readability

* fix setup

---------

Co-authored-by: lcskrishna <[email protected]>
Co-authored-by: Peter Yeh <[email protected]>
Co-authored-by: Peter Y. Yeh <[email protected]>

Labels

ciflow/rocm, CLA Signed, module: rocm, sparsity, topic: improvement
