ROCm Sparse Marlin Kernels #1206 #1834

jcaip · 2025-03-04T21:35:25Z

Built on top of #1201. This pull request adds ROCm (Radeon Open Compute) support for the sparse Marlin kernel in addition to CUDA, enabling the code to run on AMD GPUs.

The main changes involve conditional compilation to handle differences between CUDA and ROCm, as well as adding ROCm-specific intrinsics for MI300x.

Co-author: @lcskrishna

Key changes include:

ROCm Support in `setup.py`:

HIP kernel generation

Conditional Compilation in CUDA Source Files:

Added conditional compilation directives to exclude certain code for ROCm and include ROCm-specific implementations.

ROCm-specific Implementations:

Implemented ROCm-specific versions of functions and macros that are different from their CUDA counterparts, ensuring compatibility and performance on AMD GPUs.

Validation and benchmarking across workloads on MIxxx GPUs

ROCm build infrastructure

[ROCm] Enable Tiled layout extension and minor changes to setup

Fixes builds for non-rocm.

refactor for better readability
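As a rough illustration of the kind of `setup.py` gating described under "ROCm Support in `setup.py`" (a sketch only, not the code merged in this PR): the helper names `parse_rocm_version` and `should_build_rocm_kernels` are hypothetical, the 6.2 minimum is taken from the "require rocm 6.2" commit, and the version string format is assumed to match what `torch.version.hip` reports on ROCm builds of PyTorch (it is `None` on CUDA builds).

```python
def parse_rocm_version(hip_version):
    """Return (major, minor) parsed from a HIP version string, or None.

    Assumption: the string looks like "6.2.41133-dd7f95766", which is the
    shape torch.version.hip is expected to have on ROCm builds.
    """
    if not hip_version:
        return None
    parts = hip_version.split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def should_build_rocm_kernels(hip_version, min_version=(6, 2)):
    """Decide whether the ROCm kernel sources should be added to the build.

    Hypothetical helper: returns False for CUDA builds (hip_version is None)
    and for ROCm releases older than min_version.
    """
    version = parse_rocm_version(hip_version)
    return version is not None and version >= min_version


# In a real setup.py the argument would come from torch.version.hip.
for candidate in ("6.2.41133-dd7f95766", "6.1.0", None):
    print(candidate, should_build_rocm_kernels(candidate))
```

Keeping the version check in a small pure function like this makes the non-ROCm path explicit, which is the case the "fix build for non-rocm" commits had to address.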

pytorch-bot · 2025-03-04T21:35:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1834

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending, 1 Unrelated Failure

As of commit 15e29f1 with merge base 883dc65:

NEW FAILURE - The following job has failed:

Run TorchAO Experimental Tests / test-cpu-ops (macos-14) (gh)
torchao/experimental/tests/test_int8_dynamic_activation_intx_weight.py::TestInt8DynamicActivationIntxWeight::test_export_compile_aoti_PackedLinearInt8DynamicActivationIntxWeightLayout

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run TorchAO Experimental Tests / test-mps-ops (macos-m1-stable) (gh) (trunk failure)
AttributeError: '_OpNamespace' 'torchao' object has no attribute '_linear_fp_act_1bit_weight'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jcaip · 2025-03-04T22:17:31Z

This is a duplicate of #1206; I just needed to fix some `setup.py` issues and didn't have write access to the fork.

* enable build for rocm for fp6_llm
* enable tiled layout extension
* fix build error related to option
* require rocm 6.2
* enable tensor tiled layout extension with successful compilation
* clean-up
* fix potential memory access issue
* fix __nv_bfloat162 init
* add comment for MI300x isa
* fix build for non-rocm
* add sparse_marlin kernel to the build
* drop .h from conversion
* cp_asyc4_pred_zfill() AMD implementation
* implement matching mem utility with amd GCN isa
* implement mma util with amd gcn isa
* enable rocm path
* update copy from global to lds
* implement cvta_to_shared()
* consolidate code with cvta_to_shared()
* lint
* add GPU arch check for MI300x
* revert change in tensor_core_tile_layout.cu
* lint

  refactor for better readability
* fix setup

---------

Co-authored-by: lcskrishna <[email protected]>
Co-authored-by: Peter Yeh <[email protected]>
Co-authored-by: Peter Y. Yeh <[email protected]>

lcskrishna and others added 30 commits October 16, 2024 05:19

enable build for rocm for fp6_llm

6d92e40

Merge pull request #1 from lcskrishna/cl/rocm-enablement

14b3fce

ROCm build infrastructure

enable tiled layout extension

f1a22cf

fix build error related to option

0bef6ca

require rocm 6.2

893ae03

enable tensor tiled layout extension with successful compilation

a0d3788

enable successful build

e4e654d

clean-up

3e2c6a1

Merge pull request #3 from lcskrishna/csrikris_enable_tensor_tile

c86880e

[ROCm] Enable Tiled layout extension and minor changes to setup

fix potential memory access issue

91d3c75

fix __nv_bfloat162 init

38b7d1c

add comment for MI300x isa

279f4b3

Merge branch 'main' into rocm_enablement_staging

612ad14

fix build for non-rocm

bbf5a72

Merge pull request #4 from lcskrishna/rocm_enablement

735570e

Fixes builds for non-rocm.

Merge branch 'main' into rocm_enablement_staging

253c188

add sparse_marlin kernel to the build

a2f1736

drop .h from conversion

f817edf

cp_asyc4_pred_zfill() AMD implementation

c9bc1bc

implement matching mem utility with amd GCN isa

16feff4

implement mma util with amd gcn isa

0b21555

enable rocm path

f23b194

update copy from global to lds

ecc3927

implement cvta_to_shared()

a80730b

consolidate code with cvta_to_shared()

d2c7ce4

Merge branch 'main' into rocm_sparse_marlin

15974c7

lint

a4e8c30

add GPU arch check for MI300x

c678cb0

revert change in tensor_core_tile_layout.cu

08d1cfb

Merge branch 'main' into rocm_sparse_marlin

b96196b

Peter Y. Yeh and others added 4 commits January 15, 2025 15:51

lint

aea9d81

refactor for better readability

Merge branch 'main' into rocm_sparse_marlin

f18043d

Merge branch 'main' into rocm_sparse_marlin

8b34390

fix setup.py conflict

af7027d

facebook-github-bot added the CLA Signed label Mar 4, 2025

jcaip added the ciflow/rocm label Mar 4, 2025

fix setup

15e29f1

jcaip added the sparsity and module: rocm labels Mar 4, 2025

jcaip changed the title ~~Jcaip/update~~ ROCm Sparse Marlin Kernels #1206 Mar 4, 2025

jcaip added the topic: improvement label Mar 4, 2025

drisspg approved these changes Mar 4, 2025

jcaip merged commit 9b18955 into main Mar 5, 2025
50 of 56 checks passed

6 participants
