@jeroen-mostert jeroen-mostert commented Jul 22, 2024

The check gating the use of `__builtin_amdgcn_sdot4` checks specifically for gfx1030. This causes a severe perf regression on any gfx103x target other than gfx1030 that runs natively (which requires a ROCm build that supports it) rather than through `HSA_OVERRIDE_GFX_VERSION`. We already have a generic RDNA2 define, so let's use it.

With this change my custom ROCm build that includes gfx1036 support (and uses the gfx1030 kernels) performs identically with or without HSA_OVERRIDE_GFX_VERSION.
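For readers unfamiliar with this kind of gating, here is a minimal sketch of the idea, not the actual patch. The names `RDNA2_SKETCH` and `dot4_sketch` are hypothetical, and the list of gfx103x targets folded into the define is an assumption about what a generic RDNA2 define would cover. On a host compiler none of the `__gfx103x__` macros are defined, so the portable fallback is what gets built.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a generic RDNA2 define: set when the compiler
// targets any gfx103x part. The exact target list is an assumption.
#if defined(__gfx1030__) || defined(__gfx1031__) || defined(__gfx1032__) || \
    defined(__gfx1033__) || defined(__gfx1034__) || defined(__gfx1035__) || \
    defined(__gfx1036__) || defined(__gfx1037__)
#define RDNA2_SKETCH
#endif

// 4-way signed 8-bit dot product with 32-bit accumulate: c + sum(a[i] * b[i]).
// In a real HIP build this would be a __device__ function.
static inline int dot4_sketch(int a, int b, int c) {
#if defined(RDNA2_SKETCH)
    // Gating on the generic define (rather than on __gfx1030__ alone) lets
    // every gfx103x target take the hardware dot-product path.
    return __builtin_amdgcn_sdot4(a, b, c, false);
#else
    // Portable fallback: unpack the four signed bytes and accumulate.
    int acc = c;
    for (int i = 0; i < 4; ++i) {
        const auto ai = static_cast<int8_t>((static_cast<uint32_t>(a) >> (8 * i)) & 0xFF);
        const auto bi = static_cast<int8_t>((static_cast<uint32_t>(b) >> (8 * i)) & 0xFF);
        acc += static_cast<int>(ai) * static_cast<int>(bi);
    }
    return acc;
#endif
}

// Small self-check of the fallback semantics.
static inline void dot4_sketch_self_check() {
    // Four lanes of 1 * 2, plus an accumulator of 5.
    assert(dot4_sketch(0x01010101, 0x02020202, 5) == 13);
}
```

If a target such as gfx1036 is missing from the gate, the compiler silently takes the slow branch, which is consistent with the regression described above: nothing fails, it just runs much slower.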

@github-actions github-actions bot added the Nvidia GPU label (Issues specific to Nvidia GPUs) Jul 22, 2024
@jeroen-mostert (Contributor, Author) commented

Original discussion (including repro) is at lamikr/rocm_sdk_builder#114 (comment). My comment there incorrectly stated that there was no arch-specific logic, since I missed it on first read.

@JohannesGaessler JohannesGaessler left a comment

I am not particularly knowledgeable when it comes to ROCm but this looks correct to me.

@JohannesGaessler JohannesGaessler merged commit 46e4741 into ggml-org:master Jul 23, 2024
@jeroen-mostert jeroen-mostert deleted the patch-2 branch July 23, 2024 10:13
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024