Commit f54ffa8
Allow all RDNA2 archs to use sdot4 intrinsic (ggml-org#8629)
The check gating the use of `__builtin_amdgc_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.1 parent 6676362 commit f54ffa8
1 file changed
+1
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
459 | 459 | | |
460 | 460 | | |
461 | 461 | | |
462 | | - | |
| 462 | + | |
463 | 463 | | |
464 | 464 | | |
465 | 465 | | |
| |||
0 commit comments