Description
First became aware of the issue when running the latest KoboldCpp release: previously workable configs started failing, as discussed at LostRuins#1805.
Running llama-bench from llama-b6567-bin-win-vulkan-x64 on an RX 480 8GB with a --ubatch-size of 512 and an --n-prompt value of first 14592, then 14593, the Vulkan0 compute buffer size is 1022.50 MiB and 1024.69 MiB, respectively.
Repeating the same process with llama-b6568-bin-win-vulkan-x64, again with --n-prompt values of 14592 and 14593, the compute buffer size is 1022.50 MiB and 1910.13 MiB, respectively: a sizeable discrepancy.
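For anyone who wants to reproduce this, here is a sketch of the invocations I ran. The model path is a placeholder (substitute whatever GGUF you use); the -p/--n-prompt and -ub/--ubatch-size flags are standard llama-bench options.

```shell
# Run from inside each extracted release (b6567 and b6568) and compare
# the reported Vulkan0 compute buffer sizes between the two builds.
# model.gguf is a placeholder path, not a specific model.

# Just below the threshold: both builds should report ~1022.50 MiB
./llama-bench -m model.gguf -ub 512 -p 14592

# Just above the threshold: b6567 reports ~1024.69 MiB,
# b6568 jumps to ~1910.13 MiB
./llama-bench -m model.gguf -ub 512 -p 14593
```

Watching the "compute buffer size" line in the log output at these two adjacent -p values is enough to see the jump.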
Below n-prompt 14592, both versions behave identically all the way down. Above 14593, b6568 continues accumulating as "normal" until 15040, clocking in at 1963.75 MiB, then at 15041 drops to 1611.48 MiB. For comparison, b6567 at 15041 allocates 1055.31 MiB to the buffer, and grows steadily every step of the way.
Finally, the allocation discrepancy between versions continues to diminish as the context increases, and can be assumed to disappear entirely at some point -- or perhaps even result in memory savings? Nah, that'd be too good, I'm sure. I've only tested it up to 20k, where it's still some 450 MiB above the b6567 "standard."
Tested CLBlast and CUDA (via ZLUDA emulation) to confirm that this is a Vulkan-specific issue, and it is. I don't yet know whether it's reproducible on other hardware and would love it if someone could test this. What changed between b6567 and b6568 that could result in such behavior, and is it expected? I should also mention that the same behavior persists with b6833.