Prompt processing is extremely slow with a 70B model partially offloaded (about 2 t/s at pp 512).
llama-bench.exe -ngl 20 -m "D:\models\lzlv_70b_fp16_hf.Q4_K_M.gguf"
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL | 20 | pp 512 | 2.14 ± 0.28 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL | 20 | tg 128 | 1.03 ± 0.01 |
build: a28c5ef (2045)
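
To put the numbers above in wall-clock terms, here is a rough sketch that converts the reported mean throughput into seconds per test. The helper name `seconds_for` is made up for illustration, and it ignores the ± spread and any model load time, so treat the result as an estimate only.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Estimate wall-clock seconds to process `tokens` at a given mean rate."""
    return tokens / tokens_per_second

# Mean rates copied from the llama-bench table above (the ± spread is ignored).
pp_seconds = seconds_for(512, 2.14)  # prompt processing, pp 512
tg_seconds = seconds_for(128, 1.03)  # text generation, tg 128

print(f"pp 512: ~{pp_seconds:.0f} s")
print(f"tg 128: ~{tg_seconds:.0f} s")
```

By this estimate a 512-token prompt takes roughly four minutes before the first generated token, which is why the pp number is the one that stands out here.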