Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
GPU usage goes up when layers are offloaded with -ngl, and inference performance is reasonable. I expect to see around 170 ms/tok.
Current Behavior
GPU memory usage goes up, but GPU activity stays at 0% and only CPU usage increases. I am getting around 2500 ms/tok.
Environment and Context
Windows 11, RTX 3070
Attempting to run codellama-13b-instruct.Q6_K.gguf
I ran a git bisect, which identified 017efe899d8 as the first bad commit. I see roughly a 10x drop in performance between ff5a3f0 and 017efe8, from about 170 ms/tok to about 2500 ms/tok.
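For reference, the bisect can be reproduced with something like the following sketch (each step requires rebuilding and re-running the server command from Steps to Reproduce, then noting the ms/tok):

```
git bisect start
git bisect bad 017efe899d8     # first bad commit found above
git bisect good ff5a3f0        # last known-good commit
# at each step git checks out a candidate commit:
# rebuild, re-run the server command, then mark the result
git bisect good    # or: git bisect bad
git bisect reset   # return to the original checkout when done
```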
Steps to Reproduce
- Build with: cmake .. -DLLAMA_CUBLAS=ON
- Run: .\bin\Release\server.exe -m ..\models\codellama-13b-instruct.Q6_K.gguf -c 4096 -ngl 24 (full sequence sketched below)
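For completeness, the rough end-to-end sequence on Windows looks like this (a sketch, assuming a fresh checkout with the CUDA toolkit and Visual Studio build tools installed; the model path is illustrative):

```
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release
.\bin\Release\server.exe -m ..\models\codellama-13b-instruct.Q6_K.gguf -c 4096 -ngl 24
```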