Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
GPU usage goes up when layers are offloaded with -ngl, and inference performance is reasonable. I expect to see around 170 ms/tok.
Current Behavior
GPU memory usage goes up, but GPU activity stays at 0% and only CPU usage increases. I am getting around 2500 ms/tok.
Environment and Context
Windows 11, RTX 3070
Attempting to run codellama-13b-instruct.Q6_K.gguf
I ran a git bisect, which identified 017efe899d8 as the first bad commit. I see roughly a 10x drop in performance between ff5a3f0 and 017efe8, from about 170 ms/tok to about 2500 ms/tok.
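For reference, the bisect can be reproduced with something like the following sketch (each step requires rebuilding and re-running the server command from Steps to Reproduce, then noting the ms/tok):

```
git bisect start
git bisect bad 017efe899d8     # first bad commit found above
git bisect good ff5a3f0        # last known-good commit
# at each step git checks out a candidate commit:
# rebuild, re-run the server command, then mark the result
git bisect good    # or: git bisect bad
git bisect reset   # return to the original checkout when done
```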
Steps to Reproduce
- Build with: cmake .. -DLLAMA_CUBLAS=ON
- Run: .\bin\Release\server.exe -m ..\models\codellama-13b-instruct.Q6_K.gguf -c 4096 -ngl 24 (full sequence sketched below)
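For completeness, the rough end-to-end sequence on Windows looks like this (a sketch, assuming a fresh checkout with the CUDA toolkit and Visual Studio build tools installed; the model path is illustrative):

```
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release
.\bin\Release\server.exe -m ..\models\codellama-13b-instruct.Q6_K.gguf -c 4096 -ngl 24
```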