I am running llama_cpp version 0.2.68 on Ubuntu 22.04 LTS in a conda environment. Attached are two Jupyter notebooks with only one line changed (CPU vs. GPU). Under otherwise identical conditions, switching between CPU and GPU gives vastly different answers, and the GPU output is completely wrong. I would appreciate some pointers on how to debug this.
The only significant difference between the two files is this one-liner:
#n_gpu_layers=-1, # Uncomment to use GPU acceleration
The model used was openhermes-2.5-mistral-7b.Q5_K_M.gguf.
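For reference, here is a minimal sketch of how both notebooks load and query the model, assuming the standard llama-cpp-python `Llama` constructor; the context size, seed, prompt, and sampling parameters below are placeholders, not the exact values from the attached notebooks. The only change between the CPU and GPU runs is whether the `n_gpu_layers` line is commented out.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./openhermes-2.5-mistral-7b.Q5_K_M.gguf",
    n_ctx=2048,            # placeholder context size
    seed=42,               # fixed seed so CPU/GPU runs are comparable
    #n_gpu_layers=-1,      # Uncomment to use GPU acceleration
)

output = llm(
    "Q: What is the capital of France? A:",  # placeholder prompt
    max_tokens=64,
    temperature=0.0,       # greedy sampling to minimize run-to-run variance
)
print(output["choices"][0]["text"])
```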
mistral_llama_large-gpu.pdf
mistral_llama_large-cpu.pdf