Significantly different (and WRONG) inference results when GPU is enabled #7048

@phishmaster

Description

I am running llama_cpp version 0.2.68 on Ubuntu 22.04 LTS in a conda environment. Attached are two Jupyter notebooks with only one line changed (use CPU vs. GPU). As you can see, under otherwise identical conditions, switching between CPU and GPU gives vastly different answers, and the GPU output is completely wrong. I would appreciate some pointers on how to debug this.

The only significant difference between the two files is this one line:
#n_gpu_layers=-1, # Uncomment to use GPU acceleration

The model used was openhermes-2.5-mistral-7b.Q5_K_M.gguf
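For reference, the difference between the two runs can be sketched as the keyword arguments passed to llama-cpp-python's `Llama` constructor. This is a minimal sketch, not the notebook code itself: the `seed` value is an illustrative assumption added so the runs are comparable, and only `model_path` and `n_gpu_layers` come from the report.

```python
# Hedged sketch of the two notebook configurations; parameter names follow
# llama-cpp-python's Llama(...) constructor. Pass either dict as
# Llama(**cpu_kwargs) or Llama(**gpu_kwargs) to reproduce the two runs.
cpu_kwargs = {
    "model_path": "openhermes-2.5-mistral-7b.Q5_K_M.gguf",
    "seed": 0,            # assumption: fix the seed so runs are comparable
}
gpu_kwargs = {
    **cpu_kwargs,
    "n_gpu_layers": -1,   # the single changed line: offload all layers to GPU
}

# Everything except n_gpu_layers is identical between the two runs:
changed = {k for k in gpu_kwargs if cpu_kwargs.get(k) != gpu_kwargs[k]}
print(changed)  # → {'n_gpu_layers'}
```

If the divergence persists with a fixed seed and temperature 0, that points at the CUDA compute path rather than sampling noise.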

mistral_llama_large-gpu.pdf
mistral_llama_large-cpu.pdf

Labels: Nvidia GPU, bug
