I am running llama_cpp version 0.2.68 on Ubuntu 22.04 LTS in a conda environment. Attached are two Jupyter notebooks with only one line changed (CPU vs. GPU). Under otherwise identical conditions, switching between CPU and GPU gives vastly different answers, and the GPU output is completely wrong. I would appreciate some pointers on how to debug this.
The only significant difference between the two files is this one-liner:
#n_gpu_layers=-1, # Uncomment to use GPU acceleration
The model used was openhermes-2.5-mistral-7b.Q5_K_M.gguf.
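For reference, here is a minimal sketch of how both notebooks load and query the model, assuming the standard llama-cpp-python `Llama` constructor; the context size, seed, prompt, and sampling parameters below are placeholders, not the exact values from the attached notebooks. The only change between the CPU and GPU runs is whether the `n_gpu_layers` line is commented out.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./openhermes-2.5-mistral-7b.Q5_K_M.gguf",
    n_ctx=2048,            # placeholder context size
    seed=42,               # fixed seed so CPU/GPU runs are comparable
    #n_gpu_layers=-1,      # Uncomment to use GPU acceleration
)

output = llm(
    "Q: What is the capital of France? A:",  # placeholder prompt
    max_tokens=64,
    temperature=0.0,       # greedy sampling to minimize run-to-run variance
)
print(output["choices"][0]["text"])
```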
mistral_llama_large-gpu.pdf
mistral_llama_large-cpu.pdf