
Conversation


@iSma iSma commented Jan 21, 2024

I've noticed that since PR #4605, performance (CPU-only) took a massive dive when using the Nix flake (I went from ~4 tokens/s to <0.5). It seems that the slowdown is caused by LLAMA_NATIVE=ON. Reverting to OFF (as it was before the PR) restores the expected performance.

This regression was observed on both an i7-1165G7 and a Ryzen 3800X running NixOS.

FWIW, the llama-cpp package in nixpkgs has LLAMA_NATIVE=OFF.

I'm not sure what the implications of turning off LLAMA_NATIVE are, maybe @philiptaron and @SomeoneSerge want to chime in.
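In the meantime, a possible workaround is to override the flag when consuming the flake. This is a sketch only: it assumes the package passes `cmakeFlags` through `mkDerivation`, and the attribute path is illustrative.

```nix
# Hypothetical override for flake consumers. LLAMA_NATIVE is the CMake
# option discussed above; the surrounding attribute names are illustrative.
llama-cpp.overrideAttrs (old: {
  cmakeFlags = (old.cmakeFlags or [ ]) ++ [ "-DLLAMA_NATIVE=OFF" ];
})
```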

SomeoneSerge commented Jan 21, 2024

option(LLAMA_NATIVE "llama: enable -march=native flag" ON)

Oh yes, we would surely prefer that OFF. Ideally, we never resort to -march=native (which produces builds that vary with the builder's scheduler, load, and hardware), but instead model concrete targets or concrete architecture levels as part of the derivation.
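To illustrate the alternative being described, here is a sketch (not the project's actual CMakeLists.txt; `x86-64-v3` is an assumed example target) of pinning a concrete microarchitecture level, so the same flags yield the same binary on every builder:

```cmake
# Illustrative sketch only: contrast the two approaches.
if(LLAMA_NATIVE)
  # Depends on whichever CPU the builder happens to run on.
  add_compile_options(-march=native)
else()
  # Pinned x86-64 microarchitecture level (AVX/AVX2 era);
  # identical flags and output regardless of the build machine.
  add_compile_options(-march=x86-64-v3)
endif()
```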

@SomeoneSerge SomeoneSerge added the nix Issues specific to consuming flake.nix, or generally concerned with ❄ Nix-based llama.cpp deployment label Jan 21, 2024
@SomeoneSerge SomeoneSerge merged commit 504dc37 into ggml-org:master Jan 21, 2024

@philiptaron philiptaron left a comment


Dang, that's my fault in the transcription. Call it an "off-by-on" error 😅. Thanks for the PR; LGTM.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
