Misc. bug: Performance regression on aarch64 q4_0 #14134

@njsyw1997

Name and Version

llama-cli --version
version: 5615 (f470bc3)
built with Android (13324770, +pgo, +bolt, +lto, +mlgo, based on r530567d) clang version 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) for x86_64-unknown-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-bench

Command line

Problem description & steps to reproduce

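Q4_0 prompt-processing performance dropped significantly after commit f470bc3 (build 5615). On the same model and settings, pp512 throughput fell from 138.02 t/s on the previous build (8f47e25, 5614) to 15.84 t/s.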
/build-android-f470bc36/llama-bench -m ../gemma-2-2b-q4_0.gguf -p 512 -n 0

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| gemma2 2B Q4_0 | 1.51 GiB | 2.61 B | CPU | 8 | pp512 | 15.84 ± 0.01 |

build: f470bc3 (5615)

/build-android-8f47e25f/llama-bench -m ../gemma-2-2b-q4_0.gguf -p 512 -n 0

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| gemma2 2B Q4_0 | 1.51 GiB | 2.61 B | CPU | 8 | pp512 | 138.02 ± 8.88 |

build: 8f47e25 (5614)
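
To put a number on the regression, the two pp512 results above can be compared directly. The snippet below is only a small illustrative sketch; the helper name `slowdown` is hypothetical and not part of llama.cpp, and the inputs are the mean t/s values copied from the two runs (the ± spread is ignored).

```python
def slowdown(baseline_tps: float, regressed_tps: float) -> tuple[float, float]:
    """Return (slowdown factor, percent throughput drop) for two t/s readings."""
    factor = baseline_tps / regressed_tps
    drop_pct = (1.0 - regressed_tps / baseline_tps) * 100.0
    return factor, drop_pct


# Mean pp512 throughput from the two llama-bench runs in this report:
# build 5614 (8f47e25) is the baseline, build 5615 (f470bc3) is the regressed one.
factor, drop_pct = slowdown(138.02, 15.84)
print(f"{factor:.2f}x slower, {drop_pct:.1f}% drop")
# prints: 8.71x slower, 88.5% drop
```

In other words, prompt processing on this model is roughly 8.7 times slower on build 5615 than on build 5614.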

First Bad Commit

No response

Relevant log output
