Releases: AmpereComputingAI/llama.cpp
v3.2.1
v3.2.0
v3.1.2
v3.1.0
v2.2.1
Update benchmark.py
v2.0.0
- Upgraded upstream tag enables Llama 3.1 in ollama
- Support for AmpereOne platform
- Breaking change: the weight type IDs have changed, so existing models in the Q8R16 and Q4_K_4 formats must be re-quantized with the current llama-quantize tool.
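The re-quantization step above can be sketched as follows. The filenames are placeholders, and the invocation assumes this fork keeps upstream llama.cpp's `llama-quantize <input> <output> <type>` convention; the exact type names accepted for Q8R16 and Q4_K_4 should be confirmed with `llama-quantize --help` in this release.

```shell
# Re-quantize from the original FP16 source GGUF (placeholder filenames).
# Assumes the upstream llama-quantize CLI convention: <input> <output> <type>.
./llama-quantize model-f16.gguf model-q8r16.gguf Q8R16
./llama-quantize model-f16.gguf model-q4_k_4.gguf Q4_K_4
```

Re-quantizing from the FP16 source rather than from an old quantized file avoids compounding quantization error.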
v1.2.6
Create README.md
v1.2.3
- The rebase allows llama-cpp-python to pick up the upstream CVE fix (GHSA-56xg-wfcc-g829)
- Experimental support for Q8R16 quantized format with optimized matrix multiplication kernels
- CMake files updated to build llama.aio on AmpereOne
v1.2.2
Release notes:
- Fix Llama 3 end-of-token issue
- Update server to support ollama (v0.1.33)
- llama.aio Docker image runs in server mode by default
SHA-256 hashes:
- 6c580006a8faf7b73a424b0020f1bda2684aa7e1796182f68bfa8b7fee08d991 llama_cpp_python-0.2.63-cp311-cp311-linux_aarch64.whl
- 1ffde8093abe18f638fb89273dd56664dd7ff6b8c82383099ea620d18ab562a7 llama_aio_v1.2.2_b769bc1.tar.gz
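Downloaded artifacts can be checked against the hashes above with a standard `sha256sum -c` run, assuming both files are in the current directory:

```shell
# Verify the release artifacts against the published SHA-256 hashes.
# sha256sum -c reads "<hash>  <filename>" pairs and reports OK or FAILED.
sha256sum -c <<'EOF'
6c580006a8faf7b73a424b0020f1bda2684aa7e1796182f68bfa8b7fee08d991  llama_cpp_python-0.2.63-cp311-cp311-linux_aarch64.whl
1ffde8093abe18f638fb89273dd56664dd7ff6b8c82383099ea620d18ab562a7  llama_aio_v1.2.2_b769bc1.tar.gz
EOF
```

A non-zero exit status means at least one file is missing or its hash does not match.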