Releases: AmpereComputingAI/llama.cpp
v3.2.1
v3.2.0
v3.1.2
v3.1.0
v2.2.1
Update benchmark.py
v2.0.0
- Upgraded upstream tag enables Llama 3.1 in ollama
- Support for AmpereOne platform
- Breaking change: the weight type IDs have changed, so existing models in the Q8R16 and Q4_K_4 formats must be re-quantized with the current llama-quantize tool.
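The re-quantization step above can be sketched as follows. The filenames are placeholders, and the invocation assumes this fork keeps upstream llama.cpp's `llama-quantize <input> <output> <type>` convention; the exact type names accepted for Q8R16 and Q4_K_4 should be confirmed with `llama-quantize --help` in this release.

```shell
# Re-quantize from the original FP16 source GGUF (placeholder filenames).
# Assumes the upstream llama-quantize CLI convention: <input> <output> <type>.
./llama-quantize model-f16.gguf model-q8r16.gguf Q8R16
./llama-quantize model-f16.gguf model-q4_k_4.gguf Q4_K_4
```

Re-quantizing from the FP16 source rather than from an old quantized file avoids compounding quantization error.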
v1.2.6
Create README.md
v1.2.3
- The rebase allows llama-cpp-python to pick up the upstream CVE fix (GHSA-56xg-wfcc-g829)
- Experimental support for Q8R16 quantized format with optimized matrix multiplication kernels
- CMake files updated to build llama.aio on AmpereOne
v1.2.2
Release notes:
- Fix Llama 3 end-of-token issue
- Update server to support ollama (v0.1.33)
- llama.aio Docker image runs in server mode by default
SHA-256 hashes:
- 6c580006a8faf7b73a424b0020f1bda2684aa7e1796182f68bfa8b7fee08d991 llama_cpp_python-0.2.63-cp311-cp311-linux_aarch64.whl
- 1ffde8093abe18f638fb89273dd56664dd7ff6b8c82383099ea620d18ab562a7 llama_aio_v1.2.2_b769bc1.tar.gz
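Downloaded artifacts can be checked against the hashes above with a standard `sha256sum -c` run, assuming both files are in the current directory:

```shell
# Verify the release artifacts against the published SHA-256 hashes.
# sha256sum -c reads "<hash>  <filename>" pairs and reports OK or FAILED.
sha256sum -c <<'EOF'
6c580006a8faf7b73a424b0020f1bda2684aa7e1796182f68bfa8b7fee08d991  llama_cpp_python-0.2.63-cp311-cp311-linux_aarch64.whl
1ffde8093abe18f638fb89273dd56664dd7ff6b8c82383099ea620d18ab562a7  llama_aio_v1.2.2_b769bc1.tar.gz
EOF
```

A non-zero exit status means at least one file is missing or its hash does not match.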