Commit 0f1a39f
authored
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)
* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files
* Arm AArch64: minor code refactoring for rebase
* Arm AArch64: minor code refactoring for resolving a build issue with cmake
* Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code change for resolving a build issue with server-windows
* retrigger checks
* Arm AArch64: minor code changes for rebase
* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits
* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code refactoring
* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat
* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat
* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure
* Arm AArch64: remove a redundant comment
* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off
* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels
* Arm AArch64: update docs/build.md README to include compile time flags for buiilding the Q4_0_4_4 quant type1 parent 83321c6 commit 0f1a39f
File tree
14 files changed
+2534
-53
lines changed- docs
- examples/quantize
- ggml
- include
- src
- include
- src
14 files changed
+2534
-53
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
835 | 835 | | |
836 | 836 | | |
837 | 837 | | |
838 | | - | |
| 838 | + | |
| 839 | + | |
839 | 840 | | |
840 | 841 | | |
841 | 842 | | |
| |||
969 | 970 | | |
970 | 971 | | |
971 | 972 | | |
| 973 | + | |
| 974 | + | |
| 975 | + | |
| 976 | + | |
| 977 | + | |
| 978 | + | |
| 979 | + | |
972 | 980 | | |
973 | 981 | | |
974 | 982 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
| 13 | + | |
13 | 14 | | |
14 | 15 | | |
15 | 16 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
| 31 | + | |
31 | 32 | | |
32 | 33 | | |
33 | 34 | | |
| |||
41 | 42 | | |
42 | 43 | | |
43 | 44 | | |
| 45 | + | |
44 | 46 | | |
45 | 47 | | |
46 | 48 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
49 | 52 | | |
50 | 53 | | |
51 | 54 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
383 | 383 | | |
384 | 384 | | |
385 | 385 | | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
386 | 389 | | |
387 | 390 | | |
388 | 391 | | |
| |||
424 | 427 | | |
425 | 428 | | |
426 | 429 | | |
| 430 | + | |
| 431 | + | |
| 432 | + | |
427 | 433 | | |
428 | 434 | | |
429 | 435 | | |
| |||
2406 | 2412 | | |
2407 | 2413 | | |
2408 | 2414 | | |
| 2415 | + | |
| 2416 | + | |
| 2417 | + | |
| 2418 | + | |
| 2419 | + | |
| 2420 | + | |
2409 | 2421 | | |
2410 | 2422 | | |
2411 | 2423 | | |
| |||
2418 | 2430 | | |
2419 | 2431 | | |
2420 | 2432 | | |
| 2433 | + | |
| 2434 | + | |
| 2435 | + | |
| 2436 | + | |
| 2437 | + | |
2421 | 2438 | | |
2422 | 2439 | | |
2423 | 2440 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1153 | 1153 | | |
1154 | 1154 | | |
1155 | 1155 | | |
| 1156 | + | |
1156 | 1157 | | |
1157 | 1158 | | |
1158 | 1159 | | |
| |||
0 commit comments