Description
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
System: AWS Graviton3, c7g.16xl instance with Ubuntu 22.04
llama.cpp version: latest, commit: 6f9939d
The following commit caps the number of tasks to 4. I would like to understand: why 4?
commit adf3de4f69ff7e44131222f05f9c7447ac0be3cb (HEAD, tag: b1605)
Author: Georgi Gerganov <[email protected]>
Date: Sun Dec 3 15:56:22 2023 +0200
ggml : fix soft max out-of-bounds access (#4307)
ggml-ci
Without the cap of 4, using just the minimum of n_threads and the source row count for n_tasks, prompt eval performance improves by 4% for DOT kernels and 9% for MMLA kernels (PR):
n_tasks = MIN(n_threads, ggml_nrows(node->src[0]));
Reproducer:
./main -m /llama.cpp/models/open_llama_13b/ggml-model-q8_0.gguf -c 1015 -n 256 -t 64 --file <input_file.txt>