
[soft max] capping the num tasks to 4 is limiting the prompt eval perf #5103

@snadampal

Description


Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

System: AWS Graviton3, c7g.16xl instance with Ubuntu 22.04
llama.cpp version: latest, commit: 6f9939d

The following commit caps the number of soft max tasks to 4. I would like to understand: why 4?

commit adf3de4f69ff7e44131222f05f9c7447ac0be3cb (HEAD, tag: b1605)
Author: Georgi Gerganov <[email protected]>
Date:   Sun Dec 3 15:56:22 2023 +0200

    ggml : fix soft max out-of-bounds access (#4307)

    ggml-ci
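For background on what n_tasks controls: ggml splits the rows of the soft max input across n_tasks workers, so a cap of 4 means at most 4 threads touch this op regardless of -t. The partitioning pattern typically looks like the following (a hedged sketch with illustrative names, not the upstream ggml_compute_forward_soft_max):

    #include <math.h>
    #include <stddef.h>

    /* Sketch of per-task row partitioning in the style of ggml compute
       kernels: ith is this task's index, nth is the total task count.
       Illustrative code only, not the exact upstream source. */
    static void soft_max_rows(float *x, int n_rows, int n_cols, int ith, int nth) {
        const int dr  = (n_rows + nth - 1) / nth;                /* rows per task, rounded up */
        const int ir0 = dr * ith;                                /* first row for this task   */
        const int ir1 = ir0 + dr < n_rows ? ir0 + dr : n_rows;   /* one past the last row     */

        for (int i = ir0; i < ir1; i++) {
            float *row = x + (size_t) i * n_cols;

            /* standard numerically stable soft max over one row */
            float max = row[0];
            for (int j = 1; j < n_cols; j++) { if (row[j] > max) max = row[j]; }
            float sum = 0.0f;
            for (int j = 0; j < n_cols; j++) { row[j] = expf(row[j] - max); sum += row[j]; }
            for (int j = 0; j < n_cols; j++) { row[j] /= sum; }
        }
    }

With nth capped at 4, tasks 0 through 3 cover all rows while the remaining threads idle during this op; with nth = n_threads, the rows spread across all available workers.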

Without the cap of 4, using just the source row count or n_threads for n_tasks, the prompt eval performance improves by 4% for DOT kernels and 9% for MMLA kernels (PR):

    n_tasks = MIN(n_threads, ggml_nrows(node->src[0]));
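To make the difference concrete, here is a minimal standalone comparison of the two formulas, assuming the capped form is MIN(MIN(4, n_threads), nrows) as the issue title suggests; the row count is illustrative:

    #include <stdio.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    int main(void) {
        const int n_threads = 64;   /* matches -t 64 in the reproducer below */
        const int n_rows    = 1024; /* illustrative soft max input row count */

        /* capped formula, as described in this issue */
        const int n_tasks_capped   = MIN(MIN(4, n_threads), n_rows);

        /* proposed formula from the PR */
        const int n_tasks_proposed = MIN(n_threads, n_rows);

        printf("capped: %d tasks, proposed: %d tasks\n",
               n_tasks_capped, n_tasks_proposed);
        /* prints: capped: 4 tasks, proposed: 64 tasks */
        return 0;
    }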

Reproducer:

    ./main -m /llama.cpp/models/open_llama_13b/ggml-model-q8_0.gguf -c 1015 -n 256 -t 64 --file <input_file.txt>
