Conversation

@noemotiovon
Collaborator

Optimize the MUL_MAT operator for the CANN backend. When the underlying aclnnWeightQuantBatchMatmulV2 operator is called and k <= QK8_0 (the q8_0 block size, 32), use the per_channel algorithm instead of the per_group algorithm.
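The dispatch described above can be sketched as follows. This is a hypothetical illustration, not the actual CANN backend code: the enum and the `select_quant_algo` helper are made up for clarity; only `QK8_0 == 32` (the q8_0 block size) comes from ggml. The idea is that when the reduction dimension k fits in a single q8_0 block, each channel carries exactly one quantization scale, so per_channel is sufficient and per_group's extra group bookkeeping can be skipped.

```cpp
#include <cassert>
#include <cstdint>

// QK8_0 is the q8_0 block size in ggml: 32 elements per block.
constexpr int64_t QK8_0 = 32;

// Hypothetical names for the two quantization algorithms selectable
// when calling aclnnWeightQuantBatchMatmulV2.
enum class QuantAlgo { PER_GROUP, PER_CHANNEL };

// When k <= QK8_0 the whole reduction dimension fits in one q8_0
// block, so there is one scale per output channel and the cheaper
// per_channel path applies; otherwise fall back to per_group.
QuantAlgo select_quant_algo(int64_t k) {
    return (k <= QK8_0) ? QuantAlgo::PER_CHANNEL : QuantAlgo::PER_GROUP;
}
```

With the test shapes below, k=32 takes the per_channel path and k=256 keeps per_group.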

Test Cases

  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK

Signed-off-by: noemotiovon <[email protected]>
@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 14, 2025
@hipudding self-requested a review on Mar 14, 2025 08:04
@hipudding added the Ascend NPU label (issues specific to Ascend NPUs) on Mar 14, 2025
Signed-off-by: noemotiovon <[email protected]>
@hipudding hipudding merged commit 92a3913 into ggml-org:master Mar 15, 2025
47 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025