I've opened an issue in the llama.cpp repo reporting that the application produces garbage output regardless of the model used.
We've found that disabling Flash Attention (-fa off) resolves the issue. The problem appears to be specific to the Loong64 architecture, as we haven't observed similar behavior on other architectures such as ARM64.
Could anyone familiar with Loong64 and the llama.cpp build/implementation help investigate this potential Flash Attention bug? Thanks!