llama.cpp always produces malformed output on Loongarch starting from b6353 #1

@darkgeek

Description

Hi Loong64 community,

I've opened an issue in the llama.cpp repo reporting that the application produces garbage output regardless of the model used.

We've found that disabling Flash Attention (`-fa off`) resolves the issue. The problem appears to be specific to the Loong64 architecture, as we haven't observed anything similar on other architectures such as ARM64.
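For reference, the workaround looks like the following. This is a sketch, not the exact command from the report: the model path and prompt are placeholders, and it assumes a llama.cpp build whose `-fa` flag accepts `on`/`off`/`auto` (as in recent builds around b6353).

```shell
# Workaround: disable Flash Attention to avoid the garbage output on Loong64.
# model.gguf and the prompt are placeholders.
./llama-cli -m model.gguf -fa off -p "Hello"
```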

Could anyone familiar with Loong64 and the llama.cpp build/implementation please help investigate this potential Flash Attention bug? Thanks!
