I've opened an issue in the llama.cpp repo reporting that the application produces garbage output regardless of the model used.
We've found that disabling Flash Attention (-fa off) resolves the issue. The problem appears to be specific to the Loong64 architecture, as we haven't observed similar behavior on other architectures such as ARM64.
Could anyone familiar with Loong64 and the llama.cpp build/implementation help investigate this potential Flash Attention bug? Thanks!