Description
🐛 Describe the bug
Hi maintainers,
I noticed that torchchat uses MATH as the SDPA backend in https://github.com/pytorch/torchchat/blob/main/torchchat/generate.py#L542. However, other libraries such as vLLM use FlashAttention as the default backend.
Why does torchchat use MATH as the default? Is it required for accuracy? If not, I can help add an argument that lets users choose the backend. Thanks!
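For illustration, here is a minimal sketch (not torchchat's actual code) of what such a user-selectable backend could look like, assuming PyTorch's `torch.nn.attention.sdpa_kernel` context manager (available since PyTorch 2.3). The `BACKENDS` mapping, the function name, and the `backend_name` parameter are all hypothetical:

```python
# Sketch only: a hypothetical way to let users pick the SDPA backend.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Hypothetical mapping from a user-facing argument value to a PyTorch
# SDPA backend enum.
BACKENDS = {
    "math": SDPBackend.MATH,
    "flash_attention": SDPBackend.FLASH_ATTENTION,
    "efficient_attention": SDPBackend.EFFICIENT_ATTENTION,
}

def attention_with_backend(q, k, v, backend_name: str = "math"):
    # Restrict SDPA to the requested backend for this call only;
    # outside the context manager, PyTorch selects a backend automatically.
    with sdpa_kernel(BACKENDS[backend_name]):
        return F.scaled_dot_product_attention(q, k, v)

if __name__ == "__main__":
    # (batch, heads, seq_len, head_dim); MATH runs on any device.
    q = k = v = torch.randn(1, 8, 128, 64)
    out = attention_with_backend(q, k, v, "math")
    print(out.shape)  # torch.Size([1, 8, 128, 64])
```

Note that FLASH_ATTENTION and EFFICIENT_ATTENTION have hardware and dtype constraints, so a real implementation would likely need to validate the choice or fall back, which may be part of why MATH is the conservative default.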