Commit f40ef03

Remove unnecessary slicing in sdpa_attention_forward (#41900)
Remove redundant slicing in sdpa_attention_forward. The slicing in sdpa_attention_forward was there only because some masks were not constructed correctly (I was told). When the dimension is dynamic, the slice op also prevents torch.export from correctly reasoning about its size.

Signed-off-by: Justin Chu <[email protected]>
1 parent: 5150dac
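On the mask-construction point, a minimal sketch (shapes and variable names below are illustrative, not code from the repository): when a 4D mask is built with its last dimension already equal to the key sequence length, the slice removed by this commit is a no-op.

import torch
import torch.nn.functional as F

# Illustrative only: the removed slice does nothing for a correctly sized mask.
batch, heads, q_len, kv_len, head_dim = 2, 4, 5, 7, 16
query = torch.randn(batch, heads, q_len, head_dim)
key = torch.randn(batch, heads, kv_len, head_dim)
value = torch.randn(batch, heads, kv_len, head_dim)

# A correctly constructed 4D mask: last dim already matches key.shape[-2].
attention_mask = torch.ones(batch, 1, q_len, kv_len, dtype=torch.bool)
assert attention_mask[:, :, :, : key.shape[-2]].shape == attention_mask.shape

out = F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)
print(out.shape)  # torch.Size([2, 4, 5, 16])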

File tree

1 file changed: +0 -3 lines changed

src/transformers/integrations/sdpa_attention.py

Lines changed: 0 additions & 3 deletions
@@ -63,9 +63,6 @@ def sdpa_attention_forward(
     else:
         sdpa_kwargs = {"enable_gqa": True}
 
-    if attention_mask is not None and attention_mask.ndim == 4:
-        attention_mask = attention_mask[:, :, :, : key.shape[-2]]
-
     # Instead of relying on the value set in the module directly, we use the is_causal passed in kwargs if it is presented
     is_causal = is_causal if is_causal is not None else getattr(module, "is_causal", True)
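On the torch.export point, a hedged sketch of the removed pattern (the MaskSlice module, the Dim name, and the shapes are hypothetical, not from transformers): slicing the mask with a bound read off key.shape[-2] keeps a symbolic slice in the traced graph even when the mask already has the right length, which is the kind of dynamic-shape reasoning the commit message refers to.

import torch
from torch.export import Dim, export

# Hypothetical reproduction of the removed pattern, not code from transformers.
class MaskSlice(torch.nn.Module):
    def forward(self, attention_mask, key):
        # The line this commit removes: trim the mask to the key sequence length.
        return attention_mask[:, :, :, : key.shape[-2]]

mask = torch.ones(1, 1, 4, 8, dtype=torch.bool)  # (batch, 1, q_len, kv_len)
key = torch.randn(1, 2, 8, 16)                   # (batch, heads, kv_len, head_dim)

seq = Dim("seq", min=2)
ep = export(MaskSlice(), (mask, key), dynamic_shapes=({3: seq}, {2: seq}))

# The exported graph carries a slice whose end point is the symbolic key length,
# even though the output has the same shape as the input mask.
print(ep.graph)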
