Remove unnecessary slicing in sdpa_attention_forward #41900
Conversation
The slicing in sdpa_attention_forward was there only because some masks were not constructed correctly (I was told). When the key size is dynamic, the slice op also prevents torch.export from correctly reasoning about that size. cc @vasqu
Signed-off-by: Justin Chu <[email protected]>
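For context, a minimal sketch of the pattern the PR removes, assuming the mask trimming in sdpa_attention_forward looked roughly like the snippet below (the function name, signature, and surrounding logic are illustrative, not the exact transformers source):

```python
# Minimal sketch of the slicing pattern removed by this PR. Names and signature
# are illustrative assumptions, not the exact transformers source.
import torch

def sdpa_attention_forward_sketch(query, key, value, attention_mask=None):
    if attention_mask is not None and attention_mask.ndim == 4:
        # Removed by this PR: trimming the mask to the key length. When the key
        # length is a dynamic dimension, this shape-dependent slice keeps
        # torch.export from reasoning cleanly about that size.
        attention_mask = attention_mask[:, :, :, : key.shape[-2]]
    return torch.nn.functional.scaled_dot_product_attention(
        query, key, value, attn_mask=attention_mask
    )
```

With the slice gone, the mask passed in is assumed to already match the key length, which is why the thread below asks for a broader slow-test run on older models.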
@Cyrilvallez Looks like this change passes the CI.
@vasqu @Cyrilvallez any thoughts? Thanks. This is an important fix we hope to include in the 5.0 release.
Responded in #41559 (comment). But I'm pro this; we might want to check some important models with a slow run. Let's wait for Cyril for a final decision.
Sorry for the delay, I was off as @vasqu mentioned! Still very relevant, and I would be very happy to finally remove this (and in the other attn functions as well, such as the eager ones, but I can take care of that myself later, no worries). cc @ydshieh, could you run a more extensive CI run on this PR and tell us whether you see any new failures, especially on older models? I don't have much time to do it manually myself as I need to catch up on all reviews 🤓 Just a bit scared that the fast tests may not be enough on this one!
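For reference, a minimal sketch of the analogous pattern in an eager attention path that the comment above alludes to; the names and exact structure are assumptions, not the actual transformers implementation:

```python
# Hypothetical sketch of the same slicing pattern in an eager attention path;
# names and signature are illustrative, not the exact transformers code.
import torch

def eager_attention_forward_sketch(query, key, value, attention_mask=None, scaling=1.0):
    attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling
    if attention_mask is not None:
        # The slice below mirrors what sdpa_attention_forward used to do:
        # trim the mask to the key length before adding it to the scores.
        causal_mask = attention_mask[:, :, :, : key.shape[-2]]
        attn_weights = attn_weights + causal_mask
    attn_weights = torch.nn.functional.softmax(attn_weights, dim=-1)
    return torch.matmul(attn_weights, value)
```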
For sure, thank you for the ping. I will report back today or tomorrow.
run-slow: bert, gpt2, t5, modernbert, vit, clip, detr, table_transformer, got_ocr2, whisper, wav2vec2, qwen2_audio, speech_t5, csm, llama, gemma3, qwen2, mistral3, qwen2_5_vl, llava, smolvlm, internvl, gemma3n, gpt_oss, qwen2_5_omni
This comment contains models: ["models/bert", "models/clip", "models/csm", "models/detr", "models/gemma3", "models/gemma3n", "models/got_ocr2", "models/gpt2", "models/gpt_oss", "models/internvl", "models/llama", "models/llava", "models/mistral3", "models/modernbert", "models/qwen2", "models/qwen2_5_omni", "models/qwen2_5_vl", "models/qwen2_audio", "models/smolvlm", "models/t5", "models/table_transformer", "models/vit", "models/wav2vec2", "models/whisper"]
CI Results: ✅ No failing test specific to this PR 🎉!
[Update] Looks good!
Alright, amazing @ydshieh, thanks! Merging then! Thanks again @justinchuby for pushing on something we've wanted for a long time!