Conversation

@justinchuby (Contributor) commented Oct 27, 2025

The slicing in sdpa_attention_forward was there only because some masks were not constructed correctly (I was told). When the key size is dynamic, the slice op also prevents torch.export from correctly reasoning about its size.
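For context, here is a minimal, paraphrased sketch of the kind of slicing being removed (illustrative only, not the exact transformers code; the helper and tensor names are made up for this example):

```python
import torch
import torch.nn.functional as F


def sdpa_with_mask_slice(query, key, value, attention_mask=None):
    # Paraphrased sketch of the behavior this PR removes (not the exact diff).
    # The slice trims a 4D mask down to the key length, which only mattered
    # when the mask had been built with the wrong sequence length.
    if attention_mask is not None and attention_mask.ndim == 4:
        # With a dynamic key length, `key.shape[-2]` is a symbolic size under
        # torch.export, so this slice forces the exporter to reason about the
        # mask's trimmed size instead of leaving an already correct mask alone.
        attention_mask = attention_mask[:, :, :, : key.shape[-2]]
    return F.scaled_dot_product_attention(query, key, value, attn_mask=attention_mask)


# When the mask is built correctly, the slice is a no-op:
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 6, 8)
v = torch.randn(1, 2, 6, 8)
mask = torch.ones(1, 1, 4, 6, dtype=torch.bool)
print(sdpa_with_mask_slice(q, k, v, mask).shape)  # torch.Size([1, 2, 4, 8])
```

Dropping the slice leaves a correctly constructed mask untouched and keeps the exported graph's shapes straightforward.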

cc @vasqu

The slicing in sdpa_attention_forward was there only because some masks were not constructed correctly (I was told). When the dimension is dynamic, the slice op also prevents torch.export from correctly reasoning about its size.

Signed-off-by: Justin Chu <[email protected]>
@justinchuby (Contributor Author)

@Cyrilvallez Looks like this change passes the CI.

@justinchuby changed the title from "Remove redundant slicing in sdpa_attention_forward" to "Remove unnecessary slicing in sdpa_attention_forward" on Oct 27, 2025
@justinchuby (Contributor Author) commented Oct 29, 2025

@vasqu @Cyrilvallez any thoughts? Thanks! This is an important fix that we hope to include in the 5.0 release.

@vasqu (Contributor) commented Oct 30, 2025

Responded in #41559 (comment)

But I'm pro this; we might want to check some important models with a slow run. Let's wait for Cyril for a final decision.

@Cyrilvallez (Member) commented Nov 7, 2025

Sorry for the delay, I was off as @vasqu mentioned! Still very relevant, and I would be very happy to finally remove this (and in other attn functions as well, such as the eager ones, but I can take care of that myself later, no worries).

cc @ydshieh, could you run a more extensive CI run on this PR and tell us whether you see any new failures, especially on older models? I don't have much time to do it manually myself as I need to catch up on all reviews 🤓 Just a bit scared that the fast tests may not be enough on this one!

@ydshieh (Collaborator) commented Nov 12, 2025

For sure, thank you for the ping. I will report back today or tomorrow.

@ydshieh (Collaborator) commented Nov 12, 2025

run-slow: bert, gpt2, t5, modernbert, vit, clip, detr, table_transformer, got_ocr2, whisper, wav2vec2, qwen2_audio, speech_t5, csm, llama, gemma3, qwen2, mistral3, qwen2_5_vl, llava, smolvlm, internvl, gemma3n, gpt_oss, qwen2_5_omni

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/bert", "models/clip", "models/csm", "models/detr", "models/gemma3", "models/gemma3n", "models/got_ocr2", "models/gpt2", "models/gpt_oss", "models/internvl", "models/llama", "models/llava", "models/mistral3", "models/modernbert", "models/qwen2", "models/qwen2_5_omni", "models/qwen2_5_vl", "models/qwen2_audio", "models/smolvlm", "models/t5", "models/table_transformer", "models/vit", "models/wav2vec2", "models/whisper"]
quantizations: []

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉!

@ydshieh (Collaborator) commented Nov 12, 2025

[Update] Looks good!

The report is good, but let me take a deeper look inside the workflow run to make sure.

@Cyrilvallez (Member)

Alright, amazing @ydshieh, thanks! Merging then! Thanks again @justinchuby for pushing on something we've wanted for a long time!

@Cyrilvallez merged commit f40ef03 into huggingface:main on Nov 13, 2025
22 checks passed