
Conversation

@ydshieh (Collaborator) commented Sep 19, 2025

What does this PR do?

Speeds up the tests: total runtime drops from 4 minutes to 6.69 seconds.

Most of the gain comes from some test_eager_matches_sdpa_inference tests, which go from roughly 20 seconds to 1 second.
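
For context, a minimal self-contained sketch of what an eager-vs-SDPA equivalence check boils down to (this is not the transformers test itself; shapes and tolerances are illustrative). The test runs the same attention both ways and compares outputs, so its cost scales directly with the size of the model it has to build and run:

```python
import torch
import torch.nn.functional as F

def eager_attention(q, k, v):
    # Plain softmax attention, computed step by step ("eager").
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# (batch, heads, seq_len, head_dim) -- tiny shapes keep the check fast.
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)

out_eager = eager_attention(q, k, v)
out_sdpa = F.scaled_dot_product_attention(q, k, v)  # fused SDPA path
torch.testing.assert_close(out_eager, out_sdpa, atol=1e-5, rtol=1e-5)
```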

@ydshieh requested a review from Cyrilvallez September 19, 2025 10:48
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez (Member) left a comment

Very nice! Thanks a lot! Do we absolutely need to use a single layer though? If it's not the real bottleneck, it would be best to keep 2!

```diff
-num_attention_layers: int = 2,
+num_attention_layers: int = 1,
```
@Cyrilvallez (Member) commented on the diff:

Do we really need to switch to only 1 layer?

@ydshieh (Collaborator, Author) replied:

I can change it to 2; the suite then takes 14 seconds instead of 6.69.
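
For readers outside the repo, a purely hypothetical sketch of the knob under discussion (only num_attention_layers mirrors the diff above; the class name and other fields are illustrative). The tester builds a deliberately tiny model, and the layer count is one of the few structural parameters that still dominates runtime, since every extra layer adds a full attention block to each forward pass in the equivalence test:

```python
# Hypothetical model-tester sketch; only num_attention_layers comes from the diff.
class TinyModelTester:
    def __init__(
        self,
        batch_size: int = 2,            # illustrative: tiny inputs
        hidden_size: int = 16,          # illustrative: small dims
        num_attention_layers: int = 2,  # the PR's knob: 1 roughly halves runtime
    ):
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        self.num_attention_layers = num_attention_layers

    def get_config(self):
        # The tiny config the tests instantiate models from.
        return {
            "hidden_size": self.hidden_size,
            "num_attention_layers": self.num_attention_layers,
        }
```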

@ydshieh enabled auto-merge (squash) September 19, 2025 12:42

[For maintainers] Suggested jobs to run (before merge)

run-slow: efficientloftr

@ydshieh merged commit 9d9c4d2 into main Sep 19, 2025
20 checks passed
@ydshieh deleted the faster branch September 19, 2025 12:51