Conversation

@jackzhxng (Collaborator)

Bump transformers to the tip of main (huggingface/transformers#42260) in preparation for adding the new Granite 4 Micro model.

To make this bump, we need:

  • Bump Optimum to 2.0.0
  • Reduce reliance on Optimum utility functions, (1) in preparation for migration to the ExecuTorch main repo and (2) because a fix for a blocking issue has not yet landed in a patch release
  • Remove usage of the no-longer-existing `sdpa_without_vmap` for attention and attention-mask handling
    • Previously an `attention_mask` was always constructed, regardless of whether the attention was causal, just to find `start_pos`. This was wasteful because the custom SDPA op builds the causal mask itself. We can now skip manual mask creation for causal attention, which should slightly improve prefill speed (see the sketch after this list).
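A minimal sketch of the idea, not the actual PR code: the real change targets ExecuTorch's custom SDPA op, but the same pattern can be illustrated with PyTorch's `F.scaled_dot_product_attention`, which also applies causal masking internally when `is_causal=True`. The function and variable names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sdpa_maybe_skip_mask(q, k, v, is_causal: bool, attn_mask=None):
    """Skip manual mask construction when attention is causal.

    For causal attention the SDPA kernel builds the mask itself, so we avoid
    materializing an explicit attention_mask tensor. Only the non-causal path
    still needs a caller-provided mask (e.g. for padding).
    """
    if is_causal:
        # No explicit mask: the kernel applies causal masking internally.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)

# Tiny usage example (batch=1, heads=2, seq_len=4, head_dim=8):
q = k = v = torch.randn(1, 2, 4, 8)
out = sdpa_maybe_skip_mask(q, k, v, is_causal=True)  # no mask tensor materialized
```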

@jackzhxng marked this pull request as a draft on November 18, 2025 at 22:59.