Bump transformers, improve SDPA, reduce Optimum reliance #188
Bump transformers to tip of main (huggingface/transformers#42260) in preparation for adding the new Granite 4 Micro model.
To make this bump, we need:
- `sdpa_without_vmap` for attention and the attention mask: `attention_mask` was always getting constructed, regardless of whether the attention was causal, just to find `start_pos`. This was wasteful because the custom SDPA op builds the attention mask itself for causal attention. Now we can skip manual mask creation for causal attention, which should improve prefill speed slightly. A sketch of the idea follows below.
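To illustrate the idea (this is a minimal sketch, not the actual custom op or its signature; the `sdpa` helper name and argument layout here are assumptions), the causal path can hand mask construction to SDPA itself instead of materializing `attention_mask` up front:

```python
import torch
import torch.nn.functional as F

def sdpa(
    q: torch.Tensor,                      # (batch, heads, seq_len, head_dim)
    k: torch.Tensor,
    v: torch.Tensor,
    attn_mask: torch.Tensor | None,
    is_causal: bool,
) -> torch.Tensor:
    if is_causal:
        # Causal path: let SDPA build the mask internally instead of
        # constructing attention_mask manually beforehand.
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Non-causal (or explicitly masked) path: pass the precomputed mask.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)


# Example: causal self-attention over a short prefill sequence.
q = k = v = torch.randn(1, 8, 16, 64)
out = sdpa(q, k, v, attn_mask=None, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```

Skipping the explicit mask avoids allocating and filling a `(seq_len, seq_len)` tensor during prefill, which is where the small speedup comes from.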