Conversation

@jackiehimel commented Nov 16, 2025

What does this PR do?

Adds SDPA and FlashAttention-2 support to LayoutLMv3 using the unified attention interface pattern, following the same architecture used in BERT and other recent model implementations.

Fixes #35467

Implementation

  • Refactored LayoutLMv3SelfAttention to use ALL_ATTENTION_FUNCTIONS interface instead of separate attention classes
  • Created layoutlmv3_eager_attention_forward function that implements the CogView attention mechanism (alpha=32 scaling) with support for LayoutLMv3's relative position bias and spatial attention bias
  • Added _supports_sdpa = True and _supports_flash_attn = True flags to LayoutLMv3PreTrainedModel
  • Updated mask creation to use create_bidirectional_mask (replacing get_extended_attention_mask)
  • Threaded layer_idx parameter through attention classes for consistency
  • Added automatic enforcement in LayoutLMv3Config to set attn_implementation="eager" when relative or spatial attention biases are enabled (default behavior)

Note: SDPA and FlashAttention-2 are incompatible with LayoutLMv3's relative position bias and spatial attention bias. The config automatically enforces eager attention when these biases are enabled (the default). To use SDPA/FlashAttention-2, users must disable both biases (has_relative_attention_bias=False and has_spatial_attention_bias=False).
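For illustration, here is a minimal usage sketch of the opt-in path described above (assuming the config flags behave as in this PR; the checkpoint name and keyword overrides are just standard `from_pretrained` usage, not taken from the diff):

```python
from transformers import LayoutLMv3Model

# Default config keeps both biases enabled, so eager attention is enforced automatically.
model_eager = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")

# Opting into SDPA (or "flash_attention_2") requires disabling both biases explicitly.
# Note that this changes the attention computation and may affect downstream accuracy.
model_sdpa = LayoutLMv3Model.from_pretrained(
    "microsoft/layoutlmv3-base",
    has_relative_attention_bias=False,
    has_spatial_attention_bias=False,
    attn_implementation="sdpa",
)
```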

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested?

  • CI tests pass; linting and formatting were validated locally.
  • Added test skips for SDPA/Flash comparison tests (LayoutLMv3 defaults to eager when biases are enabled) and overrode test_batching_equivalence to ensure eager attention is used.
  • Implementation follows the unified attention pattern from BERT.

Before submitting

  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@vasqu - Thanks for the feedback on #41801! I've refactored this to use the unified attention interface pattern as you suggested. Would appreciate another look when you have time.

@ArthurZucker @Cyrilvallez - attention implementation reviewers

@jackiehimel jackiehimel marked this pull request as draft November 16, 2025 21:37
- Implement unified attention interface following BERT pattern
- Add layoutlmv3_eager_attention_forward with support for relative position bias and spatial attention bias
- Add support flags _supports_flash_attn and _supports_sdpa
- Update attention classes to use unified interface
- Automatically set _attn_implementation='eager' when relative/spatial biases are enabled in config
- Fix test configurations to use eager attention by default
- Override incompatible SDPA/FlashAttention tests with skipTest
- Fix missing case for spatial-only attention bias handling
- Fix position_ids expansion to support inputs_embeds
- Replace get_extended_attention_mask with create_bidirectional_mask

Fixes huggingface#35467
@jackiehimel jackiehimel force-pushed the layoutlmv3-sdpa-flash-attn2 branch from 69b372a to 26aa046 on November 16, 2025 22:14
@github-actions commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: layoutlmv3

@jackiehimel jackiehimel marked this pull request as ready for review November 16, 2025 22:47
@vasqu left a comment

Sorry, I did go through the code and gave some comments, but I'm noticing that LayoutLM only uses the relative bias. If that's the case, then the usage of other attention flavors is questionable, as they won't be used either way.

Imo, it would make more sense to go for other models that are suitable. You already did an overall pretty good job over here.

# Take the dot product between "query" and "key" to get the raw attention scores.
# The attention scores QT K/√d could be significantly larger than input elements, and result in overflow.
# Changing the computational order into QT(K/√d) alleviates the problem. (https://huggingface.co/papers/2105.13290)
attention_scores = torch.matmul(query / math.sqrt(query.size(-1)), key.transpose(-1, -2))
@vasqu:

It's essentially the same as BERT:

attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling

with flipped order: attn_weights = torch.matmul(query * scaling, key.transpose(2, 3))
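(As a quick illustrative sanity check of that equivalence; this is not part of the PR, and the shapes and seed are arbitrary:)

```python
import math

import torch

torch.manual_seed(0)
batch, heads, seq_len, head_dim = 2, 8, 16, 64
query = torch.randn(batch, heads, seq_len, head_dim)
key = torch.randn(batch, heads, seq_len, head_dim)
scaling = 1.0 / math.sqrt(head_dim)

# CogView-style order: scale the query first, then matmul (keeps intermediates small).
scores_scaled_query = torch.matmul(query * scaling, key.transpose(-1, -2))
# BERT-style order: matmul first, then scale the resulting scores.
scores_scaled_output = torch.matmul(query, key.transpose(-1, -2)) * scaling

print(torch.allclose(scores_scaled_query, scores_scaled_output, atol=1e-5))  # True
```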

Comment on lines +254 to +258
attention_scores = attention_scores + (rel_pos + rel_2d_pos) / math.sqrt(query.size(-1))
elif module.has_relative_attention_bias and rel_pos is not None:
attention_scores = attention_scores + rel_pos / math.sqrt(query.size(-1))
elif module.has_spatial_attention_bias and rel_2d_pos is not None:
attention_scores = attention_scores + rel_2d_pos / math.sqrt(query.size(-1))
@vasqu:

Are we applying the scaling twice here? Do the integration tests still pass? I.e., there's another `/ math.sqrt(query.size(-1))`.


self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
self.has_relative_attention_bias = config.has_relative_attention_bias
self.has_spatial_attention_bias = config.has_spatial_attention_bias
@vasqu:

It's missing an `is_causal` attribute; I would think it's not causal.
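(For reference, a minimal sketch of what adding that attribute could look like; illustrative only, not taken from the diff, with the rest of `__init__` elided:)

```python
import torch.nn as nn


class LayoutLMv3SelfAttention(nn.Module):
    def __init__(self, config, layer_idx=None):
        super().__init__()
        # ... existing projection / dropout setup elided ...
        self.has_relative_attention_bias = config.has_relative_attention_bias
        self.has_spatial_attention_bias = config.has_spatial_attention_bias
        # LayoutLMv3 is an encoder-only model, so attention should be bidirectional.
        self.is_causal = False
```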

self,
hidden_states,
attention_mask=None,
output_attentions=False,
@vasqu:

Suggested change:

```diff
-output_attentions=False,
```
Comment on lines +326 to +336
use_eager = self.config._attn_implementation == "eager"

if not use_eager:
# SDPA and Flash Attention don't support custom relative position bias and spatial attention bias
if self.has_relative_attention_bias or self.has_spatial_attention_bias:
raise ValueError(
f"You are using {self.config._attn_implementation} as attention type. However, LayoutLMv3's "
"relative position bias and spatial attention bias are not compatible with it. "
'Please load the model with `attn_implementation="eager"`.'
)
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
@vasqu:

Suggested change:

```diff
-use_eager = self.config._attn_implementation == "eager"
-if not use_eager:
-    # SDPA and Flash Attention don't support custom relative position bias and spatial attention bias
-    if self.has_relative_attention_bias or self.has_spatial_attention_bias:
-        raise ValueError(
-            f"You are using {self.config._attn_implementation} as attention type. However, LayoutLMv3's "
-            "relative position bias and spatial attention bias are not compatible with it. "
-            'Please load the model with `attn_implementation="eager"`.'
-        )
-    attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
+if self.config._attn_implementation != "eager":
+    # SDPA and Flash Attention don't support custom relative position bias and spatial attention bias
+    if self.has_relative_attention_bias or self.has_spatial_attention_bias:
+        raise ValueError(
+            f"You are using {self.config._attn_implementation} as attention type. However, LayoutLMv3's "
+            "relative position bias and spatial attention bias are not compatible with it. "
+            'Please load the model with `attn_implementation="eager"`.'
+        )
+    attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
```

For now we raise a ValueError; I'm thinking about forcing a fallback instead. Let's keep it as is for now.
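(For illustration only, a rough sketch of what such a fallback could look like; this is a hypothetical helper, not part of this PR, and `select_attention_interface`, `all_attention_functions`, and `eager_fn` are made-up names:)

```python
from transformers.utils import logging

logger = logging.get_logger(__name__)


def select_attention_interface(module, all_attention_functions, eager_fn):
    """Pick an attention backend, falling back to eager when LayoutLMv3's
    relative/spatial attention biases rule out SDPA / Flash Attention."""
    impl = module.config._attn_implementation
    if impl != "eager" and (module.has_relative_attention_bias or module.has_spatial_attention_bias):
        logger.warning_once(
            f"LayoutLMv3's relative/spatial attention biases are incompatible with {impl}; "
            "falling back to eager attention."
        )
        return eager_fn
    return eager_fn if impl == "eager" else all_attention_functions[impl]
```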

Comment on lines +352 to 353
outputs = (attn_output, attn_weights) if output_attentions else (attn_output,)
return outputs
@vasqu:

Suggested change:

```diff
-outputs = (attn_output, attn_weights) if output_attentions else (attn_output,)
-return outputs
+return outputs, attn_weights
```

You can take a look at

_can_record_outputs = {
"hidden_states": BertLayer,
"attentions": BertSelfAttention,

which takes care of the `output_xxx` handling. That will need more changes here, but it's better to go for the right thing from the get-go.
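(Roughly, the BERT-style pattern being pointed to looks like the sketch below, written under the assumption that LayoutLMv3 would adopt the same `_can_record_outputs` / `check_model_inputs` machinery and that the decorator is importable from `transformers.utils.generic` as in recent modeling files; the class names are illustrative and the forward body is elided:)

```python
from transformers.utils.generic import check_model_inputs

# (This would live inside modeling_layoutlmv3.py, where the referenced classes are defined.)


class LayoutLMv3Model(LayoutLMv3PreTrainedModel):
    # Maps output fields to the modules whose results should be recorded,
    # replacing the manual `output_attentions` / `output_hidden_states` plumbing.
    _can_record_outputs = {
        "hidden_states": LayoutLMv3Layer,
        "attentions": LayoutLMv3SelfAttention,
    }

    @check_model_inputs
    def forward(self, input_ids=None, bbox=None, attention_mask=None, **kwargs):
        ...  # unchanged model body
```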

@vasqu:

BERT is a good reference for this, and you need the same decorators; they are essential.



```diff
-# Copied from transformers.models.layoutlmv2.modeling_layoutlmv2.LayoutLMv2Attention with LayoutLMv2->LayoutLMv3
+# Adapted from transformers.models.layoutlmv2.modeling_layoutlmv2.LayoutLMv2Attention with LayoutLMv2->LayoutLMv3
```
@vasqu:

Is LayoutLMv2 vastly different? It might be worth changing it at the same time.

Comment on lines +182 to +183
# Ensure eager attention is set before model creation
config._attn_implementation = "eager"
@vasqu:

Is the change in `prepare_config_and_inputs` not enough? Same for below.

Comment on lines +334 to +337
@parameterized.expand(TEST_EAGER_MATCHES_SDPA_INFERENCE_PARAMETERIZATION)
@unittest.skip("LayoutLMv3's relative position bias and spatial attention bias are incompatible with SDPA.")
def test_eager_matches_sdpa_inference(self, *args):
pass
@vasqu:

Does that mean that LayoutLM always uses the relative bias? Then it might not even make sense to support SDPA/Flash Attention; it won't be used either way.

Linked issue: Support SDPA & Flash Attention 2 for LayoutLMv3 (#35467)