Description
I tried passing an attention_mask into a stable-diffusion UNet, but it doesn't actually get passed down as deep as CrossAttention#forward.
I tried fixing this by passing the param down, but it blows up with a tensor size mismatch, because self-attention and cross-attention have different masking requirements.
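To illustrate the mismatch, here's a toy sketch (not diffusers code; the dimensions are made up): in the SD UNet, self-attention attends over the spatial latents, while cross-attention attends over the CLIP text tokens, so a mask sized for the text sequence can't broadcast against the self-attention scores.

```python
import torch

batch, spatial_tokens, text_tokens = 1, 1024, 77  # e.g. a 32x32 latent and CLIP's context length

self_attn_scores = torch.randn(batch, spatial_tokens, spatial_tokens)  # queries & keys: latents
cross_attn_scores = torch.randn(batch, spatial_tokens, text_tokens)    # keys: text token embeds

# a mask over text tokens broadcasts fine against the cross-attention scores...
text_mask = torch.ones(batch, 1, text_tokens, dtype=torch.bool)
cross_attn_scores = cross_attn_scores.masked_fill(~text_mask, float("-inf"))

# ...but the same mask cannot broadcast against the (spatial, spatial) self-attention scores:
# 77 mask positions vs 1024 key positions raises a size-mismatch error.
# self_attn_scores.masked_fill(~text_mask, float("-inf"))  # RuntimeError
```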
I made my own implementation of cross-attention masking a few weeks ago (before the refactor), but never upstreamed it, mainly because I wasn't sure whether I'd done it correctly (I re-used the lucidrains implementation that CompVis used):
cbb4c02
EDIT: rebased implementation to show how it would fit in with the existing attention masking and the refactored attention:
Birch-san@e3a93e9
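For reference, here's roughly what that approach looks like (an illustrative sketch, not the actual diffusers CrossAttention API; `masked_cross_attention` and its parameter names are made up here): masked key positions are filled with a large negative value before the softmax, so they receive effectively zero attention weight, as in the lucidrains implementation.

```python
import torch

def masked_cross_attention(q, k, v, cross_attention_mask=None, heads=8):
    # q: [batch*heads, q_tokens, dim_head]; k, v: [batch*heads, kv_tokens, dim_head]
    scale = q.shape[-1] ** -0.5
    sim = torch.bmm(q, k.transpose(-1, -2)) * scale          # [batch*heads, q_tokens, kv_tokens]
    if cross_attention_mask is not None:
        # cross_attention_mask: [batch, kv_tokens] bools; True = attend, False = ignore
        mask = cross_attention_mask[:, None, :]               # [batch, 1, kv_tokens]
        mask = mask.repeat_interleave(heads, dim=0)           # heads are folded into the batch dim
        sim = sim.masked_fill(~mask, -torch.finfo(sim.dtype).max)
    attn = sim.softmax(dim=-1)
    return torch.bmm(attn, v)                                 # [batch*heads, q_tokens, dim_head]
```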
I explicitly named the parameter a cross-attention mask, because a self-attention mask has entirely different requirements.
In terms of wider API design, I wonder whether it should be an attention map instead (i.e. so you can use it to increase/decrease attention scores for certain token embeds). But for now I'm mostly interested in the mask aspect, because waifu-diffusion makes use of "multiple CLIP embeddings stitched together", so attention masking is useful to avoid attending to padding token embeddings, which would be biased towards conveying the high-level semantics of the final CLIP segment only.
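As a concrete example of that use case (a hedged sketch, not waifu-diffusion's actual pipeline code; the model id and variable names are just for illustration): two CLIP segments get encoded separately and stitched along the sequence axis, and the tokenizer's attention_mask becomes the cross-attention mask that hides each segment's padding tokens.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompts = ["a watercolor painting of a fox", "sitting in a snowy forest at dusk"]
enc = tokenizer(prompts, padding="max_length", max_length=77, truncation=True, return_tensors="pt")

with torch.no_grad():
    embeds = text_encoder(enc.input_ids).last_hidden_state     # [2, 77, 768]

# stitch the two segments along the sequence axis: [1, 154, 768]
stitched_embeds = embeds.reshape(1, -1, embeds.shape[-1])
# the tokenizer's attention_mask (1 = real token, 0 = padding) becomes the cross-attention mask
cross_attention_mask = enc.attention_mask.reshape(1, -1).bool()  # [1, 154]
# passed down to CrossAttention, this stops the image latents attending to padding token embeds
```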