Conversation

@NouamaneTazi
Member

Fixes slow tests after PR #371

- to fix `test_stable_diffusion_memory_chunking` test
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 30, 2022

The documentation is not available anymore as the PR was closed or merged.

@patrickvonplaten patrickvonplaten merged commit b2cfc7a into huggingface:main Sep 30, 2022
david-j-smith added a commit to david-j-smith/diffusers that referenced this pull request Oct 4, 2022
prathikr pushed a commit to prathikr/diffusers that referenced this pull request Oct 26, 2022
* revert using baddbmm in attention
- to fix `test_stable_diffusion_memory_chunking` test

* styling
@Birch-san
Contributor

Birch-san commented Nov 6, 2022

@NouamaneTazi @patrickvonplaten, do you remember any details about why this needed to be reverted?

By my measurements (admittedly on CompVis), reintroducing baddbmm would make inference 19% faster on Mac/MPS:
https://twitter.com/Birchlabs/status/1588703520433541120

Was it the case that tests/test_pipelines.py::PipelineTesterMixin::test_stable_diffusion_memory_chunking went slower? Did it do so on all platforms, or just on certain hardware (e.g. CUDA vs MPS, high vs low VRAM)? Or is this only a problem when sliced attention is used, and should we prefer baddbmm for unsliced attention?

@patrickvonplaten
Contributor

Hey @Birch-san,

Good question! Now that we have strong unet testing, we could maybe re-open this PR to see if it improves overall performance.
However, we need to be sure that it doesn't slow down any other use case, such as ONNX, FP16 Torch, ...

What are your thoughts here, @anton-l @pcuenca?
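
For what it's worth, here is a rough micro-benchmark sketch one could use to compare a fused baddbmm path against the current matmul path across dtypes. The shapes, the 0.125 scale, and the dtype list are illustrative assumptions, not values taken from this PR, and ONNX would need a separate check:

```python
import torch
from torch.utils import benchmark

def fused(q, k, scale):
    # baddbmm path: the attention scaling is folded into the batched matmul
    return torch.baddbmm(
        torch.empty(q.shape[0], q.shape[1], k.shape[1], dtype=q.dtype, device=q.device),
        q, k.transpose(-1, -2), beta=0, alpha=scale,
    )

def unfused(q, k, scale):
    # existing path: plain matmul followed by an explicit elementwise scaling
    return torch.matmul(q, k.transpose(-1, -2)) * scale

device = "cuda" if torch.cuda.is_available() else "cpu"
dtypes = (torch.float32, torch.float16) if device == "cuda" else (torch.float32,)
for dtype in dtypes:
    # (batch * heads, seq_len, head_dim) -- arbitrary illustrative shapes
    q = torch.randn(16, 1024, 64, device=device, dtype=dtype)
    k = torch.randn(16, 1024, 64, device=device, dtype=dtype)
    for fn in (fused, unfused):
        timer = benchmark.Timer(stmt="fn(q, k, 0.125)", globals={"fn": fn, "q": q, "k": k})
        print(fn.__name__, dtype, timer.timeit(50))
```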

@patrickvonplaten
Contributor

And @NouamaneTazi?

@Birch-san
Contributor

Ah, lemme correct myself. The 19% speed increase was the outcome of two changes:

  • changing the (q,k) matmul to baddbmm (as shown in this PR)
  • changing the (attn, v) matmul below it to a bmm(); that change is not featured in this PR. The batched matmul was way faster than the regular matmul on MPS (see the sketch below).
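
For reference, a minimal sketch of what those two changes look like together. This is not the actual diffusers code; the (batch * heads, seq_len, head_dim) layout and the way `scale` is passed in are assumptions for illustration:

```python
import torch

def attention_baddbmm(q, k, v, scale):
    # q, k, v: (batch * heads, seq_len, head_dim) -- layout assumed for illustration.
    # Change 1: fuse the 1/sqrt(head_dim) scaling into a batched matmul via baddbmm.
    scores = torch.baddbmm(
        torch.empty(q.shape[0], q.shape[1], k.shape[1], dtype=q.dtype, device=q.device),
        q,
        k.transpose(-1, -2),
        beta=0,      # beta=0 makes baddbmm ignore the uninitialized `input` tensor
        alpha=scale,
    )
    probs = scores.softmax(dim=-1)
    # Change 2: use a batched matmul (bmm) for the (attn, v) product as well.
    return torch.bmm(probs, v)
```

The point of baddbmm here is that the scaling happens inside the matmul kernel rather than as a separate elementwise multiply, which seems to be where the MPS win comes from.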

@patil-suraj
Contributor

Think we could open a PR and do thorough tests for the other cases as well. @NouamaneTazi, would you like to tackle this?

@pcuenca
Member

pcuenca commented Nov 8, 2022

I agree we should open a new PR and explore this again :)

@Birch-san
Contributor

@patrickvonplaten @pcuenca @patil-suraj @NouamaneTazi
I've restored the original baddbmm, added a new bmm, implemented it for sliced attention, and added a fast path to the Decoder:
#1203
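
Roughly, the sliced variant looks like this. It's only a sketch, assuming the same (batch * heads, seq_len, head_dim) layout and a caller-provided slice_size; see #1203 for the actual implementation:

```python
import torch

def sliced_attention(q, k, v, scale, slice_size):
    # q, k, v: (batch * heads, seq_len, head_dim) -- layout assumed for illustration.
    # Attention is computed one slice of the batch*heads dimension at a time to cap
    # peak memory; baddbmm/bmm are used inside each slice, mirroring the unsliced path.
    out = torch.empty_like(q)
    for start in range(0, q.shape[0], slice_size):
        end = start + slice_size
        q_s, k_s, v_s = q[start:end], k[start:end], v[start:end]
        scores = torch.baddbmm(
            torch.empty(q_s.shape[0], q_s.shape[1], k_s.shape[1], dtype=q.dtype, device=q.device),
            q_s,
            k_s.transpose(-1, -2),
            beta=0,
            alpha=scale,
        )
        out[start:end] = torch.bmm(scores.softmax(dim=-1), v_s)
    return out
```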

PhaneeshB pushed a commit to nod-ai/diffusers that referenced this pull request Mar 1, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023