Describe the bug
I tried testing the potential speed improvements of diffusers 0.4.0 on my M1 Mac using an existing StableDiffusionPipeline-based script, and I found that a large image that would take ~3 min to generate with diffusers 0.3.0 was estimated to take more than 10x as long.
Since my existing script had a lot going on (e.g. large resolutions, attention slicing), I tried to diagnose the problem with a minimal script (see below), running in two identical environments, with the only difference being the diffusers version.
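For context, the original script enabled attention slicing on top of the same pipeline, roughly like this (a sketch only; the resolution and other arguments here are stand-ins, not the actual script):

import torch
from diffusers import StableDiffusionPipeline

# Sketch of the original, more involved script (resolution/options are assumptions)
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("mps")
pipe.enable_attention_slicing()  # lower peak memory for large images

result = pipe(
    "dogs playing poker",
    height=768,  # stand-in for the "large resolution" case
    width=768,
    generator=torch.manual_seed(1),
)
result.images[0].save("large_test.png")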
In diffusers 0.3.0, it takes ~35 seconds to generate a reasonable result like this:

In diffusers 0.4.0, it takes ~50 seconds (which is slower than 0.3.0, but better than the 10x performance hit I was getting before), but each attempt (with varying seeds) triggered the NSFW filter. With the filter disabled (see the sketch below), the results appear to be just noise:

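For completeness, the filter was bypassed by swapping in a no-op safety checker along these lines (a sketch of one common workaround; the exact replacement code isn't important to the bug):

# Replace the safety checker with a no-op so the raw (noisy) images come through
pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))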
I'm not sure whether the 10x performance hit I initially observed with my original script would be fixed by fixing this bug, but it certainly seems to be at least part of the problem.
Reproduction
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("mps")

# Fixed seed so the output is comparable across diffusers versions
result = pipe("dogs playing poker", generator=torch.manual_seed(1))
result.images[0].save("test.png")
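The ~35 s / ~50 s timings above were taken around the pipeline call, along these lines (a sketch; the exact measurement code is an assumption):

import time

start = time.perf_counter()
result = pipe("dogs playing poker", generator=torch.manual_seed(1))
print(f"generation took {time.perf_counter() - start:.1f} s")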
Logs
Under 0.4.0 there's also this warning:
/opt/homebrew/Caskroom/miniforge/base/envs/sd/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py:222: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
text_embeddings = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
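The same CPU-fallback warning can be reproduced outside the pipeline with a tiny snippet (this assumes PYTORCH_ENABLE_MPS_FALLBACK=1 is set; without it, the unsupported op raises NotImplementedError instead of warning):

import torch

# Shape mirrors the CLIP text embeddings (batch, 77 tokens, 768 dims)
x = torch.ones(2, 77, 768, device="mps")
x = x.repeat_interleave(2, dim=0)  # aten::repeat_interleave.self_int falls back to CPU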
System Info
- diffusers version: 0.4.0
- Platform: macOS-12.6-arm64-arm-64bit
- Python version: 3.10.6
- PyTorch version (GPU?): 1.13.0.dev20220911 (False)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.21.3
- Using GPU in script?: MPS
- Using distributed or parallel set-up in script?: no