Describe the bug
I tried testing the potential speed improvements of diffusers 0.4.0 on my M1 Mac using an existing StableDiffusionPipeline-based script, and I found that a large image that would take ~3 min to generate with diffusers 0.3.0 was estimated to take more than 10x as long.
Since my existing script had a lot going on (e.g. large resolutions, attention slicing), I tried to diagnose the problem with a minimal script (see below), running in two identical environments, with the only difference being the diffusers version.
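For context, the original script enabled attention slicing on top of the same pipeline, roughly like this (a sketch only; the resolution and other arguments here are stand-ins, not the actual script):

import torch
from diffusers import StableDiffusionPipeline

# Sketch of the original, more involved script (resolution/options are assumptions)
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("mps")
pipe.enable_attention_slicing()  # lower peak memory for large images

result = pipe(
    "dogs playing poker",
    height=768,  # stand-in for the "large resolution" case
    width=768,
    generator=torch.manual_seed(1),
)
result.images[0].save("large_test.png")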
In diffusers 0.3.0, it takes ~35 seconds to generate a reasonable result like this:

In diffusers 0.4.0, it takes ~50 seconds (which is slower than 0.3.0, but better than the 10x performance hit I was getting before), but each attempt (with varying seeds) triggered the NSFW filter. With the filter disabled (see the sketch below), the results appear to be just noise:

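For completeness, the filter was bypassed by swapping in a no-op safety checker along these lines (a sketch of one common workaround; the exact replacement code isn't important to the bug):

# Replace the safety checker with a no-op so the raw (noisy) images come through
pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))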
I'm not sure whether the 10x performance hit I initially observed with my original script would be fixed by fixing this bug, but it certainly seems to be at least part of the problem.
Reproduction
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("mps")

# Fixed seed so the output is comparable across diffusers versions
result = pipe("dogs playing poker", generator=torch.manual_seed(1))
result.images[0].save("test.png")
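The ~35 s / ~50 s timings above were taken around the pipeline call, along these lines (a sketch; the exact measurement code is an assumption):

import time

start = time.perf_counter()
result = pipe("dogs playing poker", generator=torch.manual_seed(1))
print(f"generation took {time.perf_counter() - start:.1f} s")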
Logs
Under 0.4.0 there's also this warning:
/opt/homebrew/Caskroom/miniforge/base/envs/sd/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py:222: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
text_embeddings = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
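The same CPU-fallback warning can be reproduced outside the pipeline with a tiny snippet (this assumes PYTORCH_ENABLE_MPS_FALLBACK=1 is set; without it, the unsupported op raises NotImplementedError instead of warning):

import torch

# Shape mirrors the CLIP text embeddings (batch, 77 tokens, 768 dims)
x = torch.ones(2, 77, 768, device="mps")
x = x.repeat_interleave(2, dim=0)  # aten::repeat_interleave.self_int falls back to CPU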
System Info
- diffusers version: 0.4.0
- Platform: macOS-12.6-arm64-arm-64bit
- Python version: 3.10.6
- PyTorch version (GPU?): 1.13.0.dev20220911 (False)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.21.3
- Using GPU in script?: MPS
- Using distributed or parallel set-up in script?: no