
AutoencoderKL encoder outputs NaN for large images #3209

@antoine-scenario

Describe the bug

The AutoencoderKL encoder loaded from runwayml/stable-diffusion-v1-5 outputs NaN for large images. I observe this behavior starting from image sizes of around 1500x1500 with VAE tiling disabled. I tried float32 and float16, with and without xFormers. Is this expected behavior?
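This is roughly how I narrowed down the threshold (a minimal sketch: I substitute a flat synthetic image for the actual photo here, and the exact cutoff may vary by GPU and dtype):

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img import preprocess

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
)
vae = pipe.vae.to("cuda")
vae.disable_tiling()

# Encode square images of increasing size and report which sizes yield NaN latents.
for size in (512, 1024, 1280, 1536, 2048):
    image = Image.new("RGB", (size, size), color=(127, 127, 127))
    with torch.no_grad():
        latents = vae.encode(preprocess(image).to("cuda")).latent_dist.sample()
    print(size, "NaN" if torch.isnan(latents).any() else "ok")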

I would have liked to use VAE tiling instead, but it produces tile artifacts, as reported in #1441.
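For reference, the tiled path looks like this (a minimal sketch reusing vae and preprocessed from the Reproduction script below; enable_tiling/disable_tiling are the AutoencoderKL switches involved in #1441):

# With tiling enabled the encoder processes the image in overlapping tiles,
# which bounds memory use but blends tile seams imperfectly (the artifacts
# reported in #1441).
vae.enable_tiling()
with torch.no_grad():
    tiled_latents = vae.encode(preprocessed).latent_dist.sample()
vae.disable_tiling()
print(torch.isnan(tiled_latents).any())  # check whether the tiled path also yields NaNs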

Reproduction

from PIL import Image
import torch
import urllib.request

from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img import preprocess

# Download a test image and upsample it past the ~1500x1500 threshold.
image_file = urllib.request.urlopen("https://upload.wikimedia.org/wikipedia/commons/3/32/A_photograph_of_an_astronaut_riding_a_horse_2022-08-28.png")
init_image = Image.open(image_file)

up_size = (2048, 2048)
upsampled_image = init_image.resize(up_size, Image.Resampling.BILINEAR)

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32)
pipe.enable_xformers_memory_efficient_attention()  # same result with this line removed
device = "cuda"

with torch.no_grad():
    vae = pipe.vae.to(device)
    vae.disable_tiling()  # tiling off, as described above
    preprocessed = preprocess(upsampled_image).to(torch.float32).to(device)
    latents = vae.encode(preprocessed).latent_dist.sample()
    print(latents)  # prints a tensor full of NaN, see Logs below
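
To see where the NaNs first appear, the encoder can be instrumented with forward hooks (a debugging sketch on top of the script above, using only standard PyTorch APIs; hooks fire in execution order, so the first line printed points at the module where the non-finite values originate):

def make_hook(name):
    def hook(module, inputs, output):
        # Some submodules return tuples; only inspect plain tensors.
        if torch.is_tensor(output) and not torch.isfinite(output).all():
            print(f"non-finite output in: {name} ({type(module).__name__})")
    return hook

# Instrument every submodule of the VAE encoder.
handles = [m.register_forward_hook(make_hook(n)) for n, m in vae.encoder.named_modules()]
with torch.no_grad():
    vae.encode(preprocessed)
for h in handles:
    h.remove()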

Logs

tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]],

         [[nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          ...,
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan],
          [nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0')

System Info

  • diffusers version: 0.14.0
  • Platform: Linux-5.15.0-1028-aws-x86_64-with-glibc2.35
  • Python version: 3.10.6
  • PyTorch version (GPU?): 2.0.0+cu117 (True)
  • Huggingface_hub version: 0.13.3
  • Transformers version: 4.27.4
  • Accelerate version: 0.15.0
  • xFormers version: 0.0.18
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Labels: bug, stale