[enhancement]: OOM error during VAE decode #2672

@psychedelicious

Description

Is there an existing issue for this?

  • I have searched the existing issues

OS

Linux

GPU

cuda

VRAM

24GB

What happened?

OOM error during VAE decode. VRAM usage skyrockets from ~15.5GB to ~40GB during decode for a 3072x3072 image.

Can we mitigate this somehow?

I tried calling self.enable_vae_slicing() in the constructor for StableDiffusionGeneratorPipeline, but the numbers stayed the same.
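(One possible explanation: in diffusers, VAE slicing splits the work along the batch dimension, so for a single large image it would not be expected to change peak memory. Spatial tiling, where available, bounds the intermediate activation size instead. Below is a minimal standalone sketch of the idea; the runwayml/stable-diffusion-v1-5 weights are an assumption, and AutoencoderKL.enable_tiling() only exists in some diffusers releases, hence the hasattr guard.)

    # Standalone sketch: decode a large latent with slicing/tiling enabled.
    # Model id, dtype, and the availability of enable_tiling() are assumptions.
    import torch
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="vae", torch_dtype=torch.float16
    ).to("cuda")

    # Slicing splits the *batch* dimension, so it only helps when decoding
    # several latents at once -- not a single 3072x3072 image.
    vae.enable_slicing()

    # Tiling (if the installed diffusers release supports it) decodes in
    # overlapping spatial tiles, bounding the size of intermediate activations.
    if hasattr(vae, "enable_tiling"):
        vae.enable_tiling()

    # 3072x3072 pixels -> 384x384 latent (SD downsamples by 8), 4 channels.
    # Random latents are enough to exercise the decoder's memory behaviour.
    latents = torch.randn(1, 4, 384, 384, device="cuda", dtype=torch.float16)
    with torch.no_grad():
        image = vae.decode(latents).sample  # shape (1, 3, 3072, 3072)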

>> Image Generation Parameters:

{'prompt': 'pizza', 'iterations': 3, 'steps': 3, 'cfg_scale': 7.5, 'threshold': 0, 'perlin': 0, 'height': 3072, 'width': 3072, 'sampler_name': 'k_lms', 'seed': 3471489041, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '...', 'hires_fix': False, 'seamless': False, 'variation_amount': 0}

>> ESRGAN Parameters: False
>> Facetool Parameters: False
100%|█████████████████████████████████████████████████████████| 3/3 [00:16<00:00,  5.58s/it]
Generating:   0%|                                                     | 0/3 [00:19<?, ?it/s]
Traceback (most recent call last):
  File "/home/bat/Documents/Code/InvokeAI/ldm/generate.py", line 517, in prompt2image
    results = generator.generate(
  File "/home/bat/Documents/Code/InvokeAI/ldm/invoke/generator/base.py", line 112, in generate
    image = make_image(x_T)
  File "/home/bat/Documents/Code/InvokeAI/ldm/invoke/generator/txt2img.py", line 40, in make_image
    pipeline_output = pipeline.image_from_embeddings(
  File "/home/bat/Documents/Code/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 365, in image_from_embeddings
    image = self.decode_latents(result_latents)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 370, in decode_latents
    image = self.vae.decode(latents).sample
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 144, in decode
    decoded = self._decode(z).sample
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 116, in _decode
    dec = self.decoder(z)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/vae.py", line 188, in forward
    sample = up_block(sample)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 1718, in forward
    hidden_states = upsampler(hidden_states)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/resnet.py", line 139, in forward
    hidden_states = self.conv(hidden_states)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/bat/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 40.50 GiB (GPU 0; 23.65 GiB total capacity; 14.41 GiB already allocated; 5.59 GiB free; 15.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

>> Could not generate image.

>> Usage stats:
>>   0 image(s) generated in 19.64s
>>   Max VRAM used for this generation: 15.47G. Current VRAM utilization: 10.64G
>>   Max VRAM used since script start:  15.47G
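(Regarding the allocator hint at the end of the traceback: since the failing request is a single 40.50 GiB allocation on a 23.65 GiB card, fragmentation is unlikely to be the root cause, but the documented PYTORCH_CUDA_ALLOC_CONF knob can still be tried. A sketch, with the 512 MiB value chosen arbitrarily:)

    # Allocator tuning per the traceback's hint; the value is an assumption,
    # and the variable must be set before torch initializes CUDA.
    import os
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

    import torch  # imported after setting the variable so it takes effect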

Screenshots

No response

Additional context

No response

Contact Details

No response
