fix free_gpu_memory option for diffusers models [using accelerate cpu_offload] #2542
Conversation
Force-pushed from fb9f76d to 802ad1d
There are a few things I don't love about this, but it's in the category of "things that will be a lot clearer to deal with when we aren't supporting two different model architectures," and it seems to work.
lstein left a comment:
No joy testing on Ubuntu CUDA system:
invokeai --free_gpu_mem --model stable-diffusion-1.5
* Initializing, be patient...
>> Initialization file /home/lstein/invokeai/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.0-rc4
>> InvokeAI runtime directory is "/home/lstein/invokeai"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type cuda
>> xformers memory-efficient attention is available and enabled
>> Current VRAM usage: 0.00G
>> Loading diffusers model from runwayml/stable-diffusion-v1-5
| Using faster float16 precision
| Loading diffusers VAE from stabilityai/sd-vae-ft-mse
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 80145.94it/s]
| Default image dimensions = 512 x 512
>> Model loaded in 6.80s
>> Max VRAM used to load the model: 2.16G
>> Current VRAM usage:0.00G
>> Textual inversions available:
>> Setting Sampler to k_lms (LMSDiscreteScheduler)
* Initialization done! Awaiting your command (-h for help, 'q' to quit)
(stable-diffusion-1.5) invoke> banana sushi
>> Patchmatch initialized
0%| | 0/30 [00:00<?, ?it/s]
Generating: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/lstein/Projects/InvokeAI/ldm/generate.py", line 516, in prompt2image
results = generator.generate(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/base.py", line 112, in generate
image = make_image(x_T)
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/txt2img.py", line 40, in make_image
pipeline_output = pipeline.image_from_embeddings(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 340, in image_from_embeddings
result_latents, result_attention_map_saver = self.latents_from_embeddings(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 366, in latents_from_embeddings
result: PipelineIntermediateState = infer_latents_from_embeddings(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 181, in __call__
for result in self.generator_method(*args, **kwargs):
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 403, in generate_latents_from_embeddings
step_output = self.step(batched_t, latents, conditioning_data,
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 438, in step
noise_pred = self.invokeai_diffuser.do_diffusion_step(
File "/home/lstein/Projects/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 160, in do_diffusion_step
unconditioned_next_x, conditioned_next_x = self.apply_standard_conditioning(x, sigma, unconditioning, conditioning)
File "/home/lstein/Projects/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 176, in apply_standard_conditioning
both_results = self.model_forward_callback(x_twice, sigma_twice, both_conditionings)
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 472, in _unet_forward
return self.unet(sample=latents,
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 481, in forward
sample, res_samples = downsample_block(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 789, in forward
hidden_states = attn(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/transformer_2d.py", line 265, in forward
hidden_states = block(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py", line 307, in forward
attn_output = self.attn2(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/cross_attention.py", line 160, in forward
return self.processor(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/cross_attention.py", line 367, in __call__
key = attn.to_k(encoder_hidden_states)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half
>> Could not generate image.
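For context, the RuntimeError above is a plain dtype mismatch: some layer is still holding float32 weights while the float16 pipeline feeds it half-precision activations, and F.linear refuses the mixed-precision matmul. A standalone sketch of the same failure mode, independent of InvokeAI (the exact error wording varies by device and torch version):

```python
# Standalone illustration (not InvokeAI code) of the failure mode above:
# a float32 Linear layer fed float16 activations raises the same class of
# dtype-mismatch RuntimeError reported by F.linear in the traceback.
import torch

layer = torch.nn.Linear(8, 8)                # weights default to float32
x = torch.randn(1, 8, dtype=torch.float16)   # half-precision input

try:
    layer(x)                                 # mixed-precision matmul is rejected
except RuntimeError as err:
    print(f"dtype mismatch: {err}")
```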
Oops! Left a file out of the commit. Try this.
The new commit fixes the crash and I see a tremendous decrease in memory usage. However, there is a noticeable performance penalty. Is this expected? With ckpt files I don't see this performance dropoff when the --free_gpu_mem option is enabled.
oh geez, can confirm. the difference isn't quite as extreme on my hardware, but it's still terrible. will have to look into that.
I spent a while looking around trying to figure out if accelerate somehow added an autocast somewhere, but I didn't find any of that. New hypothesis: accelerate's cpu_offload hook is around the forward call for each model. So if you have a diffusion process that makes fifty separate forward calls, it's going to swap it in and out between each of those? 😖 That should be straightforward to test. If that's the case, the performance regression will show up in a minimal example of the stock StableDiffusionPipeline too.
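A rough sketch of the kind of minimal check being described, assuming the stock StableDiffusionPipeline from diffusers; the prompt and step count are arbitrary, and this is not the exact script that was filed upstream:

```python
# Rough sketch: time a stock StableDiffusionPipeline with and without
# enable_sequential_cpu_offload to see whether the per-step slowdown
# reproduces outside InvokeAI. Requires a CUDA GPU plus diffusers/accelerate.
import time

import torch
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # same model as in the log above


def timed_run(offload: bool) -> float:
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    if offload:
        # accelerate hooks swap each submodule on/off the GPU around its forward pass
        pipe.enable_sequential_cpu_offload()
    else:
        pipe = pipe.to("cuda")
    start = time.perf_counter()
    pipe("banana sushi", num_inference_steps=30)
    return time.perf_counter() - start


print(f"no offload: {timed_run(False):.1f}s")
print(f"offload:    {timed_run(True):.1f}s")
```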
confirmed with a minimal example and filed an issue upstream: huggingface/diffusers#2266
I think the thing to do is to scrap this PR and abandon using accelerate's cpu_offload, which is kind of a bummer because I had already taken out all the code it was meant to replace.
That's so disappointing. I'm sorry you had to do all that work only to undo it again. |
Changed the title from "free_gpu_memory option for diffusers models" to "free_gpu_memory option for diffusers models [using accelerate cpu_offload]"
Connects the flag to diffusers' StableDiffusionPipeline.enable_sequential_cpu_offload: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.enable_sequential_cpu_offload
Fixes #2326
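A simplified, hypothetical sketch of what "connecting the flag" amounts to; the configure_memory helper and its signature are illustrative, not the actual InvokeAI diff:

```python
# Hypothetical sketch (not the actual diff): when free_gpu_mem is set for a
# diffusers model, ask the pipeline to offload submodules to CPU via
# accelerate instead of moving the whole pipeline onto the GPU.
from diffusers import StableDiffusionPipeline


def configure_memory(pipeline: StableDiffusionPipeline,
                     device: str,
                     free_gpu_mem: bool) -> StableDiffusionPipeline:
    if free_gpu_mem and device == "cuda":
        # accelerate moves each submodule to the GPU only for its forward pass
        pipeline.enable_sequential_cpu_offload()
    else:
        pipeline = pipeline.to(device)
    return pipeline
```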