Conversation

@keturn
Contributor

@keturn keturn commented Feb 6, 2023

@keturn keturn added the bug Something isn't working label Feb 6, 2023
@keturn keturn added this to the 2.3 🧨 milestone Feb 6, 2023
@keturn keturn marked this pull request as ready for review February 6, 2023 03:44
@keturn
Contributor Author

keturn commented Feb 6, 2023

There are a few things I don't love about this, but it's in the category of "things that will be a lot clearer to deal with when we aren't supporting two different model architectures," and it seems to work.

Collaborator

@lstein lstein left a comment


No joy testing on Ubuntu CUDA system:

invokeai --free_gpu_mem --model stable-diffusion-1.5
* Initializing, be patient...
>> Initialization file /home/lstein/invokeai/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.0-rc4
>> InvokeAI runtime directory is "/home/lstein/invokeai"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type cuda
>> xformers memory-efficient attention is available and enabled
>> Current VRAM usage:  0.00G
>> Loading diffusers model from runwayml/stable-diffusion-v1-5
  | Using faster float16 precision
  | Loading diffusers VAE from stabilityai/sd-vae-ft-mse
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 80145.94it/s]
  | Default image dimensions = 512 x 512
>> Model loaded in 6.80s
>> Max VRAM used to load the model: 2.16G 
>> Current VRAM usage:0.00G
>> Textual inversions available: 
>> Setting Sampler to k_lms (LMSDiscreteScheduler)

* Initialization done! Awaiting your command (-h for help, 'q' to quit)
(stable-diffusion-1.5) invoke> banana sushi
>> Patchmatch initialized
  0%|                                                                                                        | 0/30 [00:00<?, ?it/s]
Generating:   0%|                                                                                             | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/lstein/Projects/InvokeAI/ldm/generate.py", line 516, in prompt2image
    results = generator.generate(
  File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/base.py", line 112, in generate
    image = make_image(x_T)
  File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/txt2img.py", line 40, in make_image
    pipeline_output = pipeline.image_from_embeddings(
  File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 340, in image_from_embeddings
    result_latents, result_attention_map_saver = self.latents_from_embeddings(
  File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 366, in latents_from_embeddings
    result: PipelineIntermediateState = infer_latents_from_embeddings(
  File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 181, in __call__
    for result in self.generator_method(*args, **kwargs):
  File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 403, in generate_latents_from_embeddings
    step_output = self.step(batched_t, latents, conditioning_data,
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 438, in step
    noise_pred = self.invokeai_diffuser.do_diffusion_step(
  File "/home/lstein/Projects/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 160, in do_diffusion_step
    unconditioned_next_x, conditioned_next_x = self.apply_standard_conditioning(x, sigma, unconditioning, conditioning)
  File "/home/lstein/Projects/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 176, in apply_standard_conditioning
    both_results = self.model_forward_callback(x_twice, sigma_twice, both_conditionings)
  File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 472, in _unet_forward
    return self.unet(sample=latents,
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 481, in forward
    sample, res_samples = downsample_block(
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 789, in forward
    hidden_states = attn(
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/transformer_2d.py", line 265, in forward
    hidden_states = block(
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py", line 307, in forward
    attn_output = self.attn2(
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/cross_attention.py", line 160, in forward
    return self.processor(
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/cross_attention.py", line 367, in __call__
    key = attn.to_k(encoder_hidden_states)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half

>> Could not generate image.

@keturn
Contributor Author

keturn commented Feb 6, 2023

Oops! Left a file out of the commit. Try this.

@lstein
Collaborator

lstein commented Feb 6, 2023

The new commit fixes the crash and I see a tremendous decrease in memory usage. However, there is a performance penalty:

| argument | Render Speed (it/s) | VRAM (GB) |
| --- | --- | --- |
| &lt;none&gt; | 14.78 | 2.81 |
| --free_gpu_mem | 3.32 | 0.64 |

Is this expected? With ckpt files I don't see this performance drop-off when --free_gpu_mem is activated.

@keturn
Contributor Author

keturn commented Feb 6, 2023

Oh geez, can confirm. The difference isn't quite as extreme on my hardware, but it's still terrible. Will have to look into that.

@keturn
Contributor Author

keturn commented Feb 7, 2023

I spent a while looking around trying to figure out if accelerate somehow added an autocast somewhere, but I didn't find any of that.

New hypothesis: accelerate's cpu_offload hook wraps the forward call of each model. So if a diffusion process makes fifty separate forward calls, it's going to swap the weights in and out of VRAM between each of those? 😖

That should be straightforward to test. If that's the case, the same performance regression will show up in a minimal example using the stock StableDiffusionPipeline too.
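
A sketch of that check (not the exact script I ran; the model id and step count come from the run above, and the timing harness here is just for illustration):

```python
# Compare wall-clock time per image for the stock StableDiffusionPipeline
# with the whole model resident on the GPU vs. accelerate-backed
# sequential CPU offload.
import time

import torch
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"

def time_one_image(offload: bool) -> float:
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    if offload:
        # Registers accelerate hooks that move each submodule to the GPU
        # for its forward() call and back to the CPU afterwards.
        pipe.enable_sequential_cpu_offload()
    else:
        pipe = pipe.to("cuda")
    start = time.perf_counter()
    pipe("banana sushi", num_inference_steps=30)
    return time.perf_counter() - start

print(f"resident on GPU:    {time_one_image(offload=False):.1f} s/image")
print(f"sequential offload: {time_one_image(offload=True):.1f} s/image")
```

If the hypothesis is right, the offload run should be several times slower even with nothing InvokeAI-specific involved.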

@keturn keturn marked this pull request as draft February 7, 2023 00:34
@keturn
Contributor Author

keturn commented Feb 7, 2023

confirmed with minimal example and filed issue upstream huggingface/diffusers#2266

@keturn
Contributor Author

keturn commented Feb 7, 2023

I think the thing to do is to scrap this PR, abandon using accelerate.cpu_offload, and go back to doing all the to_cpu / from_cpu stuff ourselves.

which is kind of a bummer because I took out all the free_gpu_mem clauses when implementing the diffusers versions of the generation methods because I thought that was going to be managed for us. 🙁
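
Roughly what "doing it ourselves" means (a hypothetical helper, not the actual InvokeAI code): move the whole pipeline onto the GPU once per generation and back to the CPU when it finishes, instead of letting accelerate hooks shuttle each submodule around on every forward call.

```python
# Manual offload at generation granularity: one host->device transfer
# before the run and one device->host transfer after, instead of a swap
# around every forward() call.
import torch

def generate_with_manual_offload(pipeline, prompt: str, **kwargs):
    pipeline.to("cuda")                 # load weights into VRAM once
    try:
        return pipeline(prompt, **kwargs)
    finally:
        pipeline.to("cpu")              # release VRAM for other models
        torch.cuda.empty_cache()
```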

@lstein
Collaborator

lstein commented Feb 7, 2023

> I think the thing to do is to scrap this PR, abandon using accelerate.cpu_offload, and go back to doing all the to_cpu / from_cpu stuff ourselves.
>
> which is kind of a bummer because I took out all the free_gpu_mem clauses when implementing the diffusers versions of the generation methods because I thought that was going to be managed for us. 🙁

That's so disappointing. I'm sorry you had to do all that work only to undo it again.

@keturn keturn changed the title from "fix free_gpu_memory option for diffusers models" to "fix free_gpu_memory option for diffusers models [using accelerate cpu_offload]" Feb 9, 2023
@keturn keturn closed this Feb 9, 2023
@keturn keturn deleted the fix/cpu-offload branch February 19, 2023 17:46

Development

Successfully merging this pull request may close these issues.

[bug]: free_gpu_mem does not offload diffusers models
