fix free_gpu_memory option for diffusers models [using accelerate cpu_offload] #2542
Conversation
Force-pushed from fb9f76d to 802ad1d
There are a few things I don't love about this, but it's in the category of "things that will be a lot clearer to deal with when we aren't supporting two different model architectures," and it seems to work.
lstein left a comment:
No joy testing on Ubuntu CUDA system:
invokeai --free_gpu_mem --model stable-diffusion-1.5
* Initializing, be patient...
>> Initialization file /home/lstein/invokeai/invokeai.init found. Loading...
>> Internet connectivity is True
>> InvokeAI, version 2.3.0-rc4
>> InvokeAI runtime directory is "/home/lstein/invokeai"
>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Using device_type cuda
>> xformers memory-efficient attention is available and enabled
>> Current VRAM usage: 0.00G
>> Loading diffusers model from runwayml/stable-diffusion-v1-5
| Using faster float16 precision
| Loading diffusers VAE from stabilityai/sd-vae-ft-mse
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 80145.94it/s]
| Default image dimensions = 512 x 512
>> Model loaded in 6.80s
>> Max VRAM used to load the model: 2.16G
>> Current VRAM usage:0.00G
>> Textual inversions available:
>> Setting Sampler to k_lms (LMSDiscreteScheduler)
* Initialization done! Awaiting your command (-h for help, 'q' to quit)
(stable-diffusion-1.5) invoke> banana sushi
>> Patchmatch initialized
0%| | 0/30 [00:00<?, ?it/s]
Generating: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/lstein/Projects/InvokeAI/ldm/generate.py", line 516, in prompt2image
results = generator.generate(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/base.py", line 112, in generate
image = make_image(x_T)
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/txt2img.py", line 40, in make_image
pipeline_output = pipeline.image_from_embeddings(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 340, in image_from_embeddings
result_latents, result_attention_map_saver = self.latents_from_embeddings(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 366, in latents_from_embeddings
result: PipelineIntermediateState = infer_latents_from_embeddings(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 181, in __call__
for result in self.generator_method(*args, **kwargs):
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 403, in generate_latents_from_embeddings
step_output = self.step(batched_t, latents, conditioning_data,
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 438, in step
noise_pred = self.invokeai_diffuser.do_diffusion_step(
File "/home/lstein/Projects/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 160, in do_diffusion_step
unconditioned_next_x, conditioned_next_x = self.apply_standard_conditioning(x, sigma, unconditioning, conditioning)
File "/home/lstein/Projects/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 176, in apply_standard_conditioning
both_results = self.model_forward_callback(x_twice, sigma_twice, both_conditionings)
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 472, in _unet_forward
return self.unet(sample=latents,
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 481, in forward
sample, res_samples = downsample_block(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 789, in forward
hidden_states = attn(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/transformer_2d.py", line 265, in forward
hidden_states = block(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/attention.py", line 307, in forward
attn_output = self.attn2(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/cross_attention.py", line 160, in forward
return self.processor(
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/diffusers/models/cross_attention.py", line 367, in __call__
key = attn.to_k(encoder_hidden_states)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/home/lstein/invokeai/.venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half
>> Could not generate image.
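For context, the RuntimeError above is a plain dtype mismatch: some layer is still holding float32 weights while the float16 pipeline feeds it half-precision activations, and F.linear refuses the mixed-precision matmul. A standalone sketch of the same failure mode, independent of InvokeAI (the exact error wording varies by device and torch version):

```python
# Standalone illustration (not InvokeAI code) of the failure mode above:
# a float32 Linear layer fed float16 activations raises the same class of
# dtype-mismatch RuntimeError reported by F.linear in the traceback.
import torch

layer = torch.nn.Linear(8, 8)                # weights default to float32
x = torch.randn(1, 8, dtype=torch.float16)   # half-precision input

try:
    layer(x)                                 # mixed-precision matmul is rejected
except RuntimeError as err:
    print(f"dtype mismatch: {err}")
```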
Oops! Left a file out of the commit. Try this.
The new commit fixes the crash and I see a tremendous decrease in memory usage. However, there is a noticeable performance penalty. Is this expected? With ckpt files I don't see this performance dropoff when the --free_gpu_mem option is enabled.
oh geez, can confirm. the difference isn't quite as extreme on my hardware, but it's still terrible. will have to look into that.
I spent a while looking around trying to figure out if accelerate somehow added an autocast somewhere, but I didn't find any of that. New hypothesis: accelerate's cpu_offload hook is around the forward call for each model. So if you have a diffusion process that makes fifty separate forward calls, it's going to swap it in and out between each of those? 😖 That should be straightforward to test. If that's the case, the performance regression will show up in a minimal example of the stock StableDiffusionPipeline too.
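A rough sketch of the kind of minimal check being described, assuming the stock StableDiffusionPipeline from diffusers; the prompt and step count are arbitrary, and this is not the exact script that was filed upstream:

```python
# Rough sketch: time a stock StableDiffusionPipeline with and without
# enable_sequential_cpu_offload to see whether the per-step slowdown
# reproduces outside InvokeAI. Requires a CUDA GPU plus diffusers/accelerate.
import time

import torch
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # same model as in the log above


def timed_run(offload: bool) -> float:
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    if offload:
        # accelerate hooks swap each submodule on/off the GPU around its forward pass
        pipe.enable_sequential_cpu_offload()
    else:
        pipe = pipe.to("cuda")
    start = time.perf_counter()
    pipe("banana sushi", num_inference_steps=30)
    return time.perf_counter() - start


print(f"no offload: {timed_run(False):.1f}s")
print(f"offload:    {timed_run(True):.1f}s")
```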
confirmed with a minimal example and filed an issue upstream: huggingface/diffusers#2266
I think the thing to do is to scrap this PR and abandon using accelerate's cpu_offload, which is kind of a bummer because I had already taken out all the code it was meant to replace.
That's so disappointing. I'm sorry you had to do all that work only to undo it again. |
Changed the title from "free_gpu_memory option for diffusers models" to "free_gpu_memory option for diffusers models [using accelerate cpu_offload]"
Connects the flag to diffusers' StableDiffusionPipeline.enable_sequential_cpu_offload: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.enable_sequential_cpu_offload
Fixes #2326
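A simplified, hypothetical sketch of what "connecting the flag" amounts to; the configure_memory helper and its signature are illustrative, not the actual InvokeAI diff:

```python
# Hypothetical sketch (not the actual diff): when free_gpu_mem is set for a
# diffusers model, ask the pipeline to offload submodules to CPU via
# accelerate instead of moving the whole pipeline onto the GPU.
from diffusers import StableDiffusionPipeline


def configure_memory(pipeline: StableDiffusionPipeline,
                     device: str,
                     free_gpu_mem: bool) -> StableDiffusionPipeline:
    if free_gpu_mem and device == "cuda":
        # accelerate moves each submodule to the GPU only for its forward pass
        pipeline.enable_sequential_cpu_offload()
    else:
        pipeline = pipeline.to(device)
    return pipeline
```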