
Conversation

anton-l (Member) commented Nov 7, 2022

Fixes the implementation and tests introduced in #1085

Looks like the two test_stable_diffusion_pipeline_with_sequential_cpu_offloading tests weren't checked on a GPU originally, which resulted in a device mismatch: https://github.com/huggingface/diffusers/actions/runs/3410777950/jobs/5674151054#step:10:551

@piEsposito FYI: the GitHub Actions workflows for the GPU tests aren't launched for PRs, so for future PRs please check them locally too :)
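
(For reference, one way to run the slow GPU tests locally, assuming the usual RUN_SLOW convention in the diffusers test suite; the test path and keyword filter below are only illustrative.)

# Roughly equivalent to `RUN_SLOW=1 python -m pytest -k sequential_cpu_offloading tests/` in a shell.
import os

import pytest

os.environ["RUN_SLOW"] = "1"  # enables the tests marked as slow
pytest.main(["-k", "sequential_cpu_offloading", "tests/"])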

HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

 if isinstance(module, torch.nn.Module):
     if module.device == torch.device("meta"):
-        return torch.device("cpu")
+        return torch.device("cuda" if torch.cuda.is_available() else "cpu")
anton-l (Member Author):

@patrickvonplaten @piEsposito this feels hacky, but it is required to make the pipelines work when self.device is not the same as, e.g., the generator's device after offloading.
See the error here: https://github.com/huggingface/diffusers/actions/runs/3410777950/jobs/5674151054#step:10:551

It seems that accelerate doesn't populate param_original_devices here, so the only way to know where the model was supposed to go is to guess?

piEsposito (Contributor):

I understand - and I actually think this is a clever solution; I've learned a few things from this PR of yours.

Also, IMO it is correct to assume that, if a user has a GPU, they will use it instead of the CPU for diffusion models.

patrickvonplaten (Contributor):

Actually, the more I think about it: wouldn't the cleanest solution be to just return torch.device("meta") and then fix the bugs in the pipelines directly?

I'm a bit worried about making such a fundamental function this hacky.

Also cc @pcuenca - curious to hear your thoughts!

Comment on lines +638 to +639
# make sure that less than 2.2 GB is allocated
assert mem_bytes < 2.2 * 10**9
anton-l (Member Author) commented Nov 7, 2022:

@piEsposito the 768x512 images require ~2.16 GB of memory, compared to ~1.5 GB for the 512x512 text2img tests.

piEsposito (Contributor):

Yeah, thank you for catching that!

piEsposito (Contributor) left a comment:

I agree with the approach, thank you for teaching me those few things.

anton-l (Member Author) commented Nov 7, 2022

@piEsposito thank you for contributing the offloading solution too! 🤗

patrickvonplaten (Contributor) left a comment:

Thanks a mille for fixing this @anton-l!
The test fixes look great - I'm just not sure we want to make the self.device function this hacky, so I left a comment. Wdyt? Also cc @pcuenca

anton-l (Member Author) commented Nov 8, 2022

If we decide to return "meta" from self.device, then we'll have to replace every .to(self.device) in the pipeline with .to(self.execution_device) (where execution_device is saved before doing the CPU offload), because once we offload the models we can no longer access the original device name inside pipeline.__call__; it's all just "meta" (the device remapping happens inside the magic module hooks that accelerate assigns).

But @piEsposito (cc @sgugger), maybe there's still a way to access the intended execution device after CPU offloading? self.unet._hf_hook.execution_device is None when I inspect it from the pipeline.
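
(A minimal sketch of what that first option would mean, using a toy pipeline; ToyPipeline and _execution_device are illustrative names here, not the actual diffusers API.)

import torch
from accelerate import cpu_offload


class ToyPipeline:
    """Toy example: cache the intended execution device before offloading."""

    def __init__(self, unet: torch.nn.Module):
        self.unet = unet
        self._execution_device = torch.device("cpu")  # hypothetical attribute

    def enable_sequential_cpu_offload(self, gpu_id: int = 0):
        device = torch.device(f"cuda:{gpu_id}")
        # Save the target device *before* offloading; afterwards the module's
        # parameters all report the "meta" device.
        self._execution_device = device
        cpu_offload(self.unet, execution_device=device)

    @torch.no_grad()
    def __call__(self, batch_size: int = 1):
        # Every former `.to(self.device)` would become `.to(self._execution_device)`.
        sample = torch.randn(batch_size, 4, 8, 8).to(self._execution_device)
        return self.unet(sample)

The drawback is exactly what the comment above describes: every pipeline would need this bookkeeping instead of the device property handling it in one place.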

sgugger (Contributor) commented Nov 8, 2022

The execution device will be attached to the bottom-level module, not the top-level one.

piEsposito (Contributor) commented Nov 8, 2022

@anton-l cc @sgugger: the execution device appears as None because it is using a SequentialHook. If you get into this hook and find the AlignDevicesHook inside it, you should get what you are looking for.

If you try pipe.unet._hf_hook.hooks[0].execution_device, it should return an integer corresponding to the GPU index, assuming you have accelerate installed from master.
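
(Roughly, the inspection described above, as a sketch assuming a CUDA machine and accelerate/diffusers versions from around this PR; the model id is just an example.)

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.enable_sequential_cpu_offload()

hook = pipe.unet._hf_hook  # a SequentialHook after offloading
print(type(hook).__name__)
# The device-alignment hook inside it is supposed to carry the execution device
# (an integer GPU index or a torch.device); as the next reply notes, this can
# still be missing/None unless the model was loaded with device_map="auto".
print(getattr(hook.hooks[0], "execution_device", None))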

anton-l (Member Author) commented Nov 8, 2022

> If you try pipe.unet._hf_hook.hooks[0].execution_device it should return an integer corresponding to the GPU index

That seems to only work when device_map="auto", so I've added a deeper check over submodules.
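
(The "deeper check over submodules" presumably looks something like the sketch below; the helper name is made up for illustration and is not necessarily the exact code added in the PR.)

import torch


def guess_execution_device(pipe) -> torch.device:
    """Walk the unet's submodules and return the first execution device
    recorded by an accelerate hook, falling back to pipe.device."""
    for module in pipe.unet.modules():
        hook = getattr(module, "_hf_hook", None)
        device = getattr(hook, "execution_device", None)
        if device is not None:
            # accelerate may store an integer GPU index or a torch.device here
            return torch.device(device)
    return pipe.device

As @sgugger notes above, with sequential offloading the execution device lives on the bottom-level modules rather than on the top-level hook, which is why the traversal is needed.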

-# make sure that less than 1.5 GB is allocated
-assert mem_bytes < 1.5 * 10**9
+# make sure that less than 2.8 GB is allocated
+assert mem_bytes < 2.8 * 10**9
anton-l (Member Author):

Not sure how this increase happened yet; if someone can check mem_bytes here on their machine, that would be great :)
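
(For anyone checking locally: mem_bytes here is presumably the peak CUDA allocation, so a rough way to reproduce the number is a sketch like the one below; the model id, dtype, prompt, and step count are illustrative and may differ from the actual test.)

import torch
from diffusers import StableDiffusionPipeline

torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe.enable_sequential_cpu_offload()

_ = pipe("a photograph of an astronaut riding a horse", num_inference_steps=5)

mem_bytes = torch.cuda.max_memory_allocated()
print(f"peak allocation: {mem_bytes / 10**9:.2f} GB")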

patrickvonplaten (Contributor) commented

Thanks a lot for the help here @sgugger - the PR looks good to me. Merging this as it's blocking some other PRs.
@anton-l - I actually think we simply didn't run the text encoder on the GPU at all before, which is why the GPU memory went up. Also related to #1047

patrickvonplaten merged commit 24895a1 into main on Nov 9, 2022
patrickvonplaten deleted the fix-cpu-offload branch on November 9, 2022 at 09:28
patrickvonplaten mentioned this pull request on Nov 9, 2022
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* Fix cpu offloading

* get offloaded devices locally for SD pipelines