Fix cpu offloading #1177
Conversation
src/diffusers/pipeline_utils.py
Outdated
```python
if isinstance(module, torch.nn.Module):
    if module.device == torch.device("meta"):
        return torch.device("cpu")
return torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
@patrickvonplaten @piEsposito this feels hacky, but is required to make the pipelines work when self.device is not the same as e.g. generator after offloading.
See the error here: https://github.com/huggingface/diffusers/actions/runs/3410777950/jobs/5674151054#step:10:551
Seems that accelerate doesn't populate param_original_devices here, so the only way to know where the model was supposed to go is to guess?
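The heuristic under discussion can be sketched roughly as follows (a standalone illustration, not the actual diffusers implementation; it reads the first parameter's device instead of the `module.device` property so it runs on any `torch.nn.Module`):

```python
import torch

def guess_execution_device(module: torch.nn.Module) -> torch.device:
    # Illustrative sketch of the heuristic discussed above: parameters
    # offloaded by accelerate sit on the "meta" device, so their original
    # device is no longer recorded, and the best we can do is guess that
    # a user with a GPU intends to run on it.
    param = next(module.parameters(), None)
    if param is not None and param.device != torch.device("meta"):
        # Real weights are present: trust the reported device.
        return param.device
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = torch.nn.Linear(4, 4)
print(guess_execution_device(layer))  # prints: cpu
```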
I understand - and actually think this is a clever solution, I've learned a few things from this PR of yours.
Also, IMO it is correct to assume that, if a user has a GPU, they will use it instead of the CPU for diffusion models.
Actually, the more I think about it, wouldn't the cleanest solution be to just return `torch.device("meta")` and then fix the bugs in the pipelines directly?
Bit worried about making such a fundamental function this hacky.
Also cc @pcuenca - curious to hear your thoughts!
```python
# make sure that less than 2.2 GB is allocated
assert mem_bytes < 2.2 * 10**9
```
@piEsposito the 768x512 images require ~2.16 GB of memory, compared to ~1.5 GB for the 512x512 text2img tests.
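For reference, peak-memory figures like these are typically gathered with torch's CUDA memory statistics. A hedged sketch (the helper name is made up; the CPU fallback is for illustration only, since `tracemalloc` only sees Python-side allocations, not torch's tensor allocator):

```python
import tracemalloc

import torch

def peak_bytes(fn) -> int:
    # Sketch of how a `mem_bytes` value like the one asserted above is
    # typically measured: reset the CUDA peak counter, run the workload,
    # then read back the high-water mark.
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
        fn()
        return torch.cuda.max_memory_allocated()
    # CPU-only fallback for illustration: tracks Python allocations,
    # so the numbers are not comparable to the CUDA ones.
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

mem_bytes = peak_bytes(lambda: torch.randn(64, 64) @ torch.randn(64, 64))
assert mem_bytes >= 0
```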
Yeah, thank you for catching that!
piEsposito
left a comment
I agree with the approach, thank you for teaching me those few things.
@piEsposito thank you for contributing the offloading solution too! 🤗
patrickvonplaten
left a comment
If we decide to return `torch.device("meta")`, we'd have to fix the affected pipelines directly. But @piEsposito (cc @sgugger) maybe there's still a way to access the intended execution device after cpu offloading?
The execution device will be attached to the bottom-level module, not the top-level one.
@anton-l cc @sgugger the execution device appears as If you try
That seems to only work when
```python
# make sure that less than 1.5 GB is allocated
assert mem_bytes < 1.5 * 10**9
# make sure that less than 2.8 GB is allocated
assert mem_bytes < 2.8 * 10**9
```
Not sure how this increase happened yet; if someone can check `mem_bytes` here on their machine, that would be great :)
* Fix cpu offloading
* get offloaded devices locally for SD pipelines
Fixes the implementation and tests introduced in #1085
Looks like the two `test_stable_diffusion_pipeline_with_sequential_cpu_offloading` tests weren't checked with a GPU originally, which resulted in a device mismatch: https://github.com/huggingface/diffusers/actions/runs/3410777950/jobs/5674151054#step:10:551
@piEsposito FYI: GitHub Actions for the GPU tests aren't launched for PRs, so for future PRs please check them locally too :)