Conversation

keturn (Contributor) commented Feb 9, 2023

Trying something different in place of #2542.

Fixes #2326.

keturn (Contributor, Author) commented Feb 9, 2023

The accelerate feature the previous PR used does a terribly clever thing: parameters stay defined on the `meta` device until their module's `forward` method is called.

That often resulted in `foo.device` returning `meta`, which is useless for preparing tensors you want to pass to that model, but at least it's obviously useless upon inspection.

In contrast, having models shuffled off to the CPU means `foo.device` is never the unusable `meta`. But it also means the `foo.device` you see now isn't necessarily what `foo.device` will be when you call `foo()`, so we still have to be cautious about relying on `.device`.
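The contrast can be sketched with plain torch (illustrative, not this PR's code): under CPU offload, a module's device property reports `cpu` between calls even though an offloading hook may move the weights elsewhere for execution.

```python
import torch

# Minimal sketch (not this PR's code) of why `.device` is unreliable under
# offloading: a module reports where its parameters sit *now*, not where
# they will be when forward() actually runs.

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    @property
    def device(self) -> torch.device:
        # The common convention: report the first parameter's device.
        return next(self.parameters()).device

model = TinyModel().to("cpu")   # parked on CPU between invocations
print(model.device)             # cpu -- but an offloading hook could move
                                # the weights to the GPU inside forward()
```

So any tensor prepared with `model.device` before the call may land on the wrong device by the time the model actually executes.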

keturn (Contributor, Author) commented Feb 9, 2023

It's minimally working now, tested for txt2img in text mode only.

There are still a few other things to clean up; e.g. the web UI runs into an error in `lowres_estimated_image`.

keturn (Contributor, Author) commented Feb 9, 2023

Added a workaround for the estimated-image function, then moved on to testing inpainting, and that has its own bucket of problems.

I assumed `vae.decode` was the problem because its `forward` method was `encode`, making `decode` the "extra" method, but no: its `forward` does both for some reason, so neither `encode` nor `decode` gets the hooks.
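A toy illustration of that hook behavior (hypothetical module, not the real VAE): `register_forward_hook` only fires when the module is invoked through `__call__`, so calling a method like `decode` directly bypasses the hook entirely.

```python
import torch

# Toy stand-in for the VAE; `decode` here is hypothetical.
class TinyVAE(torch.nn.Module):
    def forward(self, x):
        return x

    def decode(self, x):
        # Called directly, this never goes through __call__, so forward
        # hooks (accelerate-style or otherwise) do not fire.
        return x

vae = TinyVAE()
fired = []
vae.register_forward_hook(lambda mod, args, out: fired.append(True))

vae(torch.ones(1))          # goes through __call__ -> hook fires
vae.decode(torch.ones(1))   # direct method call -> hook does NOT fire
print(len(fired))           # 1
```

This is why hooks attached only to `forward` leave the extra entry points unmanaged.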

And then it has the usual set of "tensors prepared using `vae.device` end up on `cpu`" problems, which we have elsewhere too, but they're multiplied here because inpainting deals with more inputs (input image, mask, etc.).

I'm coming to believe we can't do this completely transparently, with the Pipeline class unaware of how the model devices are managed. It needs a way to discover the models' actual execution device, whether by inquiring with an Offloader instance or through some property we add to the models.
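One way to sketch that idea (the names `Offloader`, `install`, and `device_for` are illustrative, not this PR's final API): the pipeline asks the offloader where a model will run, instead of trusting the model's current parameter device.

```python
import torch

class Offloader:
    """Illustrative offloader: parks models on CPU and remembers where
    they will actually execute."""

    def __init__(self, execution_device: torch.device):
        self.execution_device = execution_device
        self._managed = set()

    def install(self, model: torch.nn.Module) -> None:
        self._managed.add(model)
        model.to("cpu")  # offloaded until its forward pass is needed

    def device_for(self, model: torch.nn.Module) -> torch.device:
        # The device the model will run on, regardless of where its
        # parameters sit right now.
        assert model in self._managed
        return self.execution_device

offloader = Offloader(torch.device("cuda"))
model = torch.nn.Linear(2, 2)
offloader.install(model)

# The pipeline prepares tensors for the *execution* device,
# not for wherever the parameters currently live:
print(offloader.device_for(model))      # cuda
print(next(model.parameters()).device)  # cpu
```

The key design point is that device knowledge lives with whatever manages the offloading, so the pipeline never has to guess from a parameter that may be parked on CPU.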

keturn (Contributor, Author) commented Feb 10, 2023

This is starting to shape up now, and using it doesn't require nearly as many awkward workarounds as the previous attempt.

It's working in many cases, but my troubleshooting is currently impeded by something that's swallowing all the console output.

keturn marked this pull request as ready for review February 15, 2023 04:30
damian0815 (Contributor) commented Feb 16, 2023

I hit resolve on both of those conversations; thanks for making the naming change.

damian0815 self-requested a review February 16, 2023 20:02
damian0815 enabled auto-merge (squash) February 16, 2023 20:03
lstein (Collaborator) left a comment

Neat! Very elegant solution.

damian0815 merged commit 8a0d45a into main Feb 16, 2023
damian0815 deleted the spike/offloading-device branch February 16, 2023 23:48


Development

Successfully merging this pull request may close these issues.

[bug]: free_gpu_mem does not offload diffusers models
