
Conversation

@JPPhoto
Contributor

@JPPhoto JPPhoto commented Feb 7, 2023

I had some spare time, so I made the slicing strategy - none or max - a runtime decision based on the size of the requested generation and the free [V]RAM available at that time. This needs testers on multiple platforms.
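For context, here is a minimal sketch of the kind of runtime decision described above, assuming a CUDA device and a diffusers pipeline. The helper name, the threshold logic, and the memory estimate (which restates the rough formula given later in this thread) are illustrative only, not the code in this PR:

```python
import torch
from diffusers import StableDiffusionPipeline


def choose_slicing(pipe: StableDiffusionPipeline, width: int, height: int) -> None:
    """Hypothetical helper: pick attention slicing at generation time from free VRAM."""
    # Rough per-image attention memory estimate (see the formula later in this thread):
    # 16 * ((w * h / 64) ** 2) * bytes_per_element
    bytes_per_element = 6 if pipe.unet.dtype == torch.float16 else 8
    needed = 16 * ((width * height / 64) ** 2) * bytes_per_element

    free_vram, _total = torch.cuda.mem_get_info()  # driver-reported (free, total) in bytes

    if needed > free_vram:
        pipe.enable_attention_slicing("max")  # smallest peak memory, slowest
    else:
        pipe.disable_attention_slicing()      # fastest, needs the most VRAM
```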

@Kyle0654
Contributor

Kyle0654 commented Feb 7, 2023

I know we're nowhere near there, but do you have any ideas about how we'd make this work in a batched/parallel-generation environment?

@JPPhoto
Contributor Author

JPPhoto commented Feb 7, 2023

> I know we're nowhere near there, but do you have any ideas about how we'd make this work in a batched/parallel-generation environment?

We're nowhere near there. :)

For a batched system, each batch gets to decide for itself, based on the [V]RAM available at that moment, whether to run fast (unsliced) or max sliced. I don't think there are any code changes required for that case.

If we're on a parallel-generation system, we'd need some awareness of the other concurrent generations on the system and would have to add up all of their memory requirements before deciding how to run ours, or (the simple solution) every job on that kind of system runs with max slicing to avoid OOM errors. I also imagine that sub-quadratic slicing will take care of a lot of these issues when it's implemented in Invoke or (better) in diffusers.
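A sketch of that first option, with hypothetical names and under the assumption that each in-flight job's memory needs can be estimated up front; again, this is not code from the PR:

```python
from typing import Iterable


def choose_slice_mode(my_estimate: int, other_estimates: Iterable[int], free_vram: int) -> str:
    """Decide slicing for one job while other generations may be in flight."""
    if free_vram - sum(other_estimates) >= my_estimate:
        return "none"  # enough headroom even after counting everyone else
    return "max"       # the safe default on a shared system
```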

@JPPhoto
Contributor Author

JPPhoto commented Feb 7, 2023

I'd also like some Windows + CUDA testers to run this in a debugger and check whether torch is reporting an accurate amount of free VRAM. If it isn't, everything will still run - just sliced in all cases.
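One way to check this by hand (not part of the PR) is to compare what torch reports against nvidia-smi or Task Manager:

```python
import torch

free_vram, total_vram = torch.cuda.mem_get_info()  # driver-reported free/total bytes
allocated = torch.cuda.memory_allocated()          # bytes currently allocated by torch tensors
reserved = torch.cuda.memory_reserved()            # bytes held by torch's caching allocator

print(f"free={free_vram / 2**30:.2f} GiB, total={total_vram / 2**30:.2f} GiB, "
      f"allocated={allocated / 2**30:.2f} GiB, reserved={reserved / 2**30:.2f} GiB")
```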

@lstein
Collaborator

lstein commented Feb 8, 2023

Thanks for doing this. I am looking forward to testing this feature after the 2.3.0 dust settles.

@psychedelicious
Contributor

@JPPhoto I'm happy to test, but I'm not sure what I'm looking for... should I just see if the magic smoke escapes?

@JPPhoto
Contributor Author

JPPhoto commented Feb 8, 2023

> @JPPhoto I'm happy to test, but I'm not sure what I'm looking for... should I just see if the magic smoke escapes?

That is definitely part of it. But I also want to make sure you can render, as fast as possible and without crashing, at sizes that are appropriate for your free [V]RAM. What's your setup?

On my 12GB NVIDIA card, the tensors for a 512x512 image fit entirely in VRAM alongside the model. When I scale up, memory requirements grow quickly - the rough formula is that you need 16 * ((x * y / 64) ^ 2) * 6 bytes of [V]RAM (or * 8 instead of * 6 if using fp32). So a 1280x1280 image is possible on my card via the slicing implemented by diffusers and shouldn't cause an OOM.
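Plugging numbers into that rough formula (a quick check, nothing more):

```python
def rough_vram_bytes(width: int, height: int, fp32: bool = False) -> int:
    # 16 * ((x * y / 64) ^ 2) * 6, or * 8 for fp32, as stated above
    bytes_per_element = 8 if fp32 else 6
    return int(16 * ((width * height / 64) ** 2) * bytes_per_element)


print(rough_vram_bytes(512, 512) / 2**30)    # ~1.5 GiB: fits next to the model on a 12GB card
print(rough_vram_bytes(1280, 1280) / 2**30)  # ~58.6 GiB: far too big unsliced, hence "max" slicing
```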

@psychedelicious
Contributor

I'm on an M1 MacBook with 32GB memory (shared RAM/VRAM). I don't think invoke can access my VRAM usage, so anything that makes decisions based on that probably will not work.
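A hypothetical fallback for platforms like this, where there is no separate VRAM pool to query; psutil and the function name are assumptions here, not anything this PR ships:

```python
from typing import Optional

import psutil  # third-party; an assumption, not a dependency of this PR
import torch


def free_memory_bytes() -> Optional[int]:
    """Best-effort free-memory query; None means the caller should default to max slicing."""
    if torch.cuda.is_available():
        free, _total = torch.cuda.mem_get_info()
        return free
    if torch.backends.mps.is_available():
        # Apple Silicon uses unified memory, so use overall free system RAM as a rough proxy.
        return psutil.virtual_memory().available
    return None
```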

@JPPhoto
Contributor Author

JPPhoto commented Feb 8, 2023

I want to see if it works on that platform as well (and more importantly that I didn't break it), so please go ahead and give it a shot.

@JPPhoto JPPhoto marked this pull request as draft February 8, 2023 14:37
@JPPhoto JPPhoto marked this pull request as ready for review February 10, 2023 16:00
Collaborator

@lstein lstein left a comment


Confirmed working on Ubuntu.

Contributor

@damian0815 damian0815 left a comment


Looks good other than the naming of the _enable_memory_efficient_attention function.

…t_attention as this happens every generation.
@JPPhoto JPPhoto enabled auto-merge (squash) February 12, 2023 18:10
@JPPhoto JPPhoto merged commit 9eed191 into main Feb 12, 2023
@JPPhoto JPPhoto deleted the JPPhoto-choose-slicing-strategy branch February 12, 2023 18:24
damian0815 added a commit that referenced this pull request Feb 16, 2023
* new OffloadingDevice loads one model at a time, on demand

* fixup! new OffloadingDevice loads one model at a time, on demand

* fix(prompt_to_embeddings): call the text encoder directly instead of its forward method

allowing any associated hooks to run with it.

* more attempts to get things on the right device from the offloader

* more attempts to get things on the right device from the offloader

* make offloading methods an explicit part of the pipeline interface

* inlining some calls where device is only used once

* ensure model group is ready after pipeline.to is called

* fixup! Strategize slicing based on free [V]RAM (#2572)

* doc(offloading): docstrings for offloading.ModelGroup

* doc(offloading): docstrings for offloading-related pipeline methods

* refactor(offloading): s/SimpleModelGroup/FullyLoadedModelGroup

* refactor(offloading): s/HotSeatModelGroup/LazilyLoadedModelGroup

to frame it in the same terms as "FullyLoadedModelGroup"

---------

Co-authored-by: Damian Stewart <[email protected]>