
[diffusers]: Model Cache #1777


Description

@keturn

Model loading is significantly different with diffusers, and I'm not sure how best to integrate it with the existing ModelCache:

```python
class ModelCache(object):
    def __init__(self, config: OmegaConf, device_type: str, precision: str, max_loaded_models=DEFAULT_MAX_MODELS):
        '''
        Initialize with the path to the models.yaml config file,
        the torch device type, and precision. The optional
        min_avail_mem argument specifies how much unused system
        (CPU) memory to preserve. The cache of models in RAM will
        grow until this value is approached. Default is 2G.
        '''
```

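For contrast, loading on the diffusers side mostly goes through `from_pretrained`. Here's a minimal sketch of what that looks like; the model id and keyword arguments are illustrative only, not what our code actually passes:

```python
# Minimal sketch of diffusers-style loading, for contrast with ModelCache above.
# The model id and kwargs here are illustrative, not taken from InvokeAI's code.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # hub id or a local directory with model_index.json
    torch_dtype=torch.float16,         # roughly the "precision" argument above
)
pipe = pipe.to("cuda")                 # roughly the "device_type" argument above
```
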
diffusers takes advantage of 🤗accelerate by default. I don't know much about that library, but it looks like the sort of "offload this model state to CPU until we need it again" behavior ModelCache implements is already available there: Dispatching and Offloading Models. I hope this means we can drop a lot of the existing code from ModelCache.
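
If we do lean on accelerate directly, I'd guess the integration looks roughly like the untested sketch below. `cpu_offload` is from the accelerate docs linked above; which sub-modules to wrap, and whether this plays nicely with the rest of the pipeline, is exactly the part I'd want someone who knows accelerate to work out.

```python
# Untested sketch of accelerate's hook-based offloading on a diffusers pipeline.
# cpu_offload() keeps a module's weights on CPU and streams them to the
# execution device for each forward pass, which is roughly what ModelCache's
# offload-to-CPU path tries to do by hand.
import torch
from accelerate import cpu_offload
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

device = torch.device("cuda")
# Guessing at the sub-modules worth offloading; this is not settled design.
for submodule in (pipe.unet, pipe.text_encoder, pipe.vae):
    cpu_offload(submodule, execution_device=device)
```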

I haven't been using ModelCache's "offload to CPU" functionality even on the main branch, because it always took far more memory than I expected and quickly summoned the Out-Of-Memory Killer.

I do have fast storage and I don't have a ton of spare RAM, so I don't think I'm the target audience for the model caching/offloading feature and I need to delegate the ModelCache/diffusers integration to someone who properly appreciates it.

Some of this potential integration with accelerate could probably even be done on the main branch. But since #1583 already changes the ModelCache file a fair bit, and the way we interact with accelerate probably differs somewhat with and without diffusers (diffusers already sets it up to some degree), I expect a PR for this should target dev/diffusers rather than main.
