add accelerate to load models with smaller memory footprint #361
Conversation
The documentation is not available anymore as the PR was closed or merged.
src/diffusers/configuration_utils.py
Outdated
```python
init_dict, unused_kwargs = cls.extract_init_dict(config_dict, **kwargs)

model = cls(**init_dict)
device_map = kwargs.pop("low_cpu_mem_usage", None)
```
Ideally we would like to try to keep configuration_utils.py framework- and component-independent. Could we maybe try to set:

```python
with accelerate.init_empty_weights():
    model, unused_kwargs = cls.from_config(
        config_path,
        cache_dir=cache_dir,
        return_unused_kwargs=True,
        force_download=force_download,
        resume_download=resume_download,
        proxies=proxies,
        local_files_only=local_files_only,
        use_auth_token=use_auth_token,
        revision=revision,
        subfolder=subfolder,
        device_map=device_map,
        **kwargs,
    )
```

in modeling_utils.py instead?
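(For context: `init_empty_weights()` is accelerate's context manager that creates model parameters on PyTorch's "meta" device, so instantiating the model under it allocates essentially no RAM; memory is only consumed once the actual checkpoint weights are loaded in.)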
Just did it.
src/diffusers/modeling_utils.py
Outdated
```python
# Set model in evaluation mode to deactivate DropOut modules by default
model.eval()
if device_map is not None:
```
If possible, it would be very nice if all accelerate logic were added only here.
Just did it. I had to put the model creation and checkpoint loading after grabbing the weight and config files to avoid splitting the accelerate logic into two places.
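(For reference, the resulting order looks roughly like the sketch below. The repo id, file names, and the exact diffusers API calls are illustrative assumptions; the real code in the PR may differ in detail.)

```python
import json

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from huggingface_hub import hf_hub_download

from diffusers import UNet2DConditionModel

# 1. Grab the config and weight files first, so that all accelerate-specific
#    logic can live in one contiguous block below.
config_file = hf_hub_download("CompVis/stable-diffusion-v1-3", "config.json", subfolder="unet")
weights_file = hf_hub_download(
    "CompVis/stable-diffusion-v1-3", "diffusion_pytorch_model.bin", subfolder="unet"
)
with open(config_file) as f:
    config = json.load(f)

# 2. Instantiate the model on the "meta" device: no real weight memory is allocated yet.
with init_empty_weights():
    model = UNet2DConditionModel.from_config(config)

# 3. Materialize the checkpoint weights directly onto the target device(s).
model = load_checkpoint_and_dispatch(model, weights_file, device_map="auto")
```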
patrickvonplaten
left a comment
Hey @piEsposito,
Sorry for replying only now :-/
Thanks a lot for the PR! It already looks really nice. One important thing that (if possible) would be good to change is to add functionality only to modeling_utils.py and not to configuration_utils.py, because configuration_utils is also used for the schedulers.
Could you maybe give this a try?
move accelerate logic to single snippet under modelling utils and remove it from configuration utils
@patrickvonplaten I've addressed your comments and moved the accelerate logic to modeling utils. I also created some tests to ensure memory usage gets lower and results stay the same. Thank you for taking the time to carefully review this PR.
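(A rough sketch of what such tests can look like is below. The model id, the 32x32 sample shape, and the tolerance are illustrative assumptions, not the exact tests from the PR.)

```python
import gc

import psutil
import torch

from diffusers import UNet2DModel

MODEL_ID = "fusing/unet-ldm-dummy-update"  # illustrative small test checkpoint


def ram_used_mb() -> float:
    # Resident set size of the current process, in MiB.
    return psutil.Process().memory_info().rss / 1024**2


# 1. Results should stay the same: outputs from both loading paths must match.
model_normal = UNet2DModel.from_pretrained(MODEL_ID)
model_accel = UNet2DModel.from_pretrained(MODEL_ID, device_map="auto")

sample = torch.randn(1, model_normal.config.in_channels, 32, 32)
timestep = torch.tensor([10])
with torch.no_grad():
    out_normal = model_normal(sample, timestep).sample
    out_accel = model_accel(sample, timestep).sample
assert torch.allclose(out_normal, out_accel, atol=1e-5)

# 2. Memory usage should get lower: loading with device_map="auto" should grow
#    the process RSS less than a plain from_pretrained call does.
del model_normal, model_accel
gc.collect()
before = ram_used_mb()
_ = UNet2DModel.from_pretrained(MODEL_ID, device_map="auto")
print(f"RAM increase with accelerate: {ram_used_mb() - before:.1f} MiB")
```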
src/diffusers/modeling_utils.py
Outdated
```python
import torch
from torch import Tensor, device

import accelerate
```
We should not make accelerate a hard dependency here. Could you wrap it into a:

```python
if accelerate_is_available():
    import accelerate
else:
    raise ImportError("Please install accelerate via `pip install accelerate`")
```

below the `if device_map == "auto"` check?
@patrickvonplaten I've just done it, thank you for the suggestion.
src/diffusers/modeling_utils.py
Outdated
```python
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError, RepositoryNotFoundError, RevisionNotFoundError
from requests import HTTPError
from transformers.utils import is_accelerate_available
```
We cannot do this either, because transformers is not a hard requirement 😅
I should have given you more details in my previous comments; I'm very sorry that the feedback cycle takes so much time. I will try hard to reply faster here now.
In short, can you copy this code https://github.com/huggingface/transformers/blob/e5b7cff5fe65eac9e54ba88fa3935b3270db0207/src/transformers/utils/import_utils.py#L528 into https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/import_utils.py?
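(For reference, the transformers helper linked above boils down to a module-availability check along these lines; the exact diffusers copy may differ slightly:)

```python
import importlib.util

# True if the accelerate package can be imported in this environment.
_accelerate_available = importlib.util.find_spec("accelerate") is not None


def is_accelerate_available() -> bool:
    return _accelerate_available
```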
Hey, thank you for clarifying that. I've just fixed it. And no problem: I'm sure you are very busy, and I'm very thankful for the time you took to guide me through this PR.
patrickvonplaten
left a comment
Just need to clean up the is_accelerate_available() comment and then it should be good to go :-)
@patrickvonplaten finished implementing all the requested changes. Please let me know if anything else comes to mind.
This looks good to me!
Hey folks, any update here?
```python
from_auto_class = kwargs.pop("_from_auto", False)
torch_dtype = kwargs.pop("torch_dtype", None)
subfolder = kwargs.pop("subfolder", None)
device_map = kwargs.pop("device_map", None)
```
Sorry, I overlooked this the first time.
@piEsposito could you also add a docstring here?
E.g. 3-4 lines under line 264.
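(For instance, something along these lines in the usual kwargs docstring style; the exact wording here is just a suggestion:)

```python
device_map (`str`, *optional*):
    A map that specifies where each submodule should go. If set to `"auto"`,
    `accelerate` computes the most optimized `device_map` automatically.
    Requires `accelerate` to be installed (`pip install accelerate`).
```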
PR is good to merge for me! Played around with:

```python
#!/usr/bin/env python3
from diffusers import UNet2DConditionModel

model = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-3", device_map="auto", subfolder="unet"
)

import ipdb; ipdb.set_trace()
```

on 1, >1, and no-GPU machines and it works as expected.
patrickvonplaten
left a comment
@patil-suraj @anton-l would be nice if one of you could take a look :-)
@piEsposito in a follow-up PR it would be nice if you could then also implement this for the more global:

```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3", device_map="auto")
```

functionality :-) Think this could then have very widespread adoption. At the moment 99% of users load models via the pipeline interface.
patil-suraj
left a comment
LGTM, thanks a lot for working on this @piEsposito !
@patrickvonplaten great idea. I can start working on that today. Thanks!
add accelerate to load models with smaller memory footprint (huggingface#361)

* add accelerate to load models with smaller memory footprint
* remove low_cpu_mem_usage as it is redundant
* move accelerate init weights context to modelling utils
* add test to ensure results are the same when loading with accelerate
* add tests to ensure ram usage gets lower when using accelerate
* move accelerate logic to single snippet under modelling utils and remove it from configuration utils
* format code to pass quality check
* fix imports with isort
* add accelerate to test extra deps
* only import accelerate if device_map is set to auto
* move accelerate availability check to diffusers import utils
* format code

Co-authored-by: Patrick von Platen <[email protected]>
Closes #281