Conversation

@piEsposito
Contributor

@piEsposito piEsposito commented Sep 5, 2022

Closes #281

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 5, 2022

The documentation is not available anymore as the PR was closed or merged.

init_dict, unused_kwargs = cls.extract_init_dict(config_dict, **kwargs)

model = cls(**init_dict)
device_map = kwargs.pop("low_cpu_mem_usage", None)
Contributor

Ideally we would like to try to keep configuration_utils.py framework- and component-independent. Could we maybe try to set:

with accelerate.init_empty_weights():
    model, unused_kwargs = cls.from_config(
        config_path,
        cache_dir=cache_dir,
        return_unused_kwargs=True,
        force_download=force_download,
        resume_download=resume_download,
        proxies=proxies,
        local_files_only=local_files_only,
        use_auth_token=use_auth_token,
        revision=revision,
        subfolder=subfolder,
        device_map=device_map,
        **kwargs,
    )

in modeling_utils.py instead?

Contributor Author

Just did it.


# Set model in evaluation mode to deactivate DropOut modules by default
model.eval()
if device_map is not None:
Contributor

If possible, it would be very nice if all the accelerate logic were added only here.

Contributor Author

Just did it. I had to move the model creation and checkpoint loading to after grabbing the weight and config files, to avoid splitting the accelerate logic into two places.
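
For readers following along, the pattern being consolidated is accelerate's init_empty_weights plus checkpoint dispatch. Below is a minimal, self-contained sketch of that idea; TinyModel is a made-up stand-in for illustration only, while the PR applies the same approach inside diffusers' from_pretrained:

import torch
from torch import nn

import accelerate

# Toy stand-in for a diffusers model class.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))

    def forward(self, x):
        return self.net(x)

# Save a checkpoint once so the example is self-contained.
torch.save(TinyModel().state_dict(), "tiny_model.bin")

# Build the model under init_empty_weights(): parameters live on the "meta" device,
# so no real memory is allocated for them yet.
with accelerate.init_empty_weights():
    model = TinyModel()

# Load the checkpoint and place the weights directly on the chosen device(s),
# avoiding the usual "random init + full state_dict copy" double allocation.
model = accelerate.load_checkpoint_and_dispatch(model, "tiny_model.bin", device_map="auto")
model.eval()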

Contributor

@patrickvonplaten patrickvonplaten left a comment

Hey @piEsposito,

Sorry for replying only now :-/
Thanks a lot for the PR! It already looks really nice. One important thing that (if possible) would be good to change is to only add functionality to modeling_utils.py and not to configuration_utils.py, because configuration_utils is also used for the schedulers.

Could you maybe give this a try?

@piEsposito
Contributor Author

piEsposito commented Sep 16, 2022

@patrickvonplaten I've addressed your comments and moved accelerate logic to modelling utils. Also, created some tests to ensure memory usage gets lower and results keep the same. Thank you for your time to carefully review this PR.

@piEsposito piEsposito marked this pull request as ready for review September 16, 2022 19:25
import torch
from torch import Tensor, device

import accelerate
Contributor

We should not make accelerate a hard dependency here. Could you wrap it into a:

if accelerate_is_available():
    import accelerate
else:
    raise ImportError("Please install accelerate via `pip install accelerate`")

below the if device_map == "auto" check?
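
Concretely, the guard being asked for looks roughly like this. It is only a sketch: is_accelerate_available is the diffusers-side helper discussed a bit further down, and device_map stands in for the kwarg popped inside from_pretrained.

from diffusers.utils import is_accelerate_available

device_map = "auto"  # stand-in for the kwarg popped inside from_pretrained

if device_map == "auto":
    # Only pull in accelerate when the caller actually asked for it, keeping it optional.
    if is_accelerate_available():
        import accelerate
    else:
        raise ImportError("Please install accelerate via `pip install accelerate`")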

Contributor Author

@piEsposito piEsposito Sep 21, 2022

@patrickvonplaten I've just done it, thank you for the suggestion.

from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError, RepositoryNotFoundError, RevisionNotFoundError
from requests import HTTPError
from transformers.utils import is_accelerate_available
Contributor

We cannot do this either because transformers is not a hard requirement 😅

I should have given you more details in my previous comments - very sorry that the feedback cycle takes so much time. Will try hard to reply faster here now.

In short, can you copy this code https://github.com/huggingface/transformers/blob/e5b7cff5fe65eac9e54ba88fa3935b3270db0207/src/transformers/utils/import_utils.py#L528 into https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/import_utils.py
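
For reference, the helper being pointed to is essentially an importlib-based availability probe, so the copied-over version looks roughly like this (simplified; the transformers original also records the installed package version):

import importlib.util

# Check whether the accelerate package can be found without importing it,
# so accelerate stays an optional dependency.
_accelerate_available = importlib.util.find_spec("accelerate") is not None

def is_accelerate_available():
    return _accelerate_available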

Contributor Author

@piEsposito piEsposito Sep 22, 2022

Hey, thank you for clarifying that. I've just fixed it. And no problem, I'm sure you are very busy, and I'm very thankful for the time you took to guide me through this PR.

Contributor

@patrickvonplaten patrickvonplaten left a comment

Just need to clean up the is_accelerate_available() comment and then it should be good to go :-)

@piEsposito
Contributor Author

@patrickvonplaten finished implementing all requested changes. Please let me know if anything else comes to mind.

@patrickvonplaten
Contributor

This looks good to me!
@patil-suraj @anton-l could you take a look here?

@piEsposito
Contributor Author

Hey folks, any update here?

from_auto_class = kwargs.pop("_from_auto", False)
torch_dtype = kwargs.pop("torch_dtype", None)
subfolder = kwargs.pop("subfolder", None)
device_map = kwargs.pop("device_map", None)
Contributor

Sorry, I overlooked this the first time.
@piEsposito could you also add some docstring here?

Contributor

E.g. 3-4 lines under line 264.
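
For illustration, such a docstring entry for the new kwarg could read roughly like this (illustrative wording only, not the exact text that landed in the PR):

device_map (`str`, *optional*):
    A map that specifies where each submodule should go. If set to `"auto"`, accelerate computes
    the placement automatically and the weights are loaded directly onto the target device(s),
    which lowers peak RAM usage while loading. Requires `pip install accelerate`.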

@patrickvonplaten
Contributor

PR is good to merge for me! Played around with:

#!/usr/bin/env python3
from diffusers import UNet2DConditionModel

model = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-3", device_map="auto", subfolder="unet")
import ipdb; ipdb.set_trace()

on machines with 1 GPU, more than 1 GPU, and no GPU, and it works as expected.
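
A quick way to see where the weights actually ended up on each of those machines (just an illustration, not part of the PR):

from diffusers import UNet2DConditionModel

model = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-3", device_map="auto", subfolder="unet"
)

# Prints e.g. {device(type='cuda', index=0)} on a single-GPU box, several cuda devices on a
# multi-GPU box, or {device(type='cpu')} when no GPU is available.
print({p.device for p in model.parameters()})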

Contributor

@patrickvonplaten patrickvonplaten left a comment

@patil-suraj @anton-l would be nice if one of you could take a look :-)

@patrickvonplaten
Contributor

@piEsposito in a follow-up PR it would be nice if you could then also implement this for the more global:

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3", device_map="auto")

functionality :-) I think this could then see very widespread adoption. At the moment, 99% of users load models via the pipeline interface.

Contributor

@patil-suraj patil-suraj left a comment

LGTM, thanks a lot for working on this @piEsposito !

@patrickvonplaten patrickvonplaten merged commit 4d1cce2 into huggingface:main Oct 4, 2022
@piEsposito
Contributor Author

@piEsposito in a follow-up PR it would be nice if you could then also implement this for the more global:

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3", device_map="auto")

functionality :-) I think this could then see very widespread adoption. At the moment, 99% of users load models via the pipeline interface.

@patrickvonplaten great idea. I can start working on that today. Thanks!

prathikr pushed a commit to prathikr/diffusers that referenced this pull request Oct 26, 2022
add accelerate to load models with smaller memory footprint (huggingface#361)

* add accelerate to load models with smaller memory footprint

* remove low_cpu_mem_usage as it is redundant

* move accelerate init weights context to modelling utils

* add test to ensure results are the same when loading with accelerate

* add tests to ensure ram usage gets lower when using accelerate

* move accelerate logic to single snippet under modelling utils and remove it from configuration utils

* format code to pass quality check

* fix imports with isort

* add accelerate to test extra deps

* only import accelerate if device_map is set to auto

* move accelerate availability check to diffusers import utils

* format code

Co-authored-by: Patrick von Platen <[email protected]>
PhaneeshB pushed a commit to nod-ai/diffusers that referenced this pull request Mar 1, 2023
Usage:
with create_context() as ctx:
  module = model_annotation(ctx, input_contents=..., config_path=..., search_op=...)

Example:
The example is to annotate the minilm model with GPU config files.
python model_annotation.py /nodclouddata/vivian/minilm_model/model.mlir /nodclouddata/vivian/minilm_model/model_config.json
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
add accelerate to load models with smaller memory footprint (huggingface#361)

* add accelerate to load models with smaller memory footprint

* remove low_cpu_mem_usage as it is redundant

* move accelerate init weights context to modelling utils

* add test to ensure results are the same when loading with accelerate

* add tests to ensure ram usage gets lower when using accelerate

* move accelerate logic to single snippet under modelling utils and remove it from configuration utils

* format code to pass quality check

* fix imports with isort

* add accelerate to test extra deps

* only import accelerate if device_map is set to auto

* move accelerate availability check to diffusers import utils

* format code

Co-authored-by: Patrick von Platen <[email protected]>
Development

Successfully merging this pull request may close these issues.

Optimize model loading by natively using Accelerate, please.
