Conversation

@blefaudeux (Contributor)

  • move the enable/disable call so it is part of the base DiffusionPipeline (removes a bunch of duplicated code in the derived pipelines)
  • make the call recursive across all the modules in the model graph, so that exposing set_use_memory_efficient_attention_xformers in a leaf module is all it takes for it to be picked up (important for some pipelines, like superres, which are not properly covered right now - see for instance simplyfy AttentionBlock #1492); a sketch of the idea follows below

cc @patrickvonplaten, discussed a couple of days ago. Note that there do not seem to be unit tests covering this part, unless I missed them.
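
To make the intent concrete, here is a minimal sketch (illustrative only: the CrossAttention name, the constructor and the flag attribute are assumptions, not the exact diff) of what a leaf module has to expose for the pipeline-level recursive call to find it:

from torch import nn

class CrossAttention(nn.Module):
    # Hypothetical leaf attention module, for illustration.
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Flag toggled from the top of the pipeline.
        self._use_memory_efficient_attention_xformers = False

    def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
        # Exposing this method is all a leaf module needs to do; the base
        # DiffusionPipeline walks the module graph and calls it wherever it exists.
        self._use_memory_efficient_attention_xformers = use_memory_efficient_attention_xformers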

self.attn2._slice_size = slice_size

def _set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
@blefaudeux (Contributor, Author) commented Nov 30, 2022


called from the outside, so it can be public? Plus it conveys the idea that it's a capability being exposed


HuggingFaceDocBuilderDev commented Nov 30, 2022

The documentation is not available anymore as the PR was closed or merged.

feature_extractor=feature_extractor,
)

def enable_xformers_memory_efficient_attention(self):
@blefaudeux (Contributor, Author)


here and below: this inherits from DiffusionPipeline, so I figured that this could be defined there (with the recursive take) to remove a lot of code duplication

if hasattr(block, "attentions") and block.attentions is not None:
block.set_attention_slice(slice_size)

def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
@blefaudeux (Contributor, Author)


all these are just trampolines, not needed with the recursive call from the top. An issue with these trampolines is that they're bound to miss some cases (they do), since they would have to be changed any time a new capability is exposed somewhere in the pipeline; see the sketch below
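
For reference, a trampoline here looks roughly like the following (an approximate reconstruction of the pre-existing pattern, not the exact removed code); it only reaches the sub-modules it explicitly knows about:

# Rough sketch of the trampoline pattern this PR removes (structure is illustrative).
def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
    # Forwards the call only to the blocks listed here; any attention module reached
    # through another path, or added later, would need yet another trampoline.
    for block in self.down_blocks:
        if hasattr(block, "attentions") and block.attentions is not None:
            block.set_use_memory_efficient_attention_xformers(use_memory_efficient_attention_xformers)
    self.mid_block.set_use_memory_efficient_attention_xformers(use_memory_efficient_attention_xformers)
    for block in self.up_blocks:
        if hasattr(block, "attentions") and block.attentions is not None:
            block.set_use_memory_efficient_attention_xformers(use_memory_efficient_attention_xformers)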

"""
self.set_use_memory_efficient_attention_xformers(False)

def set_use_memory_efficient_attention_xformers(self, valid: bool) -> None:
@blefaudeux (Contributor, Author)


the actual single implementation of how to enable mem-efficient attention across the whole model, for all pipelines (covers superres, outpainting or text2img, which mobilize attention in different places at times); roughly as sketched below
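
Roughly along these lines (a sketch, not the exact merged code: in particular the way sub-models are discovered via vars(self) is an approximation):

import torch

def set_use_memory_efficient_attention_xformers(self, valid: bool) -> None:
    # Any module in the graph that exposes set_use_memory_efficient_attention_xformers
    # gets the message, whatever its depth.
    def fn_recursive_set_mem_eff(module: torch.nn.Module):
        if hasattr(module, "set_use_memory_efficient_attention_xformers"):
            module.set_use_memory_efficient_attention_xformers(valid)
        for child in module.children():
            fn_recursive_set_mem_eff(child)

    # Walk every nn.Module the pipeline holds (unet, vae, text_encoder, ...).
    for module in vars(self).values():
        if isinstance(module, torch.nn.Module):
            fn_recursive_set_mem_eff(module)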

def set_progress_bar_config(self, **kwargs):
self._progress_bar_config = kwargs

def enable_xformers_memory_efficient_attention(self):
@blefaudeux (Contributor, Author)


these enable and disable shorthands are just there because many derived pipelines were using them, so I figured that it was cheaper to expose the call here :) (see the sketch below)
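
i.e. something along these lines (a sketch of the two shorthands, sitting next to set_use_memory_efficient_attention_xformers in DiffusionPipeline):

def enable_xformers_memory_efficient_attention(self):
    # Shorthand kept because many derived pipelines already expose this call.
    self.set_use_memory_efficient_attention_xformers(True)

def disable_xformers_memory_efficient_attention(self):
    self.set_use_memory_efficient_attention_xformers(False)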

Contributor


Makes sense to me to make it a method of DiffusionPipeline!

@blefaudeux (Contributor, Author)

open for feedback, this is a suggestion of course @patrickvonplaten @kashif

@blefaudeux (Contributor, Author)

I tested:

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

sample[0][0].save("cat.png")

which works fine with this PR

@patrickvonplaten (Contributor) left a comment

PR looks very nice to me! Given that xformers can essentially be used with every attention layer, every UNet pretty much has an attention layer, and every pipeline has at least one UNet, I think it's a good idea to make it a "global" method by adding it to DiffusionPipeline - what do the others think here?

@blefaudeux (Contributor, Author)

If you check a PR like this one, the changes here make it a lot easier and would remove two thirds of the lines of code.

@patil-suraj (Contributor) left a comment


Really cool PR, makes it so much cleaner now! And I agree with you, keeping it in DiffusionPipeline makes sense!

Thanks a lot for working on this!

@patil-suraj patil-suraj merged commit a816a87 into huggingface:main Dec 2, 2022
tcapelle pushed a commit to tcapelle/diffusers that referenced this pull request Dec 12, 2022
Moving the mem efficiient attention activation to the top + recursive (huggingface#1493)

* Moving the mem efficiient attention activation to the top + recursive

* black, too bad there's no pre-commit ?

Co-authored-by: Benjamin Lefaudeux <[email protected]>
sliard pushed a commit to sliard/diffusers that referenced this pull request Dec 21, 2022

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023