[refactor] Making the xformers mem-efficient attention activation recursive #1493
Conversation
        self.attn2._slice_size = slice_size

-   def _set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
+   def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
Called from the outside, so it can be public? Plus it conveys the idea that it's a capability being exposed.
The documentation is not available anymore as the PR was closed or merged.
            feature_extractor=feature_extractor,
        )

    def enable_xformers_memory_efficient_attention(self):
Here and below: this inherits from DiffusionPipeline, so I figured the method could be defined there (with the recursive approach) to remove a lot of code duplication.
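[Editor's note] A minimal sketch of that idea, assuming the pipeline keeps its sub-models (unet, vae, text_encoder, ...) as plain attributes and that those models expose `set_use_memory_efficient_attention_xformers` themselves. The class body below is illustrative, not the exact code added by this PR:

```python
import torch


class DiffusionPipeline:
    # Sketch: defined once on the base class, so every derived pipeline
    # inherits the toggle instead of re-implementing it.
    def set_use_memory_efficient_attention_xformers(self, valid: bool) -> None:
        # Forward the flag to every sub-model registered on the pipeline
        # (unet, vae, text_encoder, ...) that supports it.
        for component in vars(self).values():
            if isinstance(component, torch.nn.Module) and hasattr(
                component, "set_use_memory_efficient_attention_xformers"
            ):
                component.set_use_memory_efficient_attention_xformers(valid)

    def enable_xformers_memory_efficient_attention(self):
        self.set_use_memory_efficient_attention_xformers(True)

    def disable_xformers_memory_efficient_attention(self):
        self.set_use_memory_efficient_attention_xformers(False)
```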
        if hasattr(block, "attentions") and block.attentions is not None:
            block.set_attention_slice(slice_size)

    def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
All of these are just trampolines, not needed with the recursive call from the top. An issue with these trampolines is that they're bound to miss some cases (they already do), since they would have to be updated any time a new capability is exposed somewhere in the pipeline.
| """ | ||
| self.set_use_memory_efficient_attention_xformers(False) | ||
|
|
||
| def set_use_memory_efficient_attention_xformers(self, valid: bool) -> None: |
This is the actual single implementation of how to enable mem-efficient attention across the whole model, for all pipelines (it covers superres, outpainting, or text2img, which at times mobilize attention in different places).
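[Editor's note] Roughly, the recursive walk can look like the sketch below: starting from the top-level torch module, descend through `children()` and call the setter on any sub-module that defines it. This is a simplified reconstruction, not the verbatim diff:

```python
import torch


class ModelMixin(torch.nn.Module):
    # Sketch of the single, recursive toggle: reach every sub-module and flip
    # the flag wherever the setter is defined, whatever the model layout is.
    def set_use_memory_efficient_attention_xformers(self, valid: bool) -> None:
        def fn_recursive_set_mem_eff(module: torch.nn.Module) -> None:
            if hasattr(module, "set_use_memory_efficient_attention_xformers"):
                module.set_use_memory_efficient_attention_xformers(valid)
            for child in module.children():
                fn_recursive_set_mem_eff(child)

        # Start from the children so this method does not call itself on `self`.
        for child in self.children():
            fn_recursive_set_mem_eff(child)
```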
    def set_progress_bar_config(self, **kwargs):
        self._progress_bar_config = kwargs

    def enable_xformers_memory_efficient_attention(self):
These enable and disable shorthands are just there because many derived pipelines were using them, so I figured it was cheaper to expose the call here :)
Makes sense to me to make it a method of DiffusionPipeline!
Open for feedback, this is a suggestion of course @patrickvonplaten @kashif
I tested it, and it works fine with this PR.
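[Editor's note] For reference, exercising the top-level toggle looks roughly like this (illustrative checkpoint id and prompt, assuming xformers is installed and a CUDA device is available; not a test from this PR):

```python
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint id
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

# One call flips the flag recursively across the unet (and any other
# sub-model exposing the setter); no per-pipeline trampoline involved.
pipe.enable_xformers_memory_efficient_attention()
image = pipe("an astronaut riding a horse").images[0]

# Turn it back off the same way.
pipe.disable_xformers_memory_efficient_attention()
```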
patrickvonplaten left a comment:
@anton-l @pcuenca @patil-suraj @williamberman what do you think?
PR looks very nice to me! Given that xformers can essentially be used with every attention layer, every unet pretty much has an attention layer, and every pipeline has at least one unet, I think it's a good idea to make it a "global" method by adding it to DiffusionPipeline.
If you check a PR like this one, the changes here make it a lot easier and would remove two thirds of the lines of code.
Really cool PR, makes it so much cleaner now! And I agree with you, keeping it in DiffusionPipeline makes sense!
Thanks a lot for working on this!
[refactor] Making the xformers mem-efficient attention activation recursive (huggingface#1493)

* Moving the mem efficient attention activation to the top + recursive

* black, too bad there's no pre-commit?

Co-authored-by: Benjamin Lefaudeux <[email protected]>
Defining set_use_memory_efficient_attention_xformers in a leaf module is all it takes for it to be picked up (important for some pipelines, like superres, which are not properly covered right now; see for instance simplyfy AttentionBlock #1492).

cc @patrickvonplaten, discussed a couple of days ago. Note that there do not seem to be unit tests covering this part, unless I missed them.
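[Editor's note] To illustrate the point about leaf modules, here is a hypothetical minimal attention block (not code from this PR, and the internal flag name is an assumption): defining the setter is enough for the recursive walk above to reach it, with no per-pipeline plumbing.

```python
import torch


class MyAttentionBlock(torch.nn.Module):
    """Hypothetical leaf module: the setter below is all that is needed for it
    to be picked up by the recursive activation."""

    def __init__(self):
        super().__init__()
        self._use_memory_efficient_attention_xformers = False  # assumed flag name

    def set_use_memory_efficient_attention_xformers(self, use_memory_efficient_attention_xformers: bool):
        # The recursive walk calls this directly, wherever the block sits in
        # the model (text2img, superres, outpainting, ...).
        self._use_memory_efficient_attention_xformers = use_memory_efficient_attention_xformers
```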