Add better compatibility with diffusers-interpret (and possibly other use cases!) #506

Conversation
keturn left a comment:
Hi João! I was just introduced to diffusers-interpret yesterday via the Discord! I have all the same questions, so I love seeing this sort of thing.
I have no authority to merge anything here, but I've taken the liberty of leaving a few notes.
```diff
 if not return_dict:
-    return (image, has_nsfw_concept)
+    return (image, has_nsfw_concept, all_latents)
```
I think if `PipelineOutput` classes are the way forward and the tuple return format here is mainly for backwards compatibility, we should leave it the same size it was (a pair) and not worry about adding new features to it.
Since this method had a `@torch.no_grad` decorator, I don't think this tuple is there for backwards compatibility 🤔
But looking at the `transformers` package, it seems that when `return_dict_in_generate=False`, options like `output_scores`/`output_attentions` don't matter, so it makes sense to remove `latents` from the tuple as you mention :)
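For context, here is a minimal sketch of the `return_dict` convention under discussion. The output field names follow the diffs in this PR; the `finish` helper is purely illustrative and not the actual diffusers implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

import numpy as np
import PIL.Image
import torch


@dataclass
class StableDiffusionPipelineOutput:
    images: Union[List[PIL.Image.Image], np.ndarray]
    nsfw_content_detected: List[bool]
    latents: Optional[List[torch.Tensor]] = None  # proposed optional field


def finish(image, has_nsfw_concept, all_latents, return_dict=True):
    if not return_dict:
        # legacy tuple stays a pair, as suggested above
        return (image, has_nsfw_concept)
    return StableDiffusionPipelineOutput(
        images=image, nsfw_content_detected=has_nsfw_concept, latents=all_latents
    )
```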
```diff
 images: Union[List[PIL.Image.Image], np.ndarray]
 nsfw_content_detected: List[bool]
+latents: Optional[List[torch.Tensor]] = None
```
Fine with me! What do you think @pcuenca @anton-l @patil-suraj?
fine with me as well.
```diff
 Args:
-    prompt (`str` or `List[str]`):
+    prompt (`str`, `List[str]` or `torch.Tensor`):
         The prompt or prompts to guide the image generation.
```
Ufff, I don't think the input prompt should ever be a tensor; that's confusing and opens the door to hacky code. Can't we just work with the `latents` inputs?
What if we use `inputs_embeds`, as some methods in `transformers` do?
Agree with Patrick.
The problem is that we can pass an image to `StableDiffusionImg2ImgPipeline.__call__` as a tensor, but we can't pass a text...
Would an extra `inputs_embeds` or `prompt_embeds` argument work as an alternative?
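To illustrate that alternative, here is a sketch of what a separate embeddings argument could look like. The `prompt_embeds` name and the helper below are hypothetical, not an existing diffusers API in this PR.

```python
import torch


def get_text_embeddings(pipe, prompt: str = None, prompt_embeds: torch.Tensor = None):
    # Exactly one of prompt / prompt_embeds should be given; the pipeline
    # would skip the text encoder when embeddings are passed in directly.
    if (prompt is None) == (prompt_embeds is None):
        raise ValueError("Pass exactly one of `prompt` or `prompt_embeds`.")
    if prompt_embeds is None:
        tokens = pipe.tokenizer(prompt, return_tensors="pt")
        prompt_embeds = pipe.text_encoder(tokens.input_ids)[0]
    return prompt_embeds
```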
```diff
     batch_size = 1
 elif isinstance(prompt, list):
     batch_size = len(prompt)
+elif torch.is_tensor(prompt):
```
I don't think `prompt` should ever be a tensor.
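For reference, here is the dispatch from the hunk above written out as a self-contained sketch; the tensor branch assumes embeddings shaped `[batch, seq_len, hidden_dim]`, which is an assumption on my part, not something stated in the PR.

```python
import torch


def infer_batch_size(prompt) -> int:
    if isinstance(prompt, str):
        return 1
    elif isinstance(prompt, list):
        return len(prompt)
    elif torch.is_tensor(prompt):
        # assumed layout: [batch, seq_len, hidden_dim]
        return prompt.shape[0]
    raise ValueError(f"`prompt` has unsupported type {type(prompt)}")
```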
```diff
+if output_latents:
+    # save latents from all diffusion steps
+    all_latents.append(latents)
```
Fine with this! I think this makes a lot of sense.
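A minimal sketch of how an `output_latents` flag could collect intermediate latents inside the denoising loop. The loop body is heavily simplified (no guidance, no latent scaling) and assumes `scheduler.set_timesteps(...)` was already called; `.sample` and `.prev_sample` follow the diffusers UNet/scheduler output conventions.

```python
def denoise(unet, scheduler, latents, text_embeddings, output_latents=False):
    all_latents = [latents] if output_latents else None
    for t in scheduler.timesteps:
        noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
        if output_latents:
            # save latents from all diffusion steps
            all_latents.append(latents)
    return latents, all_latents
```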
```diff
 self.to(device)

+# enable/disable grad
+was_grad_enabled = torch.is_grad_enabled()
```
This will also be a bit difficult for me to accept. It's a) a bit hacky to me, and b) pipelines by definition should only be used for inference. I assume the gradients are needed for analysis and the idea is not to do training?
I'm not 100% sure whether there are enough use cases to warrant allowing gradient flow here; on the other hand, it also shouldn't really hurt if we leave the default at False. IMO, working with function decorators and `enable_grad` + `disable_grad` functions is the way to go here instead, though.
What do you think @patil-suraj @anton-l @pcuenca?
Same as Patrick, not really in favor of this.

> IMO, working with function decorators and `enable_grad` + `disable_grad` functions is the way to go here instead, though.

+1
> IMO, working with function decorators and `enable_grad` + `disable_grad` functions is the way to go here instead, though.

By that, do you mean keeping the `@torch.no_grad` decorator on `__call__` and putting the whole method body under `with torch.enable_grad() if enable_grad else nullcontext():`?
Not sure what you meant.
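For what it's worth, a small sketch of how that decorator/context-manager combination would behave; the `enable_grad` argument is this PR's proposal, not an existing pipeline parameter. `torch.enable_grad()` does re-enable gradients inside an outer `no_grad`.

```python
from contextlib import nullcontext

import torch


class SketchPipeline:
    @torch.no_grad()
    def __call__(self, enable_grad: bool = False) -> bool:
        # nullcontext() keeps the no_grad default from the decorator;
        # torch.enable_grad() overrides it for the whole block.
        context = torch.enable_grad() if enable_grad else nullcontext()
        with context:
            x = torch.zeros(1, requires_grad=True)
            y = (x * 2).sum()
            return y.requires_grad  # True only when enable_grad=True


assert SketchPipeline()(enable_grad=False) is False
assert SketchPipeline()(enable_grad=True) is True
```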
patrickvonplaten left a comment:
Hey @JoaoLages,
Super cool interpret library, btw! I like the idea of returning the latents and could also be convinced to allow gradient computation (even though I think the use case is too niche for now; I'd only be pro if the community really wants this feature. cc @hysts @pcuenca @patil-suraj @anton-l, wdyt?). I don't think we should allow the prompt to be a `torch.Tensor`.
patil-suraj left a comment:
Thanks a lot for the PR @JoaoLages!
I have pretty much the same comments as Patrick.
- We could definitely return intermediate latents.
- Gradient checkpointing will be supported soon; it should not be included in a pipeline like this. Pipelines are intended for inference only, so it's best to avoid training-related logic here.
- Enabling gradients: we could add this if the community is really interested in it. Could you please open an issue for this?
🚀

There you go: #529

Closing this PR as it makes too many changes at once -> happy to continue the discussion in the individual PRs that were opened :-)
Hi there!
I love this package ❤️
I'm the author of diffusers-interpret, and along the way I found these features would be very useful to add to this main package:
- Calling `DiffusionPipeline.__call__` while calculating gradients;
- An `output_latents` flag (similar to `output_scores`/`output_attentions`/etc. from `transformers`) that adds a `latents` attribute to the output;
- ~~Deactivating the safety checker~~ (removed this option in 12ca969);
- Passing `text_embeddings` directly instead of the text string (possible in `transformers` too).

That's about it 😄
To start this PR, I made the changes only for the `StableDiffusionPipeline` class, but I can port them to the other pipelines if you agree with them.
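Taken together, a call that exercises the options proposed here might look like the sketch below. This is a hypothetical usage example: the `output_latents` flag and the `latents` output field follow this PR's proposal, not a guaranteed diffusers API.

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

output = pipe(
    "a photograph of an astronaut riding a horse",
    output_latents=True,  # proposed flag: collect latents from every step
)
image = output.images[0]
all_latents = output.latents  # proposed optional field, one tensor per step
```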