allow multiple generations per prompt #741
Conversation
pcuenca left a comment
Looks good!
We should also do it in the flax version for faster inference, but we can do so later.
NouamaneTazi left a comment
Amazing PR, thank you for taking care of this 🙏🏼
uncond_embeddings = self.text_encoder(uncond_input.input_ids.to(self.device))[0]

# duplicate unconditional embeddings for each generation per prompt
uncond_embeddings = uncond_embeddings.repeat_interleave(batch_size * num_images_per_prompt, dim=0)
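For illustration, a minimal sketch of what this repeat does, with made-up tensor shapes (the sizes below are assumptions chosen for readability, not the pipeline's real dimensions):

```python
import torch

# pretend two prompts were encoded into embeddings of shape (2, 77, 768)
uncond_embeddings = torch.randn(2, 77, 768)
num_images_per_prompt = 3

# each prompt's embedding is repeated along dim 0, giving row order
# [p0, p0, p0, p1, p1, p1], so every generated image gets its own copy
repeated = uncond_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
print(repeated.shape)  # torch.Size([6, 77, 768])
```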
Last time I checked I had issues with repeat_interleave and ONNX. We probably just need to require a specific opset version though.
Can you please double-check @anton-l?
Thanks, will check that. But this PR doesn't modify the ONNX pipeline, so it should be fine.
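For reference, a rough sketch of the kind of export check being discussed; the toy module, file name, and opset value below are assumptions for illustration, not part of this PR:

```python
import torch

class Repeater(torch.nn.Module):
    def forward(self, x):
        # the same op used above to duplicate embeddings per prompt
        return x.repeat_interleave(2, dim=0)

# repeat_interleave is not exportable with very old opsets; requesting a
# newer opset (11 here, as an assumption) is the usual workaround
torch.onnx.export(
    Repeater(),
    torch.randn(1, 4, 8),
    "repeater.onnx",
    opset_version=11,
    input_names=["x"],
    output_names=["y"],
)
```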
patrickvonplaten left a comment
Nice!
In the current implementation, if you want 9 images, the batch size is multiplied by 9. Good luck fitting that on a consumer GPU 😁 When I saw this arg I was expecting it to run the pipeline sequentially (with the same prompt); a lot of other repos do it that way. Extra kudos: when num_images_per_prompt > 1, also accept a list of seeds.
We indeed need a good warning / error-throwing system here.
Hmm, I'm not sure this has anything to do with warnings or errors. Just pointing out that, from a user's perspective, batch_size and num_images_per_prompt are the same.
Yeah but
Thanks for the comment @rvorias! It would be a bit complicated to run the pipe sequentially inside the pipe; users can do it very easily by just calling the pipeline in a loop, and that way they have full control over it. We'll document this better in the examples and docs :)
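For users looking for the sequential behaviour (and per-image seeds) requested above, here is a minimal sketch of the loop approach, assuming a recent diffusers API; the model id and seed values are just examples:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

prompt = "a photo of an astronaut riding a horse"
seeds = [0, 1, 2]  # one seed per image; values are arbitrary

images = []
for seed in seeds:
    generator = torch.Generator(device="cuda").manual_seed(seed)
    # each call runs the pipeline once at batch size 1, so memory stays flat
    images.append(pipe(prompt, generator=generator).images[0])
```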
* compute text embeds per prompt
* don't repeat uncond prompts
* repeat separately
* update image2image
* fix repeat uncond embeds
* adapt inpaint pipeline
* fix uncond tokens in img2img
* add tests and fix uncond embeds in img2img and inpaint pipe
To generate multiple images for a prompt, we currently need to repeat the prompt before calling the pipeline. Because of this, the text embeddings and uncond embeddings are computed multiple times for the same prompt.

This PR adds a num_images_per_prompt argument to the stable diffusion pipelines to allow returning multiple images per prompt without repeating it. With this, the text embeddings and uncond embeddings for each prompt are computed once and repeated according to the value of num_images_per_prompt.

Thanks @NouamaneTazi !
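A minimal usage sketch of the new argument (the model id and prompt are just examples, and the output attribute assumes a recent diffusers version):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

# the prompt is encoded once; its embeddings are repeated internally and a
# single batched forward pass returns num_images_per_prompt images
output = pipe("a photo of an astronaut riding a horse", num_images_per_prompt=4)
print(len(output.images))  # 4
```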