
Conversation

@shirayu
Contributor

@shirayu shirayu commented Sep 12, 2022

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 12, 2022

The documentation is not available anymore as the PR was closed or merged.

@shirayu shirayu changed the title [WIP] Return encoded texts by DiffusionPipelines (Fix #447) [WIP] Return encoded texts by DiffusionPipelines (Resolve #447) Sep 12, 2022
@shirayu shirayu marked this pull request as ready for review September 12, 2022 11:22
@shirayu shirayu changed the title [WIP] Return encoded texts by DiffusionPipelines (Resolve #447) Return encoded texts by DiffusionPipelines (Resolve #447) Sep 12, 2022
@patrickvonplaten
Contributor

patrickvonplaten commented Sep 16, 2022

Hey @shirayu,

Thanks for opening the PR! I think we can make our lives a bit easier here by simply catching whether the input was truncated or not beforehand - then we don't have to return the text embeddings and check after the fact :-). How about we replace these lines here:

with:

text_inputs = self.tokenizer(
    prompt,
    padding="max_length",
    max_length=self.tokenizer.model_max_length,
    return_tensors="pt",
)
text_input_ids = text_inputs.input_ids

if text_input_ids.shape[-1] > self.tokenizer.model_max_length:
    removed_text = self.tokenizer.batch_decode(text_input_ids[:, self.tokenizer.model_max_length:])
    logger.warning(f"The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: {removed_text}")

text_input_ids = text_input_ids[:, :self.tokenizer.model_max_length].to(self.device)

Then the user gets a nice warning and we are error-robust :-)

@shirayu
Contributor Author

shirayu commented Sep 17, 2022

Thank you for the comment @patrickvonplaten !

I think using logger.warn is useful for CLI users, but not when the pipeline is called from other applications.
For example, if a web app accepts a prompt and returns the result from these pipelines, the warning never reaches the web app's user.

@patrickvonplaten
Contributor

Hey @shirayu,

I think you can catch warnings with Python: https://stackoverflow.com/questions/5644836/in-python-how-does-one-catch-warnings-as-if-they-were-exceptions

I'm not sure we want to return text embeddings just for a potential warning that could be displayed later - the use case is too small to warrant adding a new output tuple which might break pipelines that expect only two outputs.

Would it be ok for you to try out catching the warning or adding a specific logger? https://stackoverflow.com/questions/14058453/making-python-loggers-output-all-messages-to-stdout-in-addition-to-log-file
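On the "specific logger" idea - a minimal sketch, assuming the truncation message is emitted through the diffusers logging hierarchy (i.e. a logger whose name starts with "diffusers"), of how an application could capture it and show it to its own users instead of relying on stderr:

import logging

captured_warnings = []  # illustrative container owned by the application

class CaptureHandler(logging.Handler):
    def emit(self, record):
        # Keep the formatted message so the web app can display it later.
        captured_warnings.append(self.format(record))

# Assumption: pipeline loggers propagate to the top-level "diffusers" logger.
logging.getLogger("diffusers").addHandler(CaptureHandler())

# ... run the pipeline here; any truncation warning ends up in captured_warnings.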

@patrickvonplaten
Contributor

Also @patil-suraj @pcuenca @anton-l what do you think here?


import PIL
from PIL import Image
from transformers import BatchEncoding
Contributor


Could you please move this under line 34 (below is_transformers_available())? Otherwise this breaks the init, as transformers is not a hard dependency.
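For reference, a rough sketch of the guarded-import pattern being asked for here (is_transformers_available is the existing helper in diffusers.utils; the placement and surrounding code are only indicative):

from .utils import is_transformers_available

# Only import transformers symbols when the optional dependency is installed.
if is_transformers_available():
    from transformers import BatchEncoding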

Contributor

@patrickvonplaten patrickvonplaten left a comment


@anton-l @patil-suraj would love to hear your opinion here.

I would have preferred to just throw a warning given the goal of this PR, but I also see why it could make sense to return text_embeddings - what do you think?

@shirayu - either way, we could only return the text embeddings optionally (add a return_embeddings=True/False flag to __call__) so as not to break the existing outputs.

Overall, I'm leaning towards not adding this functionality, though, as it would add another argument to the __call__ API and another output to Stable Diffusion for what is IMO quite an edge case.

@patrickvonplaten
Contributor

Keen to hear your thoughts here @shirayu

@patil-suraj
Contributor

patil-suraj commented Sep 22, 2022

Thanks for the PR @shirayu !

Returning text_embeddings only to check whether the prompt was truncated doesn't seem ideal to me. I'm also in favor of throwing a warning instead. You can run the tokenizer before or after the pipeline to check if the prompt is too long. Also, running the tokenizer is not an expensive operation, so I think it won't hurt anything.
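As an illustration of that suggestion (not part of this PR; the checkpoint name below is the CLIP tokenizer used by Stable Diffusion's text encoder and the prompt is a stand-in), a caller could check the prompt before invoking the pipeline:

from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompt = "an extremely detailed prompt " * 30  # stand-in for user input

# Tokenize without truncation so the full length is visible.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
if input_ids.shape[-1] > tokenizer.model_max_length:  # 77 for CLIP
    removed = tokenizer.batch_decode(input_ids[:, tokenizer.model_max_length:])
    # Surface this to the user however the application prefers.
    print(f"Prompt is too long; this part would be ignored: {removed}")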

@shirayu shirayu changed the title Return encoded texts by DiffusionPipelines (Resolve #447) Warning for too long prompts in DiffusionPipelines (Resolve #447) Sep 23, 2022
@shirayu
Contributor Author

shirayu commented Sep 23, 2022

Thank you for your comments.
After reading them, I now think the pipeline doesn't necessarily need to return the tokens.
Following @patrickvonplaten's comment, I've changed it to log a warning instead.

Note that this will drop the last special token <|endoftext|> from the truncated tokens.
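To make that note concrete, a small illustration with the standalone CLIP tokenizer (the checkpoint name and prompt are stand-ins, not the pipeline code itself): the overflowing ids end with the <|endoftext|> special token, which the warning leaves out:

from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
ids = tokenizer("token " * 100, return_tensors="pt").input_ids  # well over the 77-token limit

overflow = ids[:, tokenizer.model_max_length:]
print(tokenizer.batch_decode(overflow))          # decoded text ends with "<|endoftext|>"
print(tokenizer.batch_decode(overflow[:, :-1]))  # the trailing special token dropped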

@patrickvonplaten patrickvonplaten merged commit f7ebe56 into huggingface:main Sep 27, 2022
@shirayu shirayu deleted the feature/stable_diffusion/return_encoded_text branch October 23, 2022 12:50
prathikr pushed a commit to prathikr/diffusers that referenced this pull request Oct 26, 2022
…ce#447) (huggingface#472)

* Return encoded texts by DiffusionPipelines

* Updated README to show how to use encoded_text_input

* Reverted examples in README.md

* Reverted all

* Warning for long prompts

* Fix bugs

* Formatted
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
…ce#447) (huggingface#472)
Development

Successfully merging this pull request may close these issues.

Prompt truncation detection and return of tokenized prompts