Prompt truncation detection and return of tokenized prompts #447

@shirayu

Is your feature request related to a problem? Please describe.

There are many examples of long prompts on the web, but many people are unaware that long prompts are truncated.
However, there is no way to know from StableDiffusionPipelineOutput whether a long prompt was truncated or not.

Long prompts are truncated here implicitly.

    text_input = self.tokenizer(
        prompt,
        padding="max_length",
        max_length=self.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
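The check being asked for can be sketched as follows: encode the prompt without truncation and compare the token count against the tokenizer's limit. This is only an illustration, not the pipeline's code. The helper `is_truncated` and the toy `WhitespaceTokenizer` are hypothetical names introduced here; the sketch assumes only that the tokenizer exposes `model_max_length` and an `encode` method that does not truncate by default, which Hugging Face tokenizers do.

```python
from typing import List, Protocol


class TokenizerLike(Protocol):
    """Minimal interface assumed by this sketch."""

    model_max_length: int

    def encode(self, text: str) -> List[int]: ...


def is_truncated(tokenizer: TokenizerLike, prompt: str) -> bool:
    """Return True if the prompt has more tokens than the tokenizer's limit."""
    # encode() is called without truncation, so the full length is visible.
    return len(tokenizer.encode(prompt)) > tokenizer.model_max_length


class WhitespaceTokenizer:
    """Toy stand-in (hypothetical) so the sketch runs without downloading a model."""

    model_max_length = 77

    def encode(self, text: str) -> List[int]:
        # One id per whitespace-separated word, wrapped in start/end marker ids.
        return [0] + [2 + len(word) for word in text.split()] + [1]


if __name__ == "__main__":
    tok = WhitespaceTokenizer()
    print(is_truncated(tok, "a photo of a cat"))
    print(is_truncated(tok, "very " * 100 + "long prompt"))
```

With a real tokenizer, the same function could be called as `is_truncated(pipe.tokenizer, prompt)`, assuming `pipe` is an already loaded pipeline.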

Describe the solution you'd like

Return a bool value indicating whether the prompt has been truncated or not.
It would be even more convenient to return the tokenized prompt as well.

    if not return_dict:
        return (image, has_nsfw_concept)
    return StableDiffusionPipelineOutput(images=image, nsfw_content_detected=has_nsfw_concept)
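One possible shape for the extended return value, as a rough sketch only. The class `PipelineOutputWithPrompt`, its fields `prompt_truncated` and `prompt_token_ids`, and the helper `build_output` are hypothetical names used for illustration; none of them exist in diffusers.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PipelineOutputWithPrompt:
    """Hypothetical output type; the last two fields are the proposed additions."""

    images: List[Any]
    nsfw_content_detected: List[bool]
    prompt_truncated: bool
    prompt_token_ids: List[int]


def build_output(images, has_nsfw_concept, token_ids, untruncated_length, return_dict=True):
    """Assemble the return value, mirroring the existing return_dict branch.

    token_ids: ids actually passed to the text encoder (padded or truncated to a fixed length).
    untruncated_length: token count of the prompt before any truncation.
    """
    # The prompt was cut if it had more tokens than were kept.
    prompt_truncated = untruncated_length > len(token_ids)
    if not return_dict:
        return (images, has_nsfw_concept, prompt_truncated, token_ids)
    return PipelineOutputWithPrompt(
        images=images,
        nsfw_content_detected=has_nsfw_concept,
        prompt_truncated=prompt_truncated,
        prompt_token_ids=token_ids,
    )
```

A caller could then inspect `output.prompt_truncated` directly instead of tokenizing the prompt a second time.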

Describe alternatives you've considered

Additional context
In my app, after generating images, I tokenize the prompts separately to check whether they have been truncated or not.
However, tokenizing them a second time is wasteful.

https://github.com/shirayu/purepale/blob/ab89a20e1a0e1728d4cc98a3d8381b131c7c6319/purepale/serve.py#L167-L172
