Generations with resolutions not divisible by 32 incur loss of quality at bottom & right edges. #5903

@lachlan-nicholson

Description

When generating images at resolutions not divisible by 32, image quality at the bottom and right edges is visibly degraded. The issue appears to have persisted since #505, as shown in the second example image in #1571.

I haven't had the chance to dig into this, so it's not clear to me whether this is an expected limitation.
I have attached code to reproduce the issue, along with example images showing the quality loss.

1024x1024 generation with good quality:
fox_1024_2

1032x1032 generation with poor quality bottom and right edges:
fox_1032_1
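
For reference, the arithmetic behind the two sizes can be sketched as below. This is only an illustration under stated assumptions: the factor of 8 is the usual pixel-to-latent downscale of the SDXL VAE, and the idea that the UNet needs a further factor of 4 (giving 32 in total) is an unverified guess at the cause, not a confirmed explanation. `describe_size` is a hypothetical helper, not part of diffusers.

```python
# Assumption: SDXL's VAE maps 8x8 pixels to one latent cell.
VAE_SCALE = 8
# Assumption (unverified): the UNet halves the latent twice, so the latent
# side should be a multiple of 4, i.e. the pixel side a multiple of 32.
REQUIRED_MULTIPLE = 32


def describe_size(size: int) -> dict:
    """Report the latent side length for a pixel size and its divisibility."""
    latent = size // VAE_SCALE
    return {
        "size": size,
        "latent": latent,
        "multiple_of_32": size % REQUIRED_MULTIPLE == 0,
        "latent_multiple_of_4": latent % 4 == 0,
    }


for size in (1024, 1032):
    print(describe_size(size))
```

Under these assumptions, 1024 gives a 128-cell latent side (divisible by 4), while 1032 gives 129 (not divisible by 4), which lines up with which of the two sizes shows the edge artifacts.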

Reproduction

import torch
from diffusers import StableDiffusionXLPipeline

# load pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda", torch.float16)
pipe.watermark = None

# run pipeline
for size in [1024, 1032]:
    for seed in range(4):
        img = pipe(
            prompt="Colorful fox illustration",
            width=size,
            height=size,
            guidance_scale=7.0,
            num_inference_steps=15,
            generator=torch.Generator(device="cpu").manual_seed(seed),
        ).images[0]
        img.save(f"./fox_{size}_{seed}.png")
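
Until the cause is understood, a possible stopgap is to snap the requested size to a multiple of 32 before calling the pipeline. The following is a minimal sketch, assuming the divisibility-by-32 observation holds in general; `snap_to_multiple` is a hypothetical helper and not part of diffusers.

```python
def snap_to_multiple(size: int, multiple: int = 32, mode: str = "nearest") -> int:
    """Snap `size` to a multiple of `multiple` (never below one multiple).

    mode: "down", "up", or "nearest" (ties round up).
    """
    if size <= 0 or multiple <= 0:
        raise ValueError("size and multiple must be positive")

    lower = (size // multiple) * multiple
    upper = lower if lower == size else lower + multiple

    if mode == "down":
        result = lower
    elif mode == "up":
        result = upper
    elif mode == "nearest":
        # Pick the closer candidate; on a tie, prefer the larger size.
        result = lower if (size - lower) < (upper - size) else upper
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Avoid returning 0 for sizes smaller than one multiple.
    return max(result, multiple)


for requested in (1024, 1032):
    print(requested, "->", snap_to_multiple(requested))
```

The snapped value would then be passed as `width` and `height` in the reproduction above, with the output cropped or resized afterwards if the exact original size is required.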

System Info

diffusers==0.21.4

Labels

bug, stale