[SD Img2Img] resize source images to multiple of 8 instead of 32 #1571

vvsotnikov · 2022-12-06T14:47:17Z

Since #505 was merged, the resolution requirement for img2img has been relaxed: image dimensions only need to be a multiple of 8.
Sample code:

import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda"
model_id_or_path = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id_or_path,
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to(device)

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 504))  # notice that 504 is not divisible by 32

prompt = "A fantasy landscape, trending on artstation"
generator = torch.Generator(device).manual_seed(42)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
print(image.width, image.height)
image.show()

The result before the fix is resized down to 768*480:

The result after the fix preserves the original 768*504 resolution:
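To make the before/after numbers concrete, here is a minimal sketch of the rounding step. The helper name `round_down_to_multiple` is hypothetical and is not taken from the pipeline code; it only assumes that preprocessing rounds each dimension down to the nearest multiple.

```python
def round_down_to_multiple(size, multiple):
    """Round each dimension of (width, height) down to the nearest multiple.

    Hypothetical helper for illustration; not the pipeline's own function.
    """
    width, height = size
    return (width - width % multiple, height - height % multiple)


# The sample image above is resized to 768x504 before being passed in.
before_fix = round_down_to_multiple((768, 504), 32)
after_fix = round_down_to_multiple((768, 504), 8)

print(before_fix)  # (768, 480): 504 is not divisible by 32, so 24 rows are lost
print(after_fix)   # (768, 504): 504 is divisible by 8, so the size is preserved
```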

This change doesn't break the tests, but it could hurt reproducibility in some cases because the latents' shape is now different.
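A rough sketch of why the latents' shape changes, assuming the usual Stable Diffusion v1 layout (4 latent channels and a VAE that downsamples by a factor of 8). The constants and the `latent_shape` helper are illustrative assumptions, not values read from the pipeline.

```python
# Assumptions for illustration: SD v1 style VAE with 8x spatial downsampling
# and 4 latent channels.
VAE_SCALE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(width, height, batch_size=1):
    """Return the (batch, channels, height, width) shape of the latents."""
    return (
        batch_size,
        LATENT_CHANNELS,
        height // VAE_SCALE_FACTOR,
        width // VAE_SCALE_FACTOR,
    )


old_latents = latent_shape(768, 480)  # input rounded down to a multiple of 32
new_latents = latent_shape(768, 504)  # input rounded down to a multiple of 8

print(old_latents)  # (1, 4, 60, 96)
print(new_latents)  # (1, 4, 63, 96)
```

Because the noise tensor drawn from the generator has a different shape, the same seed would be expected to give a different result for inputs like this one. Inputs whose dimensions are already multiples of 32 are unaffected.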

HuggingFaceDocBuilderDev · 2022-12-06T14:52:02Z

The documentation is not available anymore as the PR was closed or merged.

vvsotnikov · 2022-12-06T14:59:55Z

check_repository_consistency failed so I added this fix to AltDiffusion as well.

patrickvonplaten

I see this makes a lot of sense! Could you maybe add one test here? :-)

vvsotnikov · 2022-12-07T10:34:21Z

Sure! Should I make an fp16 version of the test as well, or would fp32 only be enough?

vvsotnikov · 2022-12-07T11:52:21Z

The ONNX img2img pipeline actually fails when I try to use an image whose size is divisible by 8 but not by 16 or 32:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/c/Users/vladimir/AppData/Local/JetBrains/Toolbox/apps/PyCharm-P/ch-0/22 │
│ 3.7571.203/plugins/python/helpers/pydev/pydevconsole.py:364 in runcode       │
│                                                                              │
│   361 │   │   def runcode(self, code):                                       │
│   362 │   │   │   try:                                                       │
│   363 │   │   │   │   func = types.FunctionType(code, self.locals)           │
│ ❱ 364 │   │   │   │   coro = func()                                          │
│   365 │   │   │   │   if inspect.iscoroutine(coro):                          │
│   366 │   │   │   │   │   loop = asyncio.get_event_loop()                    │
│   367 │   │   │   │   │   loop.run_until_complete(coro)                      │
│ <input>:39 in <module>                                                       │
│                                                                              │
│ /mnt/c/Users/vladimir/PycharmProjects/diffusers/src/diffusers/pipelines/stab │
│ le_diffusion/pipeline_onnx_stable_diffusion_img2img.py:408 in __call__       │
│                                                                              │
│   405 │   │   │                                                              │
│   406 │   │   │   # predict the noise residual                               │
│   407 │   │   │   timestep = np.array([t], dtype=timestep_dtype)             │
│ ❱ 408 │   │   │   noise_pred = self.unet(                                    │
│   409 │   │   │   │   sample=latent_model_input, timestep=timestep, encoder_ │
│   410 │   │   │   )[0]                                                       │
│   411                                                                        │
│                                                                              │
│ /mnt/c/Users/vladimir/PycharmProjects/diffusers/src/diffusers/onnx_utils.py: │
│ 61 in __call__                                                               │
│                                                                              │
│    58 │                                                                      │
│    59 │   def __call__(self, **kwargs):                                      │
│    60 │   │   inputs = {k: np.array(v) for k, v in kwargs.items()}           │
│ ❱  61 │   │   return self.model.run(None, inputs)                            │
│    62 │                                                                      │
│    63 │   @staticmethod                                                      │
│    64 │   def load_model(path: Union[str, Path], provider=None, sess_options │
│                                                                              │
│ /home/vladimir/.virtualenvs/diffusers/lib/python3.10/site-packages/onnxrunti │
│ me/capi/onnxruntime_inference_collection.py:200 in run                       │
│                                                                              │
│   197 │   │   if not output_names:                                           │
│   198 │   │   │   output_names = [output.name for output in self._outputs_me │
│   199 │   │   try:                                                           │
│ ❱ 200 │   │   │   return self._sess.run(output_names, input_feed, run_option │
│   201 │   │   except C.EPFail as err:                                        │
│   202 │   │   │   if self._enable_fallback:                                  │
│   203 │   │   │   │   print("EP Error: {} using {}".format(str(err), self._p │
╰──────────────────────────────────────────────────────────────────────────────╯
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while 
running Concat node. Name:'Concat_3588' Status Message: concat.cc:156 
PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched 
dimensions of 5 and 6

I'm not really familiar with ONNX but I'll try to investigate.

vvsotnikov · 2022-12-07T14:40:29Z

Actually, it looks like the ONNX pipeline can't even work with resolutions that are multiples of 32; only multiples of 64 are supported. This code uses an init image whose size is a multiple of 32 but not 64, and it still throws an error similar to the one I shared in the previous message:

import numpy as np
import onnxruntime as ort

from diffusers import OnnxStableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

gpu_provider = (
    "CUDAExecutionProvider",
    {
        "gpu_mem_limit": "15000000000",  # 15GB
        "arena_extend_strategy": "kSameAsRequested",
    },
)
gpu_options = ort.SessionOptions()
gpu_options.enable_mem_pattern = False

init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
    "/img2img/sketch-mountains-input.jpg"
)
init_image = init_image.resize((512 - 32, 512))  # multiple of 32 but not 64
pipe = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="onnx",
    provider=gpu_provider,
    sess_options=gpu_options,
)
pipe.set_progress_bar_config(disable=None)

prompt = "A fantasy landscape, trending on artstation"

generator = np.random.RandomState(0)
output = pipe(
    prompt=prompt,
    image=init_image,
    strength=0.75,
    guidance_scale=7.5,
    num_inference_steps=10,
    generator=generator,
    output_type="np",
)

This applies to text2image too. Could it be related to the way the model is exported to ONNX format? The torch.onnx.export() docs say that it doesn't preserve dynamic control flow when exporting from a torch.nn.Module (which is the case for scripts/convert_stable_diffusion_checkpoint_to_onnx.py).
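One possible reading of the Concat failure, offered as an assumption rather than something confirmed in this thread: if the exported UNet halves the latent size three times (rounding up) and then doubles it back without any size correction, the upsampled feature map only matches its skip connection when the latent size is divisible by 8, which corresponds to a pixel size divisible by 64. The toy model below only tracks sizes; the function name, the number of downsamplers, and the rounding rule are all illustrative assumptions.

```python
import math


def unet_skip_sizes_match(latent_size: int, num_downsamplers: int = 3) -> bool:
    """Check whether 2x upsampling reproduces every skip-connection size.

    Toy model (assumption): each downsampler maps n -> ceil(n / 2), as a
    stride-2, padding-1, kernel-3 convolution would, and each upsampler maps
    n -> 2 * n with no correction for odd sizes.
    """
    sizes = [latent_size]
    for _ in range(num_downsamplers):
        sizes.append(math.ceil(sizes[-1] / 2))

    current = sizes.pop()  # size at the bottom of the U
    while sizes:
        current *= 2        # naive 2x upsampling
        skip = sizes.pop()  # size of the matching skip connection
        if current != skip:
            return False    # a Concat of these two tensors would fail
    return True


for pixels in (512, 480, 504):
    latent = pixels // 8  # assumed 8x VAE downsampling
    print(pixels, latent, unet_skip_sizes_match(latent))
```

Under these assumptions only 512 passes, while 480 (multiple of 32) and 504 (multiple of 8) both fail, which is at least consistent with the behaviour reported above.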

vvsotnikov · 2022-12-07T14:46:46Z

Could we for now apply this fix only to StableDiffusionImg2ImgPipeline and AltDiffusionImg2ImgPipeline, and keep the ONNX pipeline intact? :) Given that there's already a discrepancy in text2img between these three (the first two can generate a 504x504 image but the ONNX pipeline can't), I don't think a similar discrepancy in img2img would be a problem. Moreover, it's probably better to scale init images to a multiple of 64 rather than 32 when feeding them to the ONNX pipeline, as the current implementation can throw errors like the one I posted above.

patrickvonplaten · 2022-12-11T16:04:47Z

Totally fine to not add the changes to ONNX! Just could we please add one test that shows how to do img2img with a multiple of 8?

vvsotnikov · 2022-12-14T18:50:22Z

Sure, will do later this week :)

github-actions · 2023-01-08T15:03:36Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

patrickvonplaten · 2023-01-12T19:11:58Z

Gently ping @vvsotnikov , happy to assign the PR to myself if you're busy :-)

vvsotnikov · 2023-01-12T20:41:23Z

@patrickvonplaten sorry for the delay, and thanks for reminding :) I'd be glad to finish this PR today or tomorrow, although it seems like I don't have permissions to reassign this back to myself 🤔

# Conflicts:
# src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion_img2img.py
# src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_img2img.py
# src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py

vvsotnikov · 2023-01-12T22:11:11Z

@patrickvonplaten I've added the tests; however, check_repository_consistency is failing because, unlike the rest, the ONNX Img2Img pipeline can't work with multiples of 8, only 64. What do you think I should do about that? :)

Also unsure why paint-by-example tests are failing - I haven't changed anything related to this pipeline, and these tests are green when I run them locally.

patrickvonplaten · 2023-01-13T13:49:30Z

cc @anton-l for ONNX.

Hmm quite surprised that the PaintByExample tests are failing here as those pipelines aren't touched.

Fixed ONNX for now by adding " with 8->64", think that's fine
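For readers unfamiliar with the " with 8->64" suffix: the repository consistency check compares code marked as copied against its reference, optionally after applying a replacement pattern. The sketch below is a heavily simplified, hypothetical reimplementation of that idea, not the actual check script; the function names and the example source line are assumptions made for illustration.

```python
def apply_replace_pattern(source: str, pattern: str) -> str:
    """Apply an 'old->new' replacement pattern to the reference source text.

    Simplified: a plain string replace, so every occurrence of 'old' changes.
    """
    old, new = (part.strip() for part in pattern.split("->"))
    return source.replace(old, new)


def is_consistent(reference: str, copy: str, pattern: str = "") -> bool:
    """Return True if the copy equals the reference after the optional pattern."""
    expected = apply_replace_pattern(reference, pattern) if pattern else reference
    return expected == copy


# Illustrative lines only; the real functions are longer.
reference = "w, h = map(lambda x: x - x % 8, (w, h))"
onnx_copy = "w, h = map(lambda x: x - x % 64, (w, h))"

print(is_consistent(reference, onnx_copy))           # False: the copy diverges
print(is_consistent(reference, onnx_copy, "8->64"))  # True: the pattern accounts for it
```

This is why adding the pattern lets the ONNX pipeline keep its own multiple while the consistency check still passes.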

patrickvonplaten · 2023-01-13T15:01:58Z

Ok tests are now all passing, not sure what was going on there. Also couldn't reproduce test failures locally -> merging!

Thanks a lot for the PR @vvsotnikov ❤️

This should be very useful for the community!

vvsotnikov · 2023-01-13T15:02:45Z

Glad to help!

…gingface#1571)

* [Stable Diffusion Img2Img] resize source images to integer multiple of 8 instead of 32
* [Alt Diffusion Img2Img] resize source images to multiple of 8 instead of 32
* [Img2Img] fix AltDiffusion Img2Img resolution test
* [Img2Img] add Stable Diffusion Img2Img resolution test
* [Cycle Diffusion] round resolution to multiplies of 8 instead of 32
* [ONNX SD Img2Img] round resolution to multiplies of 64 instead of 32
* [SD Depth2Img] round resolution to multiplies of 8 instead of 32
* [Repaint] round resolution to multiplies of 8 instead of 32
* fix make style

Co-authored-by: Patrick von Platen <[email protected]>

[Stable Diffusion Img2Img] resize source images to integer multiple of 8 instead of 32

caec795

[Alt Diffusion Img2Img] resize source images to multiple of 8 instead of 32

875ee0c

patrickvonplaten reviewed Dec 7, 2022

Merge branch 'main' into img2img-resolutiuon-fix

1d0a640

github-actions bot added the stale Issues that haven't received updates label Jan 8, 2023

patrickvonplaten self-assigned this Jan 12, 2023

[Img2Img] fix AltDiffusion Img2Img resolution test

0dc8d0a

vvsotnikov added 6 commits January 12, 2023 22:51

[Img2Img] add Stable Diffusion Img2Img resolution test

0cc4792

[Cycle Diffusion] round resolution to multiplies of 8 instead of 32

a90b7e7

[ONNX SD Img2Img] round resolution to multiplies of 64 instead of 32

50cbde7

[SD Depth2Img] round resolution to multiplies of 8 instead of 32

5c5fe96

[Repaint] round resolution to multiplies of 8 instead of 32

7a44f66

fix make style

c42de0f

patrickvonplaten merged commit 9b37ed3 into huggingface:main Jan 13, 2023

vvsotnikov deleted the img2img-resolutiuon-fix branch January 13, 2023 15:03

lachlan-nicholson mentioned this pull request Nov 23, 2023

Generations with resolutions not divisible by 32 incur loss of quality at bottom & right edges. #5903

Closed
