add StableDiffusionXLControlNetImg2ImgPipeline #4592
Conversation
The documentation is not available anymore as the PR was closed or merged.
😍
Please try:

```python
seed = 123456
generator_1 = torch.manual_seed(seed)
generator_2 = torch.manual_seed(seed ^ 0xFFFFFFF)
```

or similar. The other thing is the Img2Img guidance scale: it defaults to 5.0, and maybe that is too low for use with ControlNet. When you say it doesn't work well, do you mean in any context using base SDXL 1.0 as the img2img model?
Yes, I refer to the context where we use base SDXL 1.0 as the img2img model:

```python
import requests
from io import BytesIO

import torch
from PIL import Image
from diffusers import AutoencoderKL, StableDiffusionXLImg2ImgPipeline

device = "cuda"
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"
model_id_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    model_id_or_path,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe = pipe.to(device)

generator = torch.manual_seed(0)
images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images
images[0].save("fantasy_landscape_sdxl.png")
```
Does "doesn't work" mean it kicks out an error, or simply results in an unexpected image?
I think in the example you just provided, the only issue I can see is that the input image resolution is too low, so pass a larger image.
I don't believe that will help. It just seems that SDXL's ability to generate low-resolution images is compromised due to its fine-tuning on 1024px; it is as if the transformer layers are heavily conditioned to represent details that simply won't exist at lower resolutions.

OK, rechecked the paper, they did mention this. Will try it on a 1024x1024 image.
I also tried resizing the other image to 1024x1024 and it didn't make any difference, so I'm not sure resolution plays a part here.
Related: #4724
patrickvonplaten
left a comment
Looks great! Can we add some tests here as well? Then it should be good to go :-)
```python
>>> image = pipe(
...     prompt, controlnet_conditioning_scale=controlnet_conditioning_scale, image=canny_image
... ).images[0]
```
The example needs to be edited.
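A hedged sketch of what the edited snippet might look like, assuming the img2img variant takes both an init image (`image`) and a separate ControlNet conditioning image (`control_image`), with `init_image` and `canny_image` prepared earlier in the docstring:

```python
>>> # sketch only: parameter names assume the img2img pipeline distinguishes
>>> # the init image from the ControlNet conditioning image
>>> image = pipe(
...     prompt,
...     image=init_image,
...     control_image=canny_image,
...     controlnet_conditioning_scale=controlnet_conditioning_scale,
... ).images[0]
```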
```python
unet: UNet2DConditionModel,
controlnet: ControlNetModel,
scheduler: KarrasDiffusionSchedulers,
requires_aesthetics_score: bool = False,
```
Where is this coming from?
BTW let's make sure to add a docstring for `requires_aesthetics_score` here as well.
I think this is yet to be addressed.
```python
if isinstance(controlnet, (list, tuple)):
    raise ValueError("MultiControlNet is not yet supported.")
```
We support Multi ControlNet for SDXL. https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl#multicontrolnet
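For reference, a minimal sketch of multi-ControlNet usage with SDXL (model IDs are illustrative; the linked docs have the canonical snippet): pass a list of ControlNets, plus matching lists of conditioning images and scales.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Two SDXL ControlNets combined; each gets its own conditioning image and scale.
controlnets = [
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16),
]
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

# canny_image and depth_image are assumed to be PIL images prepared beforehand.
image = pipe(
    "a futuristic city at dusk",
    image=[canny_image, depth_image],           # one conditioning image per ControlNet
    controlnet_conditioning_scale=[0.5, 0.5],   # one scale per ControlNet
).images[0]
```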
```python
# Copied from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_img2img.StableDiffusionXLImg2ImgPipeline._get_add_time_ids
def _get_add_time_ids(
    self, original_size, crops_coords_top_left, target_size, aesthetic_score, negative_aesthetic_score, dtype
```
When did we add support for aesthetic scoring?
This is copied from the XL Img2Img pipeline, which is designed for use with the refiner. I suppose you could tune a ControlNet with aesthetic values, and it would:
- lead to lower prompt adherence, in favour of the aesthetic-scored data distribution
- require conditional dropout so that we don't overfit to relying on these scores
- need additional dataset management

That said, I would actually like to see support for refiner tuning.
I think the aesthetic scores won't actually take effect here, since `config.requires_aesthetics_score` is set to `False`. I'm slightly in favor of keeping these arguments because:
- one can potentially train a ControlNet with the refiner in the future (maybe?)
- it is consistent with the SDXL img2img pipeline, where the aesthetic scores are also just placeholders when you use it with the base model.

However, I don't have a strong opinion on this and will be happy to remove it if you think it's better that way.
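For context, this is roughly the gating inside the copied `_get_add_time_ids` (paraphrased from the SDXL img2img pipeline): the score arguments only enter the micro-conditioning when `requires_aesthetics_score` is set, which is the refiner configuration; otherwise `target_size` is used and the scores are inert.

```python
# Paraphrased from the SDXL img2img pipeline: with the base model
# (requires_aesthetics_score=False) the aesthetic scores are never read.
if self.config.requires_aesthetics_score:
    add_time_ids = list(original_size + crops_coords_top_left + (aesthetic_score,))
    add_neg_time_ids = list(original_size + crops_coords_top_left + (negative_aesthetic_score,))
else:
    add_time_ids = list(original_size + crops_coords_top_left + target_size)
    add_neg_time_ids = list(original_size + crops_coords_top_left + target_size)
```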
Agree with your judgement here @yiyixuxu - I also think it could make sense to train a controlnet refiner (by limiting the added noise to < 20%)
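For intuition on the "< 20% added noise" idea: at inference the SDXL refiner already operates in exactly that regime via `denoising_start`/`denoising_end`, so a hypothetical ControlNet refiner would train on the same final slice of the schedule. A sketch with the standard checkpoints:

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a majestic lion jumping from a big stone at night"
# The base covers the first 80% of the schedule; the refiner only ever sees the
# last ~20% of the noise, which is the regime a ControlNet refiner would target.
latents = base(prompt, denoising_end=0.8, output_type="latent").images
image = refiner(prompt, image=latents, denoising_start=0.8).images[0]
```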
```python
aesthetic_score: float = 6.0,
negative_aesthetic_score: float = 2.5,
):
```
Have you tested their effect?
| self.controlnet.to("cpu") | ||
| torch.cuda.empty_cache() | ||
|
|
||
| # make sure the VAE is in float32 mode, as it overflows in float16 |
Can we update this part of the code to conform with the changes here: #4796?
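For reference, the upcasting pattern from that change looks roughly like this (a sketch from memory, not the exact diff of #4796):

```python
# Only upcast when the VAE is fp16 and its config requests it, then cast back
# afterwards so the rest of the pipeline stays in half precision.
needs_upcasting = self.vae.dtype == torch.float16 and self.vae.config.force_upcast
if needs_upcasting:
    self.upcast_vae()
    latents = latents.to(next(iter(self.vae.post_quant_conv.parameters())).dtype)

image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]

if needs_upcasting:
    self.vae.to(dtype=torch.float16)
```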
Cool, I think we're mostly good to go here, no? OK to merge for me once all tests are green. @sayakpaul if you could give a final review, that'd be great!
```python
latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

# controlnet(s) inference
if guess_mode and do_classifier_free_guidance:
```
I guess we need to also correct the behaviour for guess_mode here per your other PR?
I forgot! Thanks, will add.
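For reference, the corrected behaviour in the existing ControlNet pipelines looks roughly like this (paraphrased sketch; SDXL additionally passes `added_cond_kwargs` to the ControlNet call): in guess mode with classifier-free guidance, the ControlNet only runs on the conditional half of the batch, and its residuals are zero-padded for the unconditional half.

```python
if guess_mode and do_classifier_free_guidance:
    # run the ControlNet on the conditional batch only
    control_model_input = self.scheduler.scale_model_input(latents, t)
    controlnet_prompt_embeds = prompt_embeds.chunk(2)[1]
else:
    control_model_input = latent_model_input
    controlnet_prompt_embeds = prompt_embeds

down_block_res_samples, mid_block_res_sample = self.controlnet(
    control_model_input,
    t,
    encoder_hidden_states=controlnet_prompt_embeds,
    controlnet_cond=control_image,
    conditioning_scale=cond_scale,
    guess_mode=guess_mode,
    return_dict=False,
)

if guess_mode and do_classifier_free_guidance:
    # the ControlNet only saw the conditional batch: prepend zeros so the
    # unconditional prediction is left untouched by the ControlNet residuals
    down_block_res_samples = [torch.cat([torch.zeros_like(d), d]) for d in down_block_res_samples]
    mid_block_res_sample = torch.cat([torch.zeros_like(mid_block_res_sample), mid_block_res_sample])
```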
```python
self._test_inference_batch_single_identical(expected_max_diff=2e-3)

# TODO(Patrick, Sayak) - skip for now as this requires more refiner tests
def test_save_load_optional_components(self):
```
What is the problem here?
Not sure, actually! I thought maybe you left the note 😂 It was copied from sdxl-img2img.
Ah yeah that note was me! The problem is that for img2img/refiner the "text_encoder" is optional so we should be able to load the pipeline without it, but that's not possible when using the non-refiner architecture (which is currently done in tests). We should probably add a new test class here at some point
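For illustration (a small sketch, not from this PR): the refiner checkpoint ships without the first text encoder, which is the optional-components behaviour the skipped test is meant to cover.

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

# The refiner repo declares text_encoder and tokenizer as null in its
# model_index.json, so both load as None.
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
)
print(pipe.text_encoder, pipe.tokenizer)  # None None
```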
When is refiner support for ControlNet Img2Img planned?
sayakpaul
left a comment
Looking very nice! Thanks for this one!
Let's also fix the tests before merging :-)
Ignore this if I'm off base for some reason, but I'm not sure `from_single_file` support is there.
Hello! I am actually trying to use this for my prototype. Thank you so much for creating this pipeline :) I have two questions about this pipeline:

In this pipeline, where is the equivalent logic? I was reading the code, and it seems like `vae.image_processor` only converts the image into a tensor, but I don't see the masking logic in place.

I am still learning how SD and ControlNet models work, so thank you so much for your help in advance. @yiyixuxu
--------- Co-authored-by: yiyixuxu <[email protected]> Co-authored-by: Patrick von Platen <[email protected]>






Working super well now!

As a comparison, this is the output we would get from the regular ControlNet 😂😂😂