
Conversation

@realimposter

What does this PR do?

Adds ControlNet reference-only functionality to the ControlNet Img2Img pipeline. I created this by combining stable_diffusion_controlnet_reference.py into pipeline_controlnet_img2img.py.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten
Contributor

Hey @realimposter,

What is a common use case of this? Also how is this different from #3435?

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jul 26, 2023
@patrickvonplaten
Contributor

Also related to: #4257

@yamkz

yamkz commented Aug 13, 2023

I really desire the SDXL-supported 'img2img+ControlNet+reference only pipeline' and 'inpainting+ControlNet+reference only pipeline.' I would be very happy if they were available.

#4589

@stilletto

stilletto commented Sep 9, 2023

Hey @realimposter,

What is a common use case of this? Also how is this different from #3435?

It can be useful for face photo editing at a low noise strength.

@breengles

breengles commented Oct 16, 2023

Hey @realimposter,

What is a common use case of this? Also how is this different from #3435?

Hey @patrickvonplaten!
I am also trying to get it working (i2i adapts more or less straightforwardly). The main difference is that the pipeline you mentioned does not support image input (though, as I said, it can be readily adapted).

Currently, I am struggling to adapt reference guidance to the inpaint pipelines for the same reason -- it would allow more ways to edit images. By the way, these combinations of inpainting, reference, and ControlNets work perfectly fine in A1111, so it would be useful to have them in diffusers as well (perhaps in the community examples). So I tried the following, but haven't gotten any reasonable results:

        # 10. Denoising loop
        num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
        with self.progress_bar(total=num_inference_steps) as progress_bar:
            for i, t in enumerate(timesteps):
                # expand the latents if we are doing classifier-free guidance
                latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
                latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

                # reference-only part: re-noise the reference latents to the current timestep
                noise = randn_tensor(
                    ref_image_latents.shape,
                    generator=generator,
                    device=device,
                    dtype=ref_image_latents.dtype,
                )
                ref_xt = self.scheduler.add_noise(ref_image_latents, noise, t.reshape(1))
                ref_xt = torch.cat([ref_xt] * 2) if do_classifier_free_guidance else ref_xt
                ref_xt = self.scheduler.scale_model_input(ref_xt, t)

                if num_channels_unet == 9:
                    # concat latents, mask, masked_image_latents in the channel dimension
                    latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1)

                    if do_classifier_free_guidance:
                        ref_image_latents_inject = torch.cat([ref_image_latents] * 2)
                    else:
                        ref_image_latents_inject = ref_image_latents

                    ref_xt = torch.cat([ref_xt, empty_mask, ref_image_latents_inject], dim=1)

                # reference pass: the hacked attention/norm layers bank the hidden states
                MODE = "write"
                self.unet(
                    ref_xt,
                    t,
                    encoder_hidden_states=prompt_embeds,
                    cross_attention_kwargs=cross_attention_kwargs,
                    return_dict=False,
                )

                # main pass: predict the noise residual using the banked states
                MODE = "read"
                noise_pred = self.unet(
                    latent_model_input,
                    t,
                    encoder_hidden_states=prompt_embeds,
                    cross_attention_kwargs=cross_attention_kwargs,
                    return_dict=False,
                )[0]
...
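For context, the MODE switch above only has an effect because the reference pipelines patch the UNet's attention forwards so that the "write" pass banks hidden states and the "read" pass attends over them. A toy, self-contained sketch of that idea -- names and shapes are illustrative, not the actual diffusers implementation:

```python
import torch

MODE = "write"  # module-level switch, as in the reference pipelines

class ToyRefAttention(torch.nn.Module):
    """Toy stand-in for a hacked self-attention block."""

    def __init__(self, dim):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.bank = []

    def forward(self, x):
        if MODE == "write":
            # reference pass: stash hidden states for later reuse
            self.bank.append(x.detach().clone())
            out, _ = self.attn(x, x, x)
        else:
            # main pass: attend over current + banked reference states
            kv = torch.cat([x] + self.bank, dim=1)
            out, _ = self.attn(x, kv, kv)
            self.bank.clear()
        return out

block = ToyRefAttention(dim=8)
ref = torch.randn(1, 16, 8)  # reference hidden states (batch, seq, dim)
lat = torch.randn(1, 16, 8)  # main hidden states

block(ref)        # MODE == "write": banks the reference states
MODE = "read"
out = block(lat)  # attends over 32 key/value tokens, output stays (1, 16, 8)
assert out.shape == (1, 16, 8)
```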

[Attached images: closeup-pexels-abed-albaset-alhasan-7814598 (input/reference) and generated result]

I must say that everything works fine with non-inpainting weights (4-channel input), so I think something is wrong with how I am preparing ref_xt (though it should follow A1111's flow).
I would greatly appreciate any advice 🤗

upd: it seems the artefacts appear when AdaIN guidance is enabled (with and without the attention part)
upd x2: I used empty_mask as a zeros tensor (as in A1111), but the results seem much better if a tensor of ones is used instead
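For concreteness, a minimal shape-only sketch of the 9-channel assembly described above, with empty_mask as ones per the upd x2 observation (latent sizes are illustrative):

```python
import torch

B, H, W = 1, 64, 64  # illustrative latent resolution (512x512 image / 8)

ref_xt = torch.randn(B, 4, H, W)             # noised reference latents
ref_image_latents = torch.randn(B, 4, H, W)  # clean reference latents

# A1111 uses zeros here; ones seemed to give much better results (upd x2)
empty_mask = torch.ones(B, 1, H, W)

# 4 (latents) + 1 (mask) + 4 (masked-image latents) = 9 channels,
# matching the inpainting UNet's conv_in
ref_xt_9ch = torch.cat([ref_xt, empty_mask, ref_image_latents], dim=1)
assert ref_xt_9ch.shape == (B, 9, H, W)
```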

@patrickvonplaten
Contributor

@DN6 can you take a look here?

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@patrickvonplaten
Contributor

@DN6 gentle ping here

@DN6
Collaborator

DN6 commented Nov 22, 2023

@realimposter Could we move this into the community pipelines please?

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Jan 9, 2024
