[WIP]Vae preprocessor refactor (PR1) #3557
Conversation
The documentation is not available anymore as the PR was closed or merged.
patrickvonplaten left a comment
Nice start! Can we maybe first merge the PR that adds VAE preprocess and then merge this one? Otherwise people will see lots of deprecation warnings 😅
Co-authored-by: Pedro Cuenca <[email protected]>
```python
else:
    do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]

image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)
```
@patrickvonplaten I refactored the 4x upscaler here (just preprocess and postprocess, not accepting latents).
However, I think I changed the logic of postprocess here: if output_type == "pt", it currently returns a PyTorch tensor that is unnormalized, which is inconsistent with image_processor.postprocess. Let me know if we actually intend to return a PyTorch tensor in [-1, 1] for this pipeline, though.
If the latent upscaler was previously returning unnormalized tensors, I would prefer keeping it that way to avoid any unforeseen consequences.
Maybe we could add a flag to image_processor.postprocess to check whether normalization is needed? To me, that is a cleaner and more idiomatic approach.
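The flag the reviewer suggests could look something like the following. This is a minimal, hypothetical sketch (the function name and signature are assumptions, not the actual diffusers API): a per-image `do_denormalize` list controls whether each sample is mapped from [-1, 1] back to [0, 1].

```python
import numpy as np

def postprocess(images: np.ndarray, do_denormalize=None):
    # Hypothetical sketch of a per-image denormalization flag:
    # each entry of do_denormalize decides whether that sample is
    # mapped from [-1, 1] back to [0, 1].
    if do_denormalize is None:
        do_denormalize = [True] * images.shape[0]
    out = []
    for img, denorm in zip(images, do_denormalize):
        out.append((img / 2 + 0.5).clip(0, 1) if denorm else img)
    return np.stack(out)
```

A pipeline could then skip denormalization for NSFW-filtered samples by passing `do_denormalize=[not has_nsfw for has_nsfw in has_nsfw_concept]`, matching the diff above.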
@yiyixuxu could we for now make sure that the output stays exactly the same? I.e., we should not change the behavior of the pipelines in any way, IMO.
```python
"""
Convert a PIL image or a list of PIL images to numpy image
"""
if not isinstance(images, list):
```
Is there a need to also check if image is of type PIL.Image.Image?
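The check being asked about could be sketched as below. This is a hypothetical version of the helper (name and shape are assumptions), showing how an explicit `PIL.Image.Image` isinstance check would reject bad inputs early:

```python
import numpy as np
from PIL import Image

def pil_to_numpy(images):
    # Hypothetical sketch with the type check the reviewer asks about:
    # reject anything that is not a PIL image before conversion.
    if not isinstance(images, list):
        images = [images]
    if not all(isinstance(i, Image.Image) for i in images):
        raise TypeError("expected a PIL.Image.Image or a list of them")
    # Convert each image to float32 in [0, 1] and stack into a batch.
    return np.stack([np.array(i).astype(np.float32) / 255.0 for i in images])
```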
```python
width, height = (
    x - x % self.config.vae_scale_factor for x in (width, height)
)  # resize to integer multiple of vae_scale_factor
image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
```
👌
```python
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True)
self.control_image_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False
)
```
Nice to see this coming to fruition.
```python
if (
    not image_is_pil
    and not image_is_tensor
    and not image_is_np
    and not image_is_pil_list
    and not image_is_tensor_list
    and not image_is_np_list
):
```
That's a lot of conditions hahaha.
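One way to collapse those six conditions is a tuple of supported types plus a homogeneous-list check. This is a hypothetical sketch, not the pipeline's actual code (in the real pipeline, `torch.Tensor` would also join the tuple):

```python
import numpy as np
from PIL import Image

# Hypothetical condensed form of the six-way check above.
SUPPORTED = (Image.Image, np.ndarray)

def check_image_type(image):
    # Accept a single supported image, or a non-empty list of them.
    if isinstance(image, SUPPORTED):
        return True
    if isinstance(image, list) and image and all(isinstance(i, SUPPORTED) for i in image):
        return True
    raise TypeError(f"image must be one of {SUPPORTED} or a list of them, got {type(image)}")
```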
```python
image = self.control_image_processor.preprocess(image, height=height, width=width).to(dtype=torch.float32)
```
So, all of this logic is handled by preprocess() now? That's amazing!
```python
warnings.warn(
    "The preprocess method is deprecated and will be removed in a future version. Please"
    " use VaeImageProcessor.preprocess instead",
    FutureWarning,
)
```
So, our plan is to refactor this in a future PR, yes?
No, this function should be fully deprecated and removed in the future.
sayakpaul left a comment
Awesome!
Co-authored-by: Sayak Paul <[email protected]>
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
```python
def test_inference_batch_single_identical(self):
    super().test_inference_batch_single_identical(expected_max_diff=3e-3)


# Overwrite the default test_latents_inputs because pix2pix encodes the image differently
```
Nice!
I think we can merge this more or less. The final missing piece seems to be this: https://github.com/huggingface/diffusers/pull/3557/files#r1214586673
Can we make sure that we don't change the output behavior in any way?
This reverts commit 0ca3473.
@patrickvonplaten
Having changed this: #3557 (comment), I think we can merge this PR 🥳
…sion_latent_upscale.py Co-authored-by: Patrick von Platen <[email protected]>
VaeImageProcessor.preprocess refactor

* refactored VaeImageProcessor
  - allow passing optional height and width arguments to resize()
  - add convert_to_rgb
* refactored the prepare_latents method for img2img pipelines so that if we pass latents directly as image input, it will not encode them again
* added a test in test_pipelines_common.py to test latents as image inputs
* refactored img2img pipelines that accept latents as image: controlnet img2img, stable diffusion img2img, instruct_pix2pix

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>