Progressive distillation #1010
Conversation
Commits:
* v diffusion support for ddpm
* quality and style
* variable name consistency
* missing base case
* pass prediction type along in the pipeline
* put prediction type in scheduler config
* style
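For context on the "prediction type" commits above: the scheduler exposes a `prediction_type` field in its config, which is what lets a pipeline switch between epsilon- and v-prediction. A minimal sketch of how that surfaces to users, assuming the current `DDPMScheduler` signature in diffusers rather than this PR's exact code:

```python
# Minimal sketch: selecting v-prediction via the scheduler config.
# Assumes the diffusers DDPMScheduler API; not code taken from this PR.
from diffusers import DDPMScheduler

# "epsilon" (noise prediction) is the default; "v_prediction" switches to v-diffusion targets.
scheduler = DDPMScheduler(num_train_timesteps=1000, prediction_type="v_prediction")
print(scheduler.config.prediction_type)  # -> "v_prediction"
```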
@lukovnikov sorry for the delayed response. I think that's an important meta decision to make with the Diffusers team! I've seen a similar approach, e.g. in the Imagic PR. For now I'm still working on getting the distillation results as good as the paper's!
@bglick13 I've also implemented distillation in my fork: https://github.com/lukovnikov/diffusers/blob/mine/examples/unconditional_image_generation/distill_unconditional.py
@lukovnikov at a guess, the CLIP text encoder tokens are also being trained and are becoming 'longer' (attention focuses more on longer vectors, which correspond to concepts the text encoder knows well; this is why popular celebrities can get an overbaked appearance in default models), and that is what causes the images to be overbaked.
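To make the "longer vectors" intuition concrete, here is a small sketch that prints per-token hidden-state norms from a CLIP text encoder; the model name, the prompt, and the interpretation are illustrative assumptions, not something verified in this thread:

```python
# Sketch: inspect per-token hidden-state norms of the CLIP text encoder.
# Assumes the transformers CLIP classes; model name and prompt are illustrative only.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

name = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(name)
text_encoder = CLIPTextModel.from_pretrained(name)

inputs = tokenizer("a photo of a famous celebrity", return_tensors="pt")
with torch.no_grad():
    hidden = text_encoder(**inputs).last_hidden_state[0]  # (seq_len, dim)

# Tokens whose hidden states have larger norms tend to pull more cross-attention weight.
for tok, vec in zip(tokenizer.convert_ids_to_tokens(inputs.input_ids[0].tolist()), hidden):
    print(f"{tok:>15}  norm={vec.norm().item():.2f}")
```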
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This is for untuned models.
@patrickvonplaten do you think this is something worth trying again?
Starting to work on implementing "Progressive Distillation for Fast Sampling of Diffusion Models" (Salimans & Ho, 2022).
This is very much a draft PR. So far I've included a toy example in a notebook.
First, it trains an unconditional image diffusion model on a single image. Then it implements the distillation training procedure: training a student model to produce the same output in N // 2 steps (a simplified sketch of one such training step is shown below).
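As a rough illustration of that distillation step (one student step trained to match two teacher steps), here is a simplified sketch. All names here (`ddim_step`, the epsilon parameterization, matching directly in sample space) are assumptions for clarity, not this PR's actual code, which follows the paper's parameterization more closely:

```python
# Simplified sketch of one progressive-distillation training step.
# `teacher` and `student` are epsilon-prediction UNets called as model(x_t, t);
# `alphas_cumprod` is the 1-D tensor of cumulative alphas. All names are illustrative.
import torch
import torch.nn.functional as F

def ddim_step(model, x_t, t, t_next, alphas_cumprod):
    # Deterministic DDIM update for an epsilon-prediction model.
    a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
    eps = model(x_t, t)
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps

def distillation_step(teacher, student, x_t, t, t_mid, t_next, alphas_cumprod, optimizer):
    # Teacher: two deterministic steps t -> t_mid -> t_next (no gradients).
    with torch.no_grad():
        x_mid = ddim_step(teacher, x_t, t, t_mid, alphas_cumprod)
        x_teacher = ddim_step(teacher, x_mid, t_mid, t_next, alphas_cumprod)

    # Student: a single step t -> t_next that should land where the teacher's two steps did.
    x_student = ddim_step(student, x_t, t, t_next, alphas_cumprod)

    loss = F.mse_loss(x_student, x_teacher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this halves the number of sampling steps each round: the distilled student becomes the next teacher, which is what brings the sampler down to N // 2, then N // 4, and so on.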
TODOs include: