Update training and fine-tuning docs #1020 (Merged)
Commits (6):

- `8a11ff1` Update training and fine-tuning docs. (pcuenca)
- `5721bf0` Update examples README. (pcuenca)
- `38d11e6` Update README. (pcuenca)
- `422f25e` Add Flax fine-tuning section. (pcuenca)
- `c04d5eb` Accept suggestion (pcuenca)
- `119e5b8` Accept suggestion (pcuenca)
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# DreamBooth fine-tuning example

[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3 to 5) images of a subject.

_Dreambooth examples from the [project's blog](https://dreambooth.github.io)._

The [Dreambooth training script](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) shows how to implement this training procedure on a pre-trained Stable Diffusion model.

<Tip warning={true}>

<!-- TODO: replace with our blog when it's done -->

Dreambooth fine-tuning is very sensitive to hyperparameters and prone to overfitting. We recommend you take a look at our [in-depth analysis](https://wandb.ai/psuraj/dreambooth/reports/Dreambooth-Training-Analysis--VmlldzoyNzk0NDc3) with recommended settings for different subjects, and go from there.

</Tip>
## Training locally

### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies. We also recommend installing `diffusers` from the `main` GitHub branch:

```bash
pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```

Then initialize and configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```
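If you'd rather not answer the interactive questionnaire (for instance, on a headless machine), a default configuration can be written programmatically instead. A minimal sketch using 🤗 Accelerate's `write_basic_config` utility:

```python
# Write a default accelerate config file instead of answering interactive prompts
from accelerate.utils import write_basic_config

write_basic_config(mixed_precision="no")  # use "fp16" to enable mixed precision
```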
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license, and tick the checkbox if you agree.

You have to be a registered user on the 🤗 Hugging Face Hub, and you'll also need an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

Run the following command to authenticate with your token:

```bash
huggingface-cli login
```
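If you're working in a notebook rather than a terminal, you can authenticate from Python instead. A sketch using `huggingface_hub` (installed as part of the training requirements):

```python
# Log in to the Hugging Face Hub; paste your access token when prompted
from huggingface_hub import notebook_login

notebook_login()
```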
If you have already cloned the model repository, you won't need to go through these steps. Instead, you can pass the path to your local checkout to the training script and the weights will be loaded from there.
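For example, assuming `git-lfs` is installed, a local checkout might look like this (the destination folder name is illustrative):

```bash
# Clone the weights once, then point the training script at the local folder
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
export MODEL_NAME="./stable-diffusion-v1-4"
```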
### Dog toy example

In this example we'll use [these images](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ) to add a new concept to Stable Diffusion using the Dreambooth process. They will be our training data. Please download them and place them somewhere on your system.

Then you can launch the training script using:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400
```
### Training with a prior-preserving loss

Prior preservation is used to avoid overfitting and language drift; please refer to the paper to learn more about it if you are interested. For prior preservation, we use other images of the same class as part of the training process. The nice thing is that we can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path we specify.

According to the paper, it's recommended to generate `num_epochs * num_samples` images for prior preservation; 200 to 300 images work well for most cases.

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
### Training on a 16GB GPU

With the help of gradient checkpointing and the 8-bit optimizer from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), it's possible to train DreamBooth on a 16GB GPU. First, install the optimizer:

```bash
pip install bitsandbytes
```

Then pass the `--use_8bit_adam` option to the training script:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
### Fine-tune the text encoder in addition to the UNet

The script also allows you to fine-tune the `text_encoder` along with the `unet`. It has been observed experimentally that this gives much better results, especially on faces. Please refer to [our report](https://wandb.ai/psuraj/dreambooth/reports/Dreambooth-Training-Analysis--VmlldzoyNzk0NDc3) for more details.

To enable this option, pass the `--train_text_encoder` argument to the training script.

<Tip>
Training the text encoder requires additional memory, so training won't fit on a 16GB GPU. You'll need at least 24GB VRAM to use this option.
</Tip>

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_checkpointing \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
### Training on an 8GB GPU

Using [DeepSpeed](https://www.deepspeed.ai/), it's even possible to offload some tensors from VRAM to either CPU or NVMe, allowing training to proceed with less GPU memory.

DeepSpeed needs to be enabled with `accelerate config`. During configuration, answer yes to "Do you want to use DeepSpeed?". Combining DeepSpeed stage 2, fp16 mixed precision, and offloading both the model parameters and the optimizer state to CPU, it's possible to train on under 8GB of VRAM. The drawback is that this requires more system RAM (about 25 GB). See [the DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options.

Changing the default Adam optimizer to DeepSpeed's optimized Adam implementation, `deepspeed.ops.adam.DeepSpeedCPUAdam`, gives a substantial speedup, but enabling it requires the system's CUDA toolchain version to match the one installed with PyTorch. 8-bit optimizers don't seem to be compatible with DeepSpeed at the moment.
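For reference, the answers described above result in an Accelerate configuration roughly like the following (an illustrative sketch, not a verbatim dump of the generated file; key names follow 🤗 Accelerate's DeepSpeed plugin):

```yaml
# Example accelerate config enabling DeepSpeed ZeRO stage 2 with CPU offload
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2
  offload_optimizer_device: cpu
  offload_param_device: cpu
  gradient_accumulation_steps: 1
mixed_precision: fp16
```

With DeepSpeed configured, launch training as usual: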
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --mixed_precision=fp16
```
## Inference

Once you have trained a model, you can run inference with the `StableDiffusionPipeline` by simply pointing it at the path where the model was saved. Make sure that your prompts include the special identifier used during training (`sks` in the previous examples).

```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("dog-bucket.png")
```
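If you run into out-of-memory errors at inference time, one common mitigation in `diffusers` is sliced attention, which trades a little speed for a lower peak VRAM footprint. A minimal sketch, reusing the pipeline loaded above:

```python
# Compute attention in slices to reduce peak VRAM usage during generation
pipe.enable_attention_slicing()
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
```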
---

The PR also updates the examples README. The relevant hunks:

```diff
@@ -12,7 +12,7 @@
 # 🧨 Diffusers Training Examples

-Diffusers examples are a collection of scripts to demonstrate how to effectively use the `diffusers` library
+Diffusers training examples are a collection of scripts to demonstrate how to effectively use the `diffusers` library
 for a variety of use cases.

 **Note**: If you are looking for **official** examples on how to use `diffusers` for inference,
@@ -36,13 +36,15 @@
 - [Unconditional Training](./unconditional_training)
 - [Text-to-Image Training](./text2image)
 - [Text Inversion](./text_inversion)
+- [Dreambooth](./dreambooth)

 | Task | 🤗 Accelerate | 🤗 Datasets | Colab |
 |---|---|:---:|:---:|
 | [**Unconditional Image Generation**](./unconditional_training) | ✅ | ✅ | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) |
-| [**Text-to-Image**](./text2image) | - | - |
-| [**Text-Inversion**](./text_inversion) | ✅ | ✅ | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) |
+| [**Text-to-Image fine-tuning**](./text2image) | ✅ | ✅ |
+| [**Textual Inversion**](./text_inversion) | ✅ | - | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) |
+| [**Dreambooth**](./dreambooth) | ✅ | - | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) |

 ## Community
```
Review discussion:

> **Contributor:** We are not using …
>
> **pcuenca (Member, Author):** Yes, I noticed and removed the green tick. The remaining one is for Accelerate.
> Very nice!