
Commit 4bae76e

[docs] Improve LoRA docs (#3311)

* update docs
* add to toctree
* apply feedback

1 parent 0224794 commit 4bae76e

10 files changed: +128 −106 lines

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -60,6 +60,8 @@
 - sections:
   - local: training/overview
     title: Overview
+  - local: training/create_dataset
+    title: Create a dataset for training
   - local: training/unconditional_training
     title: Unconditional image generation
   - local: training/text_inversion

docs/source/en/training/controlnet.mdx

Lines changed: 5 additions & 1 deletion
@@ -69,6 +69,8 @@ The original dataset is hosted in the ControlNet [repo](https://huggingface.co/l
 
 Our training examples use [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) because that is what the original set of ControlNet models was trained on. However, ControlNet can be trained to augment any compatible Stable Diffusion model (such as [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4)) or [`stabilityai/stable-diffusion-2-1`](https://huggingface.co/stabilityai/stable-diffusion-2-1).
 
+To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
+
 ## Training
 
 Download the following images to condition our training with:
@@ -79,7 +81,9 @@ wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/ma
 wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
 ```
 
-Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument.
+
+The training script creates and saves a `diffusion_pytorch_model.bin` file in your repository.
 
 ```bash
 export MODEL_DIR="runwayml/stable-diffusion-v1-5"
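
For reference, a ControlNet checkpoint saved this way can be loaded for inference roughly as follows. This is a minimal sketch: `path/to/controlnet-output` stands in for the training script's `--output_dir`, and the prompt is only an illustration.

```python
# Minimal sketch: load a ControlNet checkpoint saved by the training script and
# plug it into a Stable Diffusion pipeline. "path/to/controlnet-output" is a
# placeholder for your --output_dir.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained("path/to/controlnet-output", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

conditioning = load_image("./conditioning_image_1.png")  # one of the images downloaded above
image = pipe("pale golden rod circle with old lace background", image=conditioning, num_inference_steps=20).images[0]
```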

docs/source/en/training/create_dataset.mdx

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Create a dataset for training

There are many datasets on the [Hub](https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads) to train a model on, but if you can't find one you're interested in or want to use your own, you can create a dataset with the 🤗 [Datasets](https://huggingface.co/docs/datasets) library. The dataset structure depends on the task you want to train your model on. The most basic dataset structure is a directory of images for tasks like unconditional image generation. Another dataset structure may be a directory of images and a text file containing their corresponding text captions for tasks like text-to-image generation.

This guide will show you two ways to create a dataset to finetune on:

- provide a folder of images to the `--train_data_dir` argument
- upload a dataset to the Hub and pass the dataset repository id to the `--dataset_name` argument

<Tip>

💡 Learn more about how to create an image dataset for training in the [Create an image dataset](https://huggingface.co/docs/datasets/image_dataset) guide.

</Tip>

## Provide a dataset as a folder

For unconditional generation, you can provide your own dataset as a folder of images. The training script uses the [`ImageFolder`](https://huggingface.co/docs/datasets/en/image_dataset#imagefolder) builder from 🤗 Datasets to automatically build a dataset from the folder. Your directory structure should look like:

```bash
data_dir/xxx.png
data_dir/xxy.png
data_dir/[...]/xxz.png
```

Pass the path to the dataset directory to the `--train_data_dir` argument, and then you can start training:

```bash
accelerate launch train_unconditional.py \
  --train_data_dir <path-to-train-directory> \
  <other-arguments>
```

## Upload your data to the Hub

<Tip>

💡 For more details and context about creating and uploading a dataset to the Hub, take a look at the [Image search with 🤗 Datasets](https://huggingface.co/blog/image-search-datasets) post.

</Tip>

Start by creating a dataset with the [`ImageFolder`](https://huggingface.co/docs/datasets/image_load#imagefolder) feature, which creates an `image` column containing the PIL-encoded images.

You can use the `data_dir` or `data_files` parameters to specify the location of the dataset. The `data_files` parameter supports mapping specific files to dataset splits like `train` or `test`:

```python
from datasets import load_dataset

# example 1: local folder
dataset = load_dataset("imagefolder", data_dir="path_to_your_folder")

# example 2: local files (supported formats are tar, gzip, zip, xz, rar, zstd)
dataset = load_dataset("imagefolder", data_files="path_to_zip_file")

# example 3: remote files (supported formats are tar, gzip, zip, xz, rar, zstd)
dataset = load_dataset(
    "imagefolder",
    data_files="https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip",
)

# example 4: providing several splits
dataset = load_dataset(
    "imagefolder", data_files={"train": ["path/to/file1", "path/to/file2"], "test": ["path/to/file3", "path/to/file4"]}
)
```

Then use the [`~datasets.Dataset.push_to_hub`] method to upload the dataset to the Hub:

```python
# assuming you have run the huggingface-cli login command in a terminal
dataset.push_to_hub("name_of_your_dataset")

# if you want to push to a private repo, simply pass private=True:
dataset.push_to_hub("name_of_your_dataset", private=True)
```

Now the dataset is available for training by passing the dataset name to the `--dataset_name` argument:

```bash
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --dataset_name="name_of_your_dataset" \
  <other-arguments>
```

## Next steps

Now that you've created a dataset, you can plug it into the `train_data_dir` (if your dataset is local) or `dataset_name` (if your dataset is on the Hub) arguments of a training script.

For your next steps, feel free to try and use your dataset to train a model for [unconditional generation](unconditional_training) or [text-to-image generation](text2image)!
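
Once the dataset is on the Hub, it can also be loaded back with 🤗 Datasets as a quick sanity check before launching training. A minimal sketch, where `your-username/name_of_your_dataset` is a placeholder repository id:

```python
# Minimal sketch: load the uploaded dataset back from the Hub and inspect one example.
# "your-username/name_of_your_dataset" is a placeholder repository id.
from datasets import load_dataset

dataset = load_dataset("your-username/name_of_your_dataset", split="train")
print(dataset)              # prints the features, e.g. the "image" column
print(dataset[0]["image"])  # a PIL image
```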

docs/source/en/training/custom_diffusion.mdx

Lines changed: 3 additions & 1 deletion
@@ -67,7 +67,7 @@ write_basic_config()
 ```
 ### Cat example 😺
 
-Now let's get our dataset. Download dataset from [here](https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip) and unzip it.
+Now let's get our dataset. Download dataset from [here](https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip) and unzip it. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
 
 We also collect 200 real images using `clip-retrieval` which are combined with the target images in the training dataset as a regularization. This prevents overfitting to the given target image. The following flags enable the regularization `with_prior_preservation`, `real_prior` with `prior_loss_weight=1.`.
 The `class_prompt` should be the same category name as the target image. The collected real images have text captions similar to the `class_prompt`. The retrieved images are saved in `class_data_dir`. You can disable `real_prior` to use generated images as regularization. To collect the real images, use this command first before training.
@@ -79,6 +79,8 @@ python retrieve.py --class_prompt cat --class_data_dir real_reg/samples_cat --nu
 
 **___Note: Change the `resolution` to 768 if you are using the [stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2) 768x768 model.___**
 
+The script creates and saves model checkpoints and a `pytorch_custom_diffusion_weights.bin` file in your repository.
+
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
 export OUTPUT_DIR="path-to-save-model"
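
For reference, the saved `pytorch_custom_diffusion_weights.bin` file is typically loaded on top of the base model at inference time. This is a rough sketch only: the output path mirrors the example's `OUTPUT_DIR`, and the `<new1>.bin` token-embedding file name is an assumption based on the Custom Diffusion example defaults.

```python
# Rough sketch: load Custom Diffusion attention weights and the learned modifier
# token on top of the base model. Paths and "<new1>.bin" are assumptions based
# on the example defaults.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin")
pipe.load_textual_inversion("path-to-save-model", weight_name="<new1>.bin")

image = pipe("<new1> cat swimming in a pool", num_inference_steps=50).images[0]
```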

docs/source/en/training/dreambooth.mdx

Lines changed: 4 additions & 2 deletions
@@ -64,6 +64,8 @@ snapshot_download(
 )
 ```
 
+To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
+
 ## Finetuning
 
 <Tip warning={true}>
@@ -76,7 +78,7 @@ DreamBooth finetuning is very sensitive to hyperparameters and easy to overfit.
 <pt>
 Set the `INSTANCE_DIR` environment variable to the path of the directory containing the dog images.
 
-Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`] argument. The `instance_prompt` argument is a text prompt that contains a unique identifier, such as `sks`, and the class the image belongs to, which in this example is `a photo of a sks dog`.
 
 ```bash
 export MODEL_NAME="CompVis/stable-diffusion-v1-4"
@@ -111,7 +113,7 @@ Before running the script, make sure you have the requirements installed:
 pip install -U -r requirements.txt
 ```
 
-Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`] argument. The `instance_prompt` argument is a text prompt that contains a unique identifier, such as `sks`, and the class the image belongs to, which in this example is `a photo of a sks dog`.
 
 Now you can launch the training script with the following command:
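
After DreamBooth finetuning, the saved pipeline is used with the unique identifier from `instance_prompt`. A minimal sketch, where `path-to-save-model` is a placeholder for the training script's output directory:

```python
# Minimal sketch: run inference with a DreamBooth-finetuned pipeline using the
# unique identifier ("sks") from --instance_prompt. "path-to-save-model" is a
# placeholder for the training output directory.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("path-to-save-model", torch_dtype=torch.float16).to("cuda")
image = pipe("A photo of sks dog in a bucket", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket.png")
```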

docs/source/en/training/instructpix2pix.mdx

Lines changed: 3 additions & 3 deletions
@@ -77,16 +77,16 @@ write_basic_config()
 ### Toy example
 
 As mentioned before, we'll use a [small toy dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples) for training. The dataset
-is a smaller version of the [original dataset](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) used in the InstructPix2Pix paper.
+is a smaller version of the [original dataset](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) used in the InstructPix2Pix paper. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
 
-Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument. You'll also need to specify the dataset name in `DATASET_ID`:
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument. You'll also need to specify the dataset name in `DATASET_ID`:
 
 ```bash
 export MODEL_NAME="runwayml/stable-diffusion-v1-5"
 export DATASET_ID="fusing/instructpix2pix-1000-samples"
 ```
 
-Now, we can launch training:
+Now, we can launch training. The script saves all the components (`feature_extractor`, `scheduler`, `text_encoder`, `unet`, etc.) in a subfolder in your repository.
 
 ```bash
 accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
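
Because the script saves a full pipeline, the finetuned model can later be loaded with `StableDiffusionInstructPix2PixPipeline`. A minimal sketch, where the model path and input image are placeholders:

```python
# Minimal sketch: load a finetuned InstructPix2Pix pipeline and edit an image.
# "path/to/instruct-pix2pix-model" and "original.png" are placeholders.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "path/to/instruct-pix2pix-model", torch_dtype=torch.float16
).to("cuda")

original = load_image("original.png")
edited = pipe("make the mountains snowy", image=original, num_inference_steps=20, image_guidance_scale=1.5).images[0]
```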

docs/source/en/training/lora.mdx

Lines changed: 11 additions & 13 deletions
@@ -17,8 +17,7 @@ specific language governing permissions and limitations under the License.
 <Tip warning={true}>
 
 Currently, LoRA is only supported for the attention layers of the [`UNet2DConditionalModel`]. We also
-support LoRA fine-tuning of the text encoder for DreamBooth in a limited capacity. For more details on how we support
-LoRA fine-tuning of the text encoder, refer to the discussion on [this PR](https://github.com/huggingface/diffusers/pull/2918).
+support fine-tuning the text encoder for DreamBooth with LoRA in a limited capacity. Fine-tuning the text encoder for DreamBooth generally yields better results, but it can increase compute usage.
 
 </Tip>
 
@@ -52,7 +51,7 @@ Finetuning a model like Stable Diffusion, which has billions of parameters, can
 
 Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon.
 
-Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument. You'll also need to set the `DATASET_NAME` environment variable to the name of the dataset you want to train on.
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument. You'll also need to set the `DATASET_NAME` environment variable to the name of the dataset you want to train on. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
 
 The `OUTPUT_DIR` and `HUB_MODEL_ID` variables are optional and specify where to save the model to on the Hub:
 
@@ -69,7 +68,7 @@ There are some flags to be aware of before you start training:
 * `--report_to=wandb` reports and logs the training results to your Weights & Biases dashboard (as an example, take a look at this [report](https://wandb.ai/pcuenq/text2image-fine-tune/runs/b4k1w0tn?workspace=user-pcuenq)).
 * `--learning_rate=1e-04`, you can afford to use a higher learning rate than you normally would with LoRA.
 
-Now you're ready to launch the training (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)):
+Now you're ready to launch the training (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)). Training takes about 5 hours on a 2080 Ti GPU with 11GB of RAM, and it'll create and save model checkpoints and the `pytorch_lora_weights` file in your repository.
 
 ```bash
 accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
@@ -159,9 +158,9 @@ pipe = StableDiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.
 
 ### Training[[dreambooth-training]]
 
-Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) with DreamBooth and LoRA with some 🐶 [dog images](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ). Download and save these images to a directory.
+Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) with DreamBooth and LoRA with some 🐶 [dog images](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ). Download and save these images to a directory. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
 
-To start, specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument. You'll also need to set `INSTANCE_DIR` to the path of the directory containing the images.
+To start, specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument. You'll also need to set `INSTANCE_DIR` to the path of the directory containing the images.
 
 The `OUTPUT_DIR` variable is optional and specifies where to save the model to on the Hub:
 
@@ -177,7 +176,11 @@ There are some flags to be aware of before you start training:
 * `--report_to=wandb` reports and logs the training results to your Weights & Biases dashboard (as an example, take a look at this [report](https://wandb.ai/pcuenq/text2image-fine-tune/runs/b4k1w0tn?workspace=user-pcuenq)).
 * `--learning_rate=1e-04`, you can afford to use a higher learning rate than you normally would with LoRA.
 
-Now you're ready to launch the training (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py)):
+Now you're ready to launch the training (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py)). The script creates and saves model checkpoints and the `pytorch_lora_weights.bin` file in your repository.
+
+It's also possible to additionally fine-tune the text encoder with LoRA. In most cases, this leads
+to better results with a slight increase in compute. To allow fine-tuning the text encoder with LoRA,
+specify the `--train_text_encoder` flag while launching the `train_dreambooth_lora.py` script.
 
 ```bash
 accelerate launch train_dreambooth_lora.py \
@@ -198,12 +201,7 @@ accelerate launch train_dreambooth_lora.py \
   --validation_epochs=50 \
   --seed="0" \
   --push_to_hub
-```
-
-It's also possible to additionally fine-tune the text encoder with LoRA. This, in most cases, leads
-to better results with a slight increase in the compute. To allow fine-tuning the text encoder with LoRA,
-specify the `--train_text_encoder` while launching the `train_dreambooth_lora.py` script.
-
+```
 
 ### Inference[[dreambooth-inference]]
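
For orientation, the LoRA attention weights saved by either script (`pytorch_lora_weights.bin`) are loaded on top of the base model at inference time, roughly as follows. A minimal sketch: `path/to/lora-output` is a placeholder for the output directory or Hub repository id.

```python
# Minimal sketch: load trained LoRA attention weights on top of the base model.
# "path/to/lora-output" is a placeholder for --output_dir or a Hub repo id.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.unet.load_attn_procs("path/to/lora-output")

image = pipe("A pokemon with blue eyes.", num_inference_steps=25, guidance_scale=7.5).images[0]
```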

docs/source/en/training/text2image.mdx

Lines changed: 2 additions & 2 deletions
@@ -74,7 +74,7 @@ To load a checkpoint to resume training, pass the argument `--resume_from_checkp
 <pt>
 Launch the [PyTorch training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) for a fine-tuning run on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset like this.
 
-Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument.
 
 <literalinclude>
 {"path": "../../../../examples/text_to_image/README.md",
@@ -143,7 +143,7 @@ Before running the script, make sure you have the requirements installed:
 pip install -U -r requirements_flax.txt
 ```
 
-Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`~diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path`] argument.
+Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument.
 
 Now you can launch the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_flax.py) like this:
