
Commit f9a1031

support ColossalAI
1 parent 4125756 commit f9a1031

File tree

3 files changed: +736 −0 lines changed

examples/dreambooth/README.md

Lines changed: 26 additions & 0 deletions
@@ -171,6 +171,32 @@ accelerate launch --mixed_precision="fp16" train_dreambooth.py \
--max_train_steps=800
```

### Training on a 6 GB GPU:

[Gemini](https://www.colossalai.org/docs/advanced_tutorials/meet_gemini), the heterogeneous memory manager of [Colossal-AI](https://github.com/hpcaitech/ColossalAI), breaks through the GPU memory wall by keeping model data in both CPU and GPU memory (CPU DRAM or NVMe SSD) and moving it to the computing device only when needed. Moreover, the reachable model scale can be increased further by combining heterogeneous training with other parallel approaches, such as data parallelism, tensor parallelism, and pipeline parallelism.

The argument `placement` can be `cpu`, `auto`, or `cuda`: with `cpu`, the required GPU RAM is minimized to 6 GB at the cost of slower training; with `cuda`, GPU memory is still reduced by about half while training remains fast; and with `auto`, a more balanced trade-off between speed and memory is obtained.
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

torchrun --nproc_per_node 2 train_dreambooth_colossalai.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of a dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --placement="cuda"
```
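As noted above, `cpu` placement minimizes GPU RAM to roughly 6 GB at the cost of speed. A minimal single-GPU variant of the command above (an illustrative sketch, assuming one process and hence `--nproc_per_node 1`) would look like:

```bash
# Same flags as the two-GPU example, but run as a single process with
# `cpu` placement to fit the ~6 GB GPU budget described above.
torchrun --nproc_per_node 1 train_dreambooth_colossalai.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of a dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --placement="cpu"
```
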
### Fine-tune text encoder with the UNet.

The script also allows you to fine-tune the `text_encoder` along with the `unet`. It's been observed experimentally that fine-tuning the `text_encoder` gives much better results, especially on faces.
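With the standard `train_dreambooth.py` launch shown in the diff context above, this is enabled by adding a `--train_text_encoder` flag (a sketch; the flag name is assumed from the main DreamBooth script rather than shown in this excerpt):

```bash
# Hedged sketch: the earlier fp16 launch with the assumed
# --train_text_encoder flag added to fine-tune the text encoder jointly.
accelerate launch --mixed_precision="fp16" train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of a dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=800
```
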
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
diffusers
torch
torchvision
ftfy
tensorboard
modelcards
transformers
colossalai==0.1.11rc5+torch1.12cu11.3 -f https://release.colossalai.org
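These pinned dependencies install with pip; the `-f` on the last line points pip at ColossalAI's wheel index for the torch 1.12 + CUDA 11.3 build. A minimal install step (assuming the file is saved as `requirements.txt`, since the filename is not shown in this excerpt):

```bash
# Assumed filename; installs the example dependencies, with the prebuilt
# ColossalAI wheel resolved via its extra find-links index.
pip install -r requirements.txt
```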
