
Commit f9a1031

support ColossalAI
1 parent 4125756 commit f9a1031

File tree

3 files changed: +736 −0 lines changed

examples/dreambooth/README.md

Lines changed: 26 additions & 0 deletions
@@ -171,6 +171,32 @@ accelerate launch --mixed_precision="fp16" train_dreambooth.py \
--max_train_steps=800
```

### Training on a 6 GB GPU:

[Gemini](https://www.colossalai.org/docs/advanced_tutorials/meet_gemini), the heterogeneous memory manager of [Colossal-AI](https://github.com/hpcaitech/ColossalAI), breaks through the GPU memory wall by keeping model data in both CPU and GPU memory (CPU DRAM or NVMe SSD) and moving it to the computing device only when needed. Moreover, the reachable model scale can be increased further by combining heterogeneous training with other parallel approaches, such as data parallelism, tensor parallelism, and pipeline parallelism.

The argument `placement` can be `cpu`, `auto`, or `cuda`: with `cpu`, the required GPU RAM is minimized to 6 GB at the cost of slower training; with `cuda`, GPU memory is still reduced by about half while training remains fast; and with `auto`, a more balanced trade-off between speed and memory is obtained.
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

torchrun --nproc_per_node 2 train_dreambooth_colossalai.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of a dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --placement="cuda"
```
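As noted above, `cpu` placement minimizes GPU RAM to roughly 6 GB at the cost of speed. A minimal single-GPU variant of the command above (an illustrative sketch, assuming one process and hence `--nproc_per_node 1`) would look like:

```bash
# Same flags as the two-GPU example, but run as a single process with
# `cpu` placement to fit the ~6 GB GPU budget described above.
torchrun --nproc_per_node 1 train_dreambooth_colossalai.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of a dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --placement="cpu"
```
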
### Fine-tune text encoder with the UNet.

The script also allows you to fine-tune the `text_encoder` along with the `unet`. It's been observed experimentally that fine-tuning the `text_encoder` gives much better results, especially on faces.
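With the standard `train_dreambooth.py` launch shown in the diff context above, this is enabled by adding a `--train_text_encoder` flag (a sketch; the flag name is assumed from the main DreamBooth script rather than shown in this excerpt):

```bash
# Hedged sketch: the earlier fp16 launch with the assumed
# --train_text_encoder flag added to fine-tune the text encoder jointly.
accelerate launch --mixed_precision="fp16" train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of a dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=800
```
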
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
diffusers
torch
torchvision
ftfy
tensorboard
modelcards
transformers
colossalai==0.1.11rc5+torch1.12cu11.3 -f https://release.colossalai.org
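These pinned dependencies install with pip; the `-f` on the last line points pip at ColossalAI's wheel index for the torch 1.12 + CUDA 11.3 build. A minimal install step (assuming the file is saved as `requirements.txt`, since the filename is not shown in this excerpt):

```bash
# Assumed filename; installs the example dependencies, with the prebuilt
# ColossalAI wheel resolved via its extra find-links index.
pip install -r requirements.txt
```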
