Closed
Labels
bug (Something isn't working), stale (Issues that haven't received updates)
Description
Describe the bug
I trained DreamBooth with LoRA on SD-XL for 1000 steps, then tried to continue training by resuming from the 500th step. However, training seems to start without loading the 1000-step checkpoint, i.e. it starts from the beginning. The training script is below.
By the way, I have already fixed another resume bug as advised in #5004.
Could you please help?
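To make the expected behaviour concrete, here is a minimal sketch of how a training script might turn a `--resume_from_checkpoint` value into a starting step. It is not the actual code of `train_dreambooth_lora_sdxl.py`: the function name `resolve_resume_step` is hypothetical, and it assumes checkpoints are directories named `checkpoint-<step>` that are looked up by name inside the output directory.

```python
import os
import re
import tempfile
from typing import Optional, Tuple

# Assumed naming convention: checkpoint-<global_step>
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def resolve_resume_step(output_dir: str, resume_from: str) -> Tuple[Optional[str], int]:
    """Return (checkpoint_dir_name, global_step).

    (None, 0) means no usable checkpoint was found, so training would
    start from the beginning. Hypothetical helper for illustration only.
    """
    if resume_from == "latest":
        # Pick the checkpoint directory with the highest step number.
        candidates = [d for d in os.listdir(output_dir) if _CKPT_RE.match(d)]
        if not candidates:
            return None, 0
        name = max(candidates, key=lambda d: int(_CKPT_RE.match(d).group(1)))
    else:
        # Assumption: only the last path component is used, and it is
        # resolved relative to output_dir.
        name = os.path.basename(os.path.normpath(resume_from))

    match = _CKPT_RE.match(name)
    if match is None or not os.path.isdir(os.path.join(output_dir, name)):
        return None, 0
    return name, int(match.group(1))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        for step in (500, 1000):
            os.mkdir(os.path.join(out, f"checkpoint-{step}"))
        print(resolve_resume_step(out, "latest"))
        print(resolve_resume_step(out, "/some/other/prefix/checkpoint-500"))
```

If the resolved step is 0 even though a checkpoint directory exists, the progress bar and logged step would restart from the beginning, which matches the symptom described above.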
Reproduction
export MODEL_NAME="/data/model/stable-diffusion-xl-base-1.0/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="/data/datasets/image_instance/chair_crop"
export OUTPUT_DIR="/data/lora-xl/diffusers/examples/dreambooth/lora-trained-xl_1e-6_step1500_chair_crop_per25"
export VAE_PATH="/data/model/stable-diffusion-xl-base-1.0/sdxl-vae-fp16-fix"
accelerate launch train_dreambooth_lora_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--pretrained_vae_model_name_or_path=$VAE_PATH \
--output_dir=$OUTPUT_DIR \
--mixed_precision="fp16" \
--instance_prompt="a photo of sks chair" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--learning_rate=1e-5 \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=2700 \
--validation_prompt="A photo of sks chair on grass" \
--validation_epochs=25 \
--seed="0" \
--checkpointing_steps=500 \
--resume_from_checkpoint="/data/xuyu/lora-xl/diffusers/examples/dreambooth/lora-trained-xl_1e-6_step1500_chair_crop_per25/checkpoint-1000"
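One detail worth checking in the command above: `OUTPUT_DIR` starts with `/data/lora-xl/...`, while `--resume_from_checkpoint` starts with `/data/xuyu/lora-xl/...`. Whether this matters depends on how the script resolves the checkpoint path, which is not confirmed here, and one path may simply be a symlink to the other. The sketch below is a purely lexical comparison of the two strings; the helper name `checkpoint_is_in_output_dir` is hypothetical.

```python
from pathlib import PurePosixPath


def checkpoint_is_in_output_dir(output_dir: str, checkpoint: str) -> bool:
    """True if `checkpoint` is a direct child of `output_dir`.

    Compares path strings only; it does not touch the filesystem,
    so symlinks are not resolved.
    """
    return PurePosixPath(checkpoint).parent == PurePosixPath(output_dir)


# Values copied from the reproduction command.
OUTPUT_DIR = (
    "/data/lora-xl/diffusers/examples/dreambooth/"
    "lora-trained-xl_1e-6_step1500_chair_crop_per25"
)
RESUME_FROM = (
    "/data/xuyu/lora-xl/diffusers/examples/dreambooth/"
    "lora-trained-xl_1e-6_step1500_chair_crop_per25/checkpoint-1000"
)

if __name__ == "__main__":
    print(checkpoint_is_in_output_dir(OUTPUT_DIR, RESUME_FROM))  # prints False
```

A `False` result only shows that the two strings differ; confirming on disk that `checkpoint-1000` actually exists under `$OUTPUT_DIR` would be the next step.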
Logs
No response
System Info
- diffusers version: 0.22.0.dev0
- Platform: Linux-3.10.0-1160.95.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.10.13
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Huggingface_hub version: 0.17.3
- Transformers version: 4.34.0
- Accelerate version: 0.23.0
- xFormers version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: parallel
Who can help?