Closed
Labels
bug, stale
Description
Describe the bug
In train_dreambooth_lora.py, resuming from a checkpoint still calls accelerator.load_state(os.path.join(args.output_dir, path)), but the script does not save the accelerator state. Resuming therefore fails with the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ec2-user/ssl/Jupyter/exp/models/lora/checkpoint-250/pytorch_model.bin'
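One possible way to surface this problem earlier is to check the checkpoint directory before calling accelerator.load_state. The following is a minimal sketch, not the script's actual logic. It assumes checkpoints are stored in directories named checkpoint-<step> (as in the path above) and that the file load_state looks for is pytorch_model.bin (taken from the error message; other Accelerate versions may use a different name). The helper names checkpoint_has_state and latest_resumable_checkpoint are hypothetical and are not part of the script or of Accelerate.

```python
import os

# File name taken from the error message in this report. Treat it as an
# assumption: other Accelerate versions may write a different file.
EXPECTED_STATE_FILE = "pytorch_model.bin"


def checkpoint_has_state(checkpoint_dir, state_file=EXPECTED_STATE_FILE):
    """Return True if checkpoint_dir contains the expected state file."""
    return os.path.isfile(os.path.join(checkpoint_dir, state_file))


def latest_resumable_checkpoint(output_dir):
    """Return the name of the newest checkpoint-<step> directory that
    contains a state file, or None if there is no such directory."""
    if not os.path.isdir(output_dir):
        return None
    candidates = []
    for name in os.listdir(output_dir):
        prefix, _, step = name.partition("-")
        if prefix == "checkpoint" and step.isdigit():
            candidates.append((int(step), name))
    # Walk from the highest step down and stop at the first usable one.
    for _, name in sorted(candidates, reverse=True):
        if checkpoint_has_state(os.path.join(output_dir, name)):
            return name
    return None
```

With a guard like this, a script could print a clear message or start training from scratch when the function returns None, instead of failing inside load_state.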
Reproduction
Resuming training from a checkpoint with train_dreambooth_lora.py reproduces the error:
train_dreambooth_lora.py --resume_from_checkpoint path
Logs
error: FileNotFoundError: [Errno 2] No such file or directory: '/home/ec2-user/ssl/Jupyter/exp/models/lora/checkpoint-250/pytorch_model.bin'
System Info
- diffusers version: 0.16.1
- Platform: Linux-4.14.301-224.520.amzn2.x86_64-x86_64-with-glibc2.26
- Python version: 3.10.9
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Huggingface_hub version: 0.14.1
- Transformers version: 4.27.3
- Accelerate version: 0.18.0
- xFormers version: 0.0.16
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: No