
Enable Training on 6GB Cards... with DeepSpeed? #63

@brucethemoose

Description


I am trying to squeeze training onto my 6GB laptop RTX 2060, and can't quite manage it with the "low memory" config:

accelerate launch --num_cpu_threads_per_process 8 train_db.py \
--pretrained_model_name_or_path="/home/alpha/Storage/AIModels/Stable-diffusion/panatomy05full_0.7-AIModels_Anything-V3.0-pruned-fp16_0.3-Weighted_sum-merged.ckpt" \
--train_data_dir="/home/alpha/Storage/TrainingData/test/training_data" \
--output_dir="/home/alpha/Storage/TrainingOutput/test/" \
--prior_loss_weight=1.0 \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=1e-6 \
--max_train_steps=1600 \
--use_8bit_adam \
--xformers \
--mixed_precision="fp16" \
--cache_latents \
--gradient_checkpointing \
--save_precision="fp16" \
--full_fp16 \
--save_model_as="safetensors"

So, I figured I would investigate DeepSpeed CPU offloading via the accelerate config, but I keep running into errors with both the git version and the 0.7.7 release of DeepSpeed from PyPI.
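For reference, here is roughly what my accelerate config (~/.cache/huggingface/accelerate/default_config.yaml) looks like after answering the DeepSpeed questions in accelerate config. I am reconstructing it from memory, so treat the exact values as approximate; the intent was ZeRO stage 2 with optimizer offload to CPU:

# reconstructed from memory; my exact answers to `accelerate config` may have differed
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu   # the CPU offload I was after
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
use_cpu: false

Here is an error from the PyPI release: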

Traceback (most recent call last):
  File "/home/alpha/clone/sd-scripts/train_db.py", line 332, in <module>
    train(args)
  File "/home/alpha/clone/sd-scripts/train_db.py", line 154, in train
    unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 619, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 805, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 330, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1210, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1455, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 532, in __init__
    self._param_slice_mappings = self._create_param_mapping()
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 544, in _create_param_mapping
    lp_name = self.param_names[lp]
KeyError: <exception str() failed>
[2023-01-12 13:13:52,241] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 5398
[2023-01-12 13:13:52,244] [ERROR] [launch.py:324:sigkill_handler] ['/usr/bin/python', '-u', 'train_db.py', '--pretrained_model_name_or_path=/home/alpha/Storage/AIModels/Stable-diffusion/panatomy05full_0.7-AIModels_Anything-V3.0-pruned-fp16_0.3-Weighted_sum-merged.ckpt', '--train_data_dir=/home/alpha/Storage/TrainingData/test/training_data', '--output_dir=/home/alpha/Storage/TrainingOutput/test/', '--prior_loss_weight=1.0', '--resolution=512', '--train_batch_size=1', '--learning_rate=1e-6', '--max_train_steps=1600', '--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--cache_latents', '--gradient_checkpointing', '--save_precision=fp16', '--full_fp16', '--save_model_as=safetensors'] exits with return code = 1
Traceback (most recent call last):
  File "/home/alpha/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 827, in launch_command
    deepspeed_launcher(args)
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 540, in deepspeed_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['deepspeed', '--no_local_rank', '--num_gpus', '1', 'train_db.py', '--pretrained_model_name_or_path=/home/alpha/Storage/AIModels/Stable-diffusion/panatomy05full_0.7-AIModels_Anything-V3.0-pruned-fp16_0.3-Weighted_sum-merged.ckpt', '--train_data_dir=/home/alpha/Storage/TrainingData/test/training_data', '--output_dir=/home/alpha/Storage/TrainingOutput/test/', '--prior_loss_weight=1.0', '--resolution=512', '--train_batch_size=1', '--learning_rate=1e-6', '--max_train_steps=1600', '--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--cache_latents', '--gradient_checkpointing', '--save_precision=fp16', '--full_fp16', '--save_model_as=safetensors']' returned non-zero exit status 1.
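For what it's worth, the prepare call the traceback points at hands two models to accelerate at once. Trimmed down, it looks like this:

# train_db.py, around line 154 (paraphrased from the traceback)
unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    unet, text_encoder, optimizer, train_dataloader, lr_scheduler
)

My unverified guess is that accelerate's DeepSpeed path only wraps a single model in the DeepSpeed engine, so the optimizer ends up holding parameters the ZeRO optimizer never recorded names for, which would explain the KeyError in _create_param_mapping. I could easily be wrong about that, though.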

Is there anything in particular that needs to be changed for this repo to support DeepSpeed? Or maybe there is some other tweak that would squeeze LoRA training onto 6GB?
