CUDA out of memory when I want to train DreamBooth  #696

@loboere

Description

Describe the bug

I'm using a T4 on the Colab free tier. When I start training, I get a CUDA out-of-memory error; it happens when I activate prior_preservation.
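
For context on why prior_preservation is the trigger: with prior preservation enabled, the DreamBooth training loop runs the instance example and a class ("prior") example through the UNet together, so each step's effective batch roughly doubles. A minimal sketch of that pattern (names, shapes, and the function signature are illustrative, not the notebook's exact code):

```python
import torch
import torch.nn.functional as F

def training_step(unet, noise_scheduler, instance_latents, class_latents,
                  instance_emb, class_emb, prior_loss_weight=1.0):
    # Instance and class latents are concatenated, so the UNet
    # forward/backward sees twice the batch size.
    latents = torch.cat([instance_latents, class_latents], dim=0)
    encoder_hidden_states = torch.cat([instance_emb, class_emb], dim=0)

    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

    # Split the predictions back apart and combine the two losses.
    pred_instance, pred_class = noise_pred.chunk(2, dim=0)
    noise_instance, noise_class = noise.chunk(2, dim=0)
    loss = F.mse_loss(pred_instance, noise_instance)
    prior_loss = F.mse_loss(pred_class, noise_class)
    return loss + prior_loss_weight * prior_loss
```

Disabling prior preservation halves that per-step footprint, which matches the observation that the error only appears when it is turned on.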

Run training

Launching training on one GPU.
Steps: 0%
1/450 [00:10<1:20:12, 10.72s/it, loss=0.0338]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-c6e3ce5f5a40> in <module>
      1 #@title Run training
      2 import accelerate
----> 3 accelerate.notebook_launcher(training_function, args=(text_encoder, vae, unet))
      4 with torch.no_grad():
      5     torch.cuda.empty_cache()

/usr/local/lib/python3.7/dist-packages/accelerate/launchers.py in notebook_launcher(function, args, num_processes, use_fp16, mixed_precision, use_port)
     81             else:
     82                 print("Launching training on one CPU.")
---> 83             function(*args)
     84 
     85     else:

<ipython-input-1-d9553ec566fc> in training_function(text_encoder, vae, unet)
    364                     loss = F.mse_loss(noise_pred, noise, reduction="none").mean([1, 2, 3]).mean()
    365 
--> 366                 accelerator.backward(loss)
    367                 accelerator.clip_grad_norm_(unet.parameters(), args.max_grad_norm)
    368                 optimizer.step()

/usr/local/lib/python3.7/dist-packages/accelerate/accelerator.py in backward(self, loss, **kwargs)
    882             self.scaler.scale(loss).backward(**kwargs)
    883         else:
--> 884             loss.backward(**kwargs)
    885 
    886     def unscale_gradients(self, optimizer=None):

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    394                 create_graph=create_graph,
    395                 inputs=inputs)
--> 396         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    397 
    398     def register_hook(self, hook):

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    173     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 175         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
    176 
    177 def grad(

/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py in apply(self, *args)
    251                                "of them.")
    252         user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 253         return user_fn(self, *args)
    254 
    255     def apply_jvp(self, *args):

/usr/local/lib/python3.7/dist-packages/torch/utils/checkpoint.py in backward(ctx, *args)
    144                 "none of output has requires_grad=True,"
    145                 " this checkpoint() is not necessary")
--> 146         torch.autograd.backward(outputs_with_grad, args_with_grad)
    147         grads = tuple(inp.grad if isinstance(inp, torch.Tensor) else None
    148                       for inp in detached_inputs)

/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    173     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 175         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
    176 
    177 def grad(

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 14.76 GiB total capacity; 12.24 GiB already allocated; 877.75 MiB free; 12.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
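
The allocator hint at the end of the message can be applied directly, though on a 16 GiB T4 the larger savings usually come from gradient checkpointing and an 8-bit optimizer. A minimal sketch, assuming the Stable Diffusion v1-4 UNet that the notebook trains (the model id and learning rate are illustrative; `enable_gradient_checkpointing()` and bitsandbytes' `AdamW8bit` are existing APIs):

```python
import os

# Allocator tuning must be set before the first CUDA allocation,
# i.e. at the very top of the notebook.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
import bitsandbytes as bnb
from diffusers import UNet2DConditionModel

# Model id is illustrative; the DreamBooth notebook loads its own UNet.
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

# Recompute activations during backward instead of storing them:
# a large memory saving at the cost of extra compute.
unet.enable_gradient_checkpointing()

# 8-bit Adam keeps optimizer state in 8 bits, roughly quartering
# optimizer memory versus fp32 Adam. Learning rate is illustrative.
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=5e-6)

# Frees cached, unused blocks between runs; it does not lower the
# peak usage inside a single training step.
torch.cuda.empty_cache()
```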

Reproduction

No response

Logs

No response

System Info

NVIDIA T4, Colab free tier


Labels

bug (Something isn't working)
