Describe the bug
Hello!
I was trying to fine-tune Anything 3.0, and while the script was generating the class images I got a precision mismatch error (see the log below). I have already had float16 issues on my GPU, so I set --mixed_precision to 'no'. That worked with pre-generated class images, but not when the pipeline generates them inside the train_dreambooth.py script.
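For context, pre-generating the class images myself in full precision worked. It looked roughly like this sketch (the paths and prompt match my command below; the file naming is made up):

import torch
from pathlib import Path
from diffusers import StableDiffusionPipeline

MODEL_DIR = "/home/USER/kml/models1"                      # local Anything-3.0 (diffusers branch)
CLASS_DIR = "/home/USER/kml/datasets/objects/alhaitham1"  # class image dir passed to train_dreambooth.py
Path(CLASS_DIR).mkdir(parents=True, exist_ok=True)

# Load everything in full precision, since my GPU misbehaves in half precision.
pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float32,
    safety_checker=None,
).to("cuda")

prompt = (
    "1boy, medium hair, grey hair, green eyes, bishounen, colorful, "
    "autumn, green leaves, detailed fantasy clothes, lighting, blue sky"
)
for i in range(200):  # matches --num_class_images=200
    pipe(prompt).images[0].save(f"{CLASS_DIR}/class_{i:04d}.png")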
I have already found the place in the code that causes the issue and I will open a pull request today.
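As far as I can tell, when train_dreambooth.py builds the pipeline for class-image generation it chooses the dtype from the device type alone (effectively fp16 on any CUDA device), so --mixed_precision 'no' is ignored at that step. Below is a minimal sketch of the direction my fix will take (the helper name is mine; the actual PR may be structured differently):

import torch

def prior_generation_dtype(device_type: str, mixed_precision: str) -> torch.dtype:
    # Choose the class-image pipeline dtype from --mixed_precision
    # instead of hard-coding fp16 on every CUDA device.
    if device_type == "cuda" and mixed_precision == "fp16":
        return torch.float16
    if device_type == "cuda" and mixed_precision == "bf16":
        return torch.bfloat16
    # --mixed_precision 'no' (or CPU): stay in fp32 so GPUs that
    # break in half precision can still generate class images.
    return torch.float32

assert prior_generation_dtype("cuda", "no") is torch.float32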
Reproduction
- Have a GPU that has trouble with half precision
- Download the diffusers branch of Anything-3.0 so that the diffusers DreamBooth pipeline can load it:
  git clone --depth=1 -b diffusers https://huggingface.co/Linaqruf/anything-v3.0
- Start the DreamBooth script with --with_prior_preservation
- Get "RuntimeError: expected scalar type Half but found Float" when generating class images
The exact launch command:
export MODEL_NAME="/home/{USER}/kml/models1"
export INSTANCE_DIR="/home/{USER}/kml/datasets/objects/alhaitham"
export CLASS_DIR="/home/{USER}/kml/datasets/objects/alhaitham1"
export OUTPUT_DIR="/home/{USER}/kml/models2"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="character portrait of alhaitham" \
--class_prompt="1boy, medium hair, grey hair, green eyes, bishounen, colorful, autumn, green leaves, detailed fantasy clothes, lighting, blue sky" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800 \
--mixed_precision 'no' \
--train_text_encoder
Logs
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Generating class images: 0%| | 0/50 [00:00<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/{USER}/kml/diffusers/examples/dreambooth/train_dreambooth.py:779 in <module> │
│ │
│ 776 │
│ 777 if __name__ == "__main__": │
│ 778 │ args = parse_args() │
│ ❱ 779 │ main(args) │
│ 780 │
│ │
│ /home/{USER}/kml/diffusers/examples/dreambooth/train_dreambooth.py:456 in main │
│ │
│ 453 │ │ │ for example in tqdm( │
│ 454 │ │ │ │ sample_dataloader, desc="Generating class images", disable=not accelerat │
│ 455 │ │ │ ): │
│ ❱ 456 │ │ │ │ images = pipeline(example["prompt"]).images │
│ 457 │ │ │ │ │
│ 458 │ │ │ │ for i, image in enumerate(images): │
│ 459 │ │ │ │ │ hash_image = hashlib.sha1(image.tobytes()).hexdigest() │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py:27 in │
│ decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /home/{USER}/kml/src/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusi │
│ on.py:496 in __call__ │
│ │
│ 493 │ │ do_classifier_free_guidance = guidance_scale > 1.0 │
│ 494 │ │ │
│ 495 │ │ # 3. Encode input prompt │
│ ❱ 496 │ │ text_embeddings = self._encode_prompt( │
│ 497 │ │ │ prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative │
│ 498 │ │ ) │
│ 499 │
│ │
│ /home/{USER}/kml/src/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusi │
│ on.py:265 in _encode_prompt │
│ │
│ 262 │ │ else: │
│ 263 │ │ │ attention_mask = None │
│ 264 │ │ │
│ ❱ 265 │ │ text_embeddings = self.text_encoder( │
│ 266 │ │ │ text_input_ids.to(device), │
│ 267 │ │ │ attention_mask=attention_mask, │
│ 268 │ │ ) │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:722 │
│ in forward │
│ │
│ 719 │ │ >>> last_hidden_state = outputs.last_hidden_state │
│ 720 │ │ >>> pooled_output = outputs.pooler_output # pooled (EOS token) states │
│ 721 │ │ """ │
│ ❱ 722 │ │ return self.text_model( │
│ 723 │ │ │ input_ids=input_ids, │
│ 724 │ │ │ attention_mask=attention_mask, │
│ 725 │ │ │ position_ids=position_ids, │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:643 │
│ in forward │
│ │
│ 640 │ │ │ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] │
│ 641 │ │ │ attention_mask = _expand_mask(attention_mask, hidden_states.dtype) │
│ 642 │ │ │
│ ❱ 643 │ │ encoder_outputs = self.encoder( │
│ 644 │ │ │ inputs_embeds=hidden_states, │
│ 645 │ │ │ attention_mask=attention_mask, │
│ 646 │ │ │ causal_attention_mask=causal_attention_mask, │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:574 │
│ in forward │
│ │
│ 571 │ │ │ │ │ causal_attention_mask, │
│ 572 │ │ │ │ ) │
│ 573 │ │ │ else: │
│ ❱ 574 │ │ │ │ layer_outputs = encoder_layer( │
│ 575 │ │ │ │ │ hidden_states, │
│ 576 │ │ │ │ │ attention_mask, │
│ 577 │ │ │ │ │ causal_attention_mask, │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:317 │
│ in forward │
│ │
│ 314 │ │ residual = hidden_states │
│ 315 │ │ │
│ 316 │ │ hidden_states = self.layer_norm1(hidden_states) │
│ ❱ 317 │ │ hidden_states, attn_weights = self.self_attn( │
│ 318 │ │ │ hidden_states=hidden_states, │
│ 319 │ │ │ attention_mask=attention_mask, │
│ 320 │ │ │ causal_attention_mask=causal_attention_mask, │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/torch/nn/modules/module.py:1130 in _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /home/{USER}/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:257 │
│ in forward │
│ │
│ 254 │ │ │
│ 255 │ │ attn_probs = nn.functional.dropout(attn_weights, p=self.dropout, training=self.t │
│ 256 │ │ │
│ ❱ 257 │ │ attn_output = torch.bmm(attn_probs, value_states) │
│ 258 │ │ │
│ 259 │ │ if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim): │
│ 260 │ │ │ raise ValueError( │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: expected scalar type Half but found Float
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/{USER}/.local/bin/accelerate:33 in <module> │
│ │
│ 30 │
│ 31 if __name__ == '__main__': │
│ 32 │ sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0]) │
│ ❱ 33 │ sys.exit(load_entry_point('accelerate', 'console_scripts', 'accelerate')()) │
│ 34 │
│ │
│ /home/{USER}/kml/src/accelerate/src/accelerate/commands/accelerate_cli.py:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ /home/{USER}/kml/src/accelerate/src/accelerate/commands/launch.py:1071 in launch_command │
│ │
│ 1068 │ elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA │
│ 1069 │ │ sagemaker_launcher(defaults, args) │
│ 1070 │ else: │
│ ❱ 1071 │ │ simple_launcher(args) │
│ 1072 │
│ 1073 │
│ 1074 def main(): │
│ │
│ /home/{USER}/kml/src/accelerate/src/accelerate/commands/launch.py:547 in simple_launcher │
│ │
│ 544 │ process.wait() │
│ 545 │ if process.returncode != 0: │
│ 546 │ │ if not args.quiet: │
│ ❱ 547 │ │ │ raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) │
│ 548 │ │ else: │
│ 549 │ │ │ sys.exit(1) │
│ 550 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py',
'--pretrained_model_name_or_path=/home/{USER}/kml/models1',
'--instance_data_dir=/home/{USER}/kml/datasets/objects/alhaitham',
'--class_data_dir=/home/{USER}/kml/datasets/objects/alhaitham1', '--output_dir=/home/{USER}/kml/models2',
'--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=character portrait of alhaitham',
'--class_prompt=1boy, medium hair, grey hair, green eyes, bishounen, colorful, autumn, green leaves, detailed fantasy
clothes, lighting, blue sky', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1',
'--learning_rate=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200',
'--max_train_steps=800', '--mixed_precision', 'no', '--train_text_encoder']' returned non-zero exit status 1.
System Info
Ubuntu 20.04.5 LTS, diffusers installed yesterday from commit b6d4702, Python 3.8.10