EMA model is not updated properly with multi-gpu configuration #1895

@SuwoongHeo

Describe the bug

The training script for unconditional image generation, examples/unconditional_image_generation/train_unconditional.py, does not work properly when it is launched via accelerate for multi-GPU training.

Specifically, the EMAModel is not updated in the multi-GPU setting, although it works without accelerate. The figure below shows three cases: multi-GPU without EMA (left), multi-GPU with EMA (middle), and single-GPU with EMA (right).
Is there any workaround for this?

[image: comparison of the three cases described above]
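
For anyone trying to confirm the symptom numerically rather than from images, here is a minimal sketch of the standard exponential-moving-average rule in plain Python, with a check for whether the averaged weights have moved away from their starting values. This is not the diffusers EMAModel implementation, and the names ema_update and ema_has_moved are hypothetical helpers made up for illustration. The same idea (snapshot the EMA weights, train a few steps, compare) should carry over to the real training script, but that is an assumption and has not been verified here.

```python
def ema_update(shadow, params, decay=0.999):
    """Apply one EMA step: shadow <- decay * shadow + (1 - decay) * param."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


def ema_has_moved(initial, shadow, tol=1e-12):
    """Return True if any averaged weight differs from its initial value."""
    return any(abs(a - b) > tol for a, b in zip(initial, shadow))


# Toy stand-ins for model weights (hypothetical values, not real training data).
params = [1.0, -2.0]
initial_shadow = [0.0, 0.0]

shadow = list(initial_shadow)
for _ in range(10):
    shadow = ema_update(shadow, params, decay=0.9)

# If the EMA weights are really being updated, this check reports True.
print("EMA weights changed:", ema_has_moved(initial_shadow, shadow))
```

If the equivalent check on the real EMA weights reports no change after several optimizer steps in the multi-GPU run, that would be consistent with the behavior described above.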

Reproduction

I followed this tutorial: https://huggingface.co/docs/diffusers/training/unconditional_training

with the following command:

accelerate launch train_unconditional.py \
  --dataset_name="huggan/pokemon" \
  --resolution=64 \
  --output_dir="ddpm-ema-pokemon-64" \
  --train_batch_size=16 \
  --num_epochs=100 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --lr_warmup_steps=500 \
  --mixed_precision=no \
  --push_to_hub

Logs

No response

System Info

Running diffusers-cli env fails in my environment with the following error:

Traceback (most recent call last):
  File "/ssd2/swheo/home/anaconda3/envs/diffusers/bin/diffusers-cli", line 5, in <module>
    from diffusers.commands.diffusers_cli import main
  File "/ssd2/swheo/dev/code/diffusers/src/diffusers/__init__.py", line 53, in <module>
    from .pipelines import (
  File "/ssd2/swheo/dev/code/diffusers/src/diffusers/pipelines/__init__.py", line 57, in <module>
    from .unclip import UnCLIPImageVariationPipeline, UnCLIPPipeline
ImportError: cannot import name 'UnCLIPImageVariationPipeline' from 'diffusers.pipelines.unclip' (/ssd2/swheo/dev/code/diffusers/src/diffusers/pipelines/unclip/__init__.py)

Labels: bug, stale
