manual_optimization does not work with dp #4961

@carefree0910

🐛 Bug

Training runs with the dp backend and manual optimization, but the gradients appear to be wrong, so the model does not learn anything.

To Reproduce

  • Switch the basic GAN bolt to manual optimization, then change the backend to dp.
  • Set batch_size = 2 and compare runs on 1 GPU vs 2 GPUs.
  • With 1 GPU everything is fine, but with 2 GPUs training fails.

I haven't actually tested these exact steps yet, but I have run many experiments with my own implementations (which are too heavy to paste here and hard to extract), so I expect them to reproduce the problem.

Expected behavior

Training with the dp backend on 2 GPUs should behave the same as training on 1 GPU.
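
To make that expectation concrete, here is a minimal pure-Python sketch of the invariant I have in mind. It is not Lightning or DataParallel code, and the function names (`mse_grad`, `sharded_grad`) are made up for illustration. It assumes a mean-reduced loss and equal-sized shards: with batch_size = 2 split across 2 devices, averaging the per-device gradients should give the same result as the single-device gradient.

```python
# Toy model: y_hat = w * x + b with mean squared error.
# Hypothetical helpers for illustration only; this does not use any
# Lightning or torch API.

def mse_grad(w, b, xs, ys):
    """Gradient of the mean squared error w.r.t. (w, b) over one batch."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb


def sharded_grad(w, b, xs, ys, num_shards):
    """Split the batch into equal shards (one per pretend device),
    compute the gradient on each shard, then average the results."""
    size = len(xs) // num_shards
    grads = [
        mse_grad(w, b, xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
        for i in range(num_shards)
    ]
    gw = sum(g[0] for g in grads) / num_shards
    gb = sum(g[1] for g in grads) / num_shards
    return gw, gb


w, b = 0.5, -1.0
xs, ys = [1.0, 3.0], [2.0, -1.0]  # batch_size = 2, as in the steps above

full = mse_grad(w, b, xs, ys)                     # "1 GPU" gradient
split = sharded_grad(w, b, xs, ys, num_shards=2)  # "2 GPU" gradient
print(full, split)
```

If the two results differ in a real dp run with manual optimization, that would be consistent with the gradients not being reduced onto the parameters that the optimizer updates, but I have not confirmed that this is the cause.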

Environment

(Should be) any.

Additional context

I found this bug in my GAN experiments, but it should affect any model that uses manual optimization.

Labels

bug, help wanted, priority: 1, waiting on author
