
manual_optimization does not work with ddp #4953

@rakhimovv

Description


🐛 Bug

Can't run DDP with manual optimization. Training fails on the second batch with the error:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
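The overlap that trips DDP is easy to see without Lightning: in a GAN, the generator loss is computed through the discriminator, so the discriminator's parameters take part in both backward passes. A minimal pure-PyTorch sketch of that overlap (my illustration with stand-in linear modules, not code from the issue):

```python
import torch

d = torch.nn.Linear(4, 1)  # stand-in discriminator
g = torch.nn.Linear(4, 4)  # stand-in generator

# Generator pass: the loss flows *through* the discriminator.
fake = g(torch.randn(2, 4))
g_loss = d(fake).sum()
g_loss.backward()
print(d.weight.grad is not None)  # True: d's params were in backward pass 1

# Discriminator pass: the same d params receive gradients a second time.
d_loss = d(fake.detach()).sum()
d_loss.backward()
```

Under DDP, each backward pass marks the participating parameters "ready" for gradient reduction, so the second pass would mark the discriminator's parameters a second time, which is exactly what the error above forbids.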

To Reproduce

Change the optimization scheme to manual in the basic GAN bolt (see the sketch below).
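For reference, a minimal sketch of what the manually optimized training_step might look like. This is my reconstruction, not the reporter's code: the module and loss names follow the basic GAN bolt, the API shown is the 1.0.x style where manual optimization is enabled via `Trainer(automatic_optimization=False)` and gradients are taken with `self.manual_backward`, and optimizer access moved between releases, so `self.trainer.optimizers` is used here.

```python
import torch
import pytorch_lightning as pl

class ManualGAN(pl.LightningModule):
    # self.generator, self.discriminator, self.adversarial_loss and the
    # latent_dim hparam are assumed to be defined as in the basic GAN bolt.

    def training_step(self, batch, batch_idx, optimizer_idx=None):
        real, _ = batch
        opt_g, opt_d = self.trainer.optimizers  # newer releases: self.optimizers()
        valid = torch.ones(real.size(0), 1, device=self.device)
        fake_label = torch.zeros(real.size(0), 1, device=self.device)

        # Generator update: the loss flows through the discriminator.
        z = torch.randn(real.size(0), self.hparams.latent_dim, device=self.device)
        fake = self.generator(z)
        g_loss = self.adversarial_loss(self.discriminator(fake), valid)
        self.manual_backward(g_loss, opt_g)
        opt_g.step()
        opt_g.zero_grad()

        # Discriminator update: a second backward pass over the same
        # discriminator parameters within one DDP forward.
        real_loss = self.adversarial_loss(self.discriminator(real), valid)
        fake_loss = self.adversarial_loss(self.discriminator(fake.detach()), fake_label)
        d_loss = 0.5 * (real_loss + fake_loss)
        self.manual_backward(d_loss, opt_d)
        opt_d.step()
        opt_d.zero_grad()

# Multi-GPU run that triggers the failure (1.0.x flags; later versions
# renamed distributed_backend to accelerator/strategy):
# trainer = pl.Trainer(gpus=4, distributed_backend="ddp",
#                      automatic_optimization=False)
# trainer.fit(ManualGAN(), datamodule=...)
```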

Expected behavior

Training should not fail when n_gpus > 1.

Environment

  • CUDA:
    • GPU:
      • Tesla V100-SXM2-16GB
      • Tesla V100-SXM2-16GB
      • Tesla V100-SXM2-16GB
      • Tesla V100-SXM2-16GB
    • available: True
    • version: 10.2
  • Packages:
    • numpy: 1.19.4
    • pyTorch_debug: True
    • pyTorch_version: 1.7.0
    • pytorch-lightning: 1.0.8
    • tqdm: 4.54.0
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.9
    • version: #1 SMP Tue Sep 10 10:50:19 EDT 2019

Additional context

Getting manual optimization to work with GANs in a multi-GPU regime is a very useful application.


Labels

bug (Something isn't working) · help wanted (Open to be worked on) · priority: 0 (High priority task)
