🐛 Bug
Can't run DDP with manual optimization. Training fails on the second batch with the error:
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
To Reproduce
Switch the basic GAN bolt (the GAN example in pytorch-lightning-bolts) from automatic to manual optimization; a sketch of what that change looks like follows below.
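For illustration only, here is a minimal sketch of a manually optimized GAN, assuming the Lightning 1.0.x manual-optimization API (`Trainer(automatic_optimization=False)`, `self.optimizers()`, `self.manual_backward`); `ManualGAN`, the toy networks, and the random dataset are hypothetical stand-ins for the bolt, and the DDP backend flag was spelled `accelerator="ddp"` or `distributed_backend="ddp"` depending on the exact 1.0.x release:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ManualGAN(pl.LightningModule):
    """Toy GAN trained with manual optimization (hypothetical stand-in for the GAN bolt)."""

    def __init__(self, latent_dim=32):
        super().__init__()
        self.latent_dim = latent_dim
        self.generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 2))
        self.discriminator = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

    def training_step(self, batch, batch_idx, optimizer_idx):
        # optimizer_idx is ignored: with manual optimization we drive
        # both optimizers ourselves inside a single step
        opt_g, opt_d = self.optimizers()
        real = batch[0]  # TensorDataset yields a one-element list
        z = torch.randn(real.size(0), self.latent_dim, device=self.device)

        # generator update: first backward pass through the DDP-wrapped module
        fake = self.generator(z)
        g_loss = -self.discriminator(fake).mean()
        opt_g.zero_grad()
        self.manual_backward(g_loss, opt_g)
        opt_g.step()

        # discriminator update: a second backward pass in the same iteration,
        # which is what trips DDP's "mark a variable ready only once" check
        d_loss = self.discriminator(fake.detach()).mean() - self.discriminator(real).mean()
        opt_d.zero_grad()
        self.manual_backward(d_loss, opt_d)
        opt_d.step()

    def configure_optimizers(self):
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=2e-4)
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=2e-4)
        return opt_g, opt_d


if __name__ == "__main__":
    loader = DataLoader(TensorDataset(torch.randn(256, 2)), batch_size=32)
    # or distributed_backend="ddp" depending on the exact 1.0.x release
    trainer = pl.Trainer(gpus=2, accelerator="ddp", automatic_optimization=False, max_epochs=1)
    trainer.fit(ManualGAN(), loader)
```

With gpus=2 and the DDP backend, this configuration raises the RuntimeError above on the second batch; with a single GPU it trains without error.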
Expected behavior
Training should not fail when n_gpus > 1.
Environment
- CUDA:
- GPU:
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- Tesla V100-SXM2-16GB
- available: True
- version: 10.2
- GPU:
- Packages:
- numpy: 1.19.4
- pyTorch_debug: True
- pyTorch_version: 1.7.0
- pytorch-lightning: 1.0.8
- tqdm: 4.54.0
- System:
- OS: Linux
- architecture:
- 64bit
- processor: x86_64
- python: 3.7.9
- version: Proposal for help #1 SMP Tue Sep 10 10:50:19 EDT 2019
Additional context
Having manual optimization work with GANs in a multi-GPU regime is a very useful application.