🐛 Bug
I am trying to run a PyTorch Lightning model on a 4-GPU node. In my trainer, if I specify
pl.Trainer(gpus=[0])
it runs fine. However, as soon as I request more than one GPU, e.g.
pl.Trainer(gpus=[0,1,2,3])
I get this output:
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/4
initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/4
initializing ddp: GLOBAL_RANK: 2, MEMBER: 3/4
initializing ddp: GLOBAL_RANK: 3, MEMBER: 4/4
and then training just hangs there forever. I have also tried with only 2 GPUs and see the same behavior.
Any idea why this happens? I have tried both ddp and ddp_spawn.
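For reference, a minimal sketch of the setup (the model here is a hypothetical stand-in, not my actual one, and on older Lightning versions the backend is selected with distributed_backend instead of accelerator):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    """Placeholder LightningModule; any module reproduces the hang for me."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(512, 32), torch.randint(0, 2, (512,)))
    train_loader = DataLoader(dataset, batch_size=64)

    # Works: trainer = pl.Trainer(gpus=[0], max_epochs=1)
    # Hangs after the "initializing ddp" lines:
    trainer = pl.Trainer(gpus=[0, 1, 2, 3], accelerator="ddp", max_epochs=1)
    trainer.fit(ToyModel(), train_loader)
```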
- PyTorch version: tried both 1.4 and 1.7
- OS: Linux
- How installed: pip
- Python version: 3.8.5
- CUDA/cuDNN version: 10.1
- GPU models and configuration: NVIDIA K80s
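To help narrow things down, a plain torch.distributed sketch (no Lightning) that exercises the same NCCL rendezvous might be useful; the master address and port below are placeholder assumptions. If this also hangs, the problem is presumably in the NCCL/GPU setup rather than in Lightning:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank, world_size):
    # One process per GPU; each joins the NCCL process group and does a single all_reduce.
    os.environ["MASTER_ADDR"] = "127.0.0.1"  # placeholder rendezvous settings
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    t = torch.ones(1, device=f"cuda:{rank}")
    dist.all_reduce(t)  # default op is SUM, so t should equal world_size afterwards
    print(f"rank {rank}: all_reduce ok, value = {t.item()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```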