Description
🐛 Bug
Greetings from Italy!
I recently moved to PyTorch, and a friend of mine introduced me to PL.
I'm coding an autoencoder (the architecture is still pretty simple) with a custom loss function
that works on the hidden-layer output. The link below leads to the GitHub repo:
https://github.com/notprime/custom_autoencoder/blob/main/autoenc_torch.ipynb
I read the documentation on Multi-GPU Training, so I set accelerator='ddp'
and gpus=-1 to select all the GPUs.
However, when I launch the script, it freezes here:
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Using native 16bit precision.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
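For reference, a minimal sketch of the Trainer setup described above, assuming the PL 1.x API (the model variable is a placeholder, not the actual code from the repo):

```python
import pytorch_lightning as pl

model = ...  # my autoencoder LightningModule (see the linked notebook)

trainer = pl.Trainer(
    gpus=-1,            # select all available GPUs
    accelerator='ddp',  # freezes after printing LOCAL_RANK / CUDA_VISIBLE_DEVICES
    precision=16,       # native 16-bit precision, matching the log above
)
# trainer.fit(model)   # with accelerator='dp' this runs fine
```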
I waited 10-15 minutes, but nothing happened.
If I use 'dp' as the accelerator instead, everything works fine and the script doesn't freeze.
The documentation says ddp is preferred over dp because it's faster:
did I do something wrong? I really don't know why the code gets stuck when I use ddp!
Thanks in advance!
- PyTorch Version: 1.8.1
- OS: Ubuntu 18.04
- How you installed PyTorch: conda
- Python version: 3.8
- CUDA/cuDNN version: 11.2
- GPU models and configuration: 4 x TITAN Xp 12GB