Description
🐛 Bug
Greetings from Italy!
I recently moved to PyTorch, and a friend of mine introduced me to PL.
I'm coding an autoencoder (the architecture is still pretty simple) with a custom loss function
that works on the hidden-layer output. The link below leads to the GitHub repo:
https://github.com/notprime/custom_autoencoder/blob/main/autoenc_torch.ipynb
I read the documentation on Multi-GPU Training, so I set accelerator='ddp'
and gpus=-1 to select all the GPUs.
However, when I launch the script, it freezes here:
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Using native 16bit precision.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
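For reference, a minimal sketch of the Trainer setup described above, assuming the PL 1.x API (the model variable is a placeholder, not the actual code from the repo):

```python
import pytorch_lightning as pl

model = ...  # my autoencoder LightningModule (see the linked notebook)

trainer = pl.Trainer(
    gpus=-1,            # select all available GPUs
    accelerator='ddp',  # freezes after printing LOCAL_RANK / CUDA_VISIBLE_DEVICES
    precision=16,       # native 16-bit precision, matching the log above
)
# trainer.fit(model)   # with accelerator='dp' this runs fine
```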
I waited 10-15 minutes, but nothing happened.
If I use 'dp' as the accelerator instead, everything works fine and the script doesn't freeze.
The documentation says ddp is preferred over dp because it's faster:
did I do something wrong? I really don't know why the code gets stuck when I use ddp!
Thanks in advance!
- PyTorch Version: 1.8.1
- OS: Ubuntu 18.04
- How you installed PyTorch: conda
- Python version: 3.8
- CUDA/cuDNN version: 11.2
- GPU models and configuration: 4 x TITAN Xp 12GB