Closed
Labels: feature, help wanted, priority: 0, won't fix
Description
🐛 Bug
In the validation step, I compute a metric on parameter tensors that carry an attribute set when the network is initialized. For example, some parameters have a binary flag "ptype" attached like so: tensor.ptype = True.
This works with PyTorch's native DataParallel or on a single GPU. With Lightning's DataParallel mode, however, the validation step runs on per-GPU replicas of the network, and those replicas do not preserve arbitrary attributes set on parameter tensors.
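The loss of such attributes can be illustrated without GPUs. The sketch below assumes that replication hands each device a new tensor object; `clone()` is used here only as a stand-in for that copy and is not what DataParallel actually calls. An attribute set on the original Python object does not carry over to the new one.

```python
import torch
import torch.nn as nn

# A parameter carrying an arbitrary Python attribute.
p = nn.Parameter(torch.tensor([0.1]))
p.ptype = True

# clone() returns a new tensor object with the same values; the ad-hoc
# attribute lives only on the original Python object, so the copy lacks it.
copy_of_p = p.clone()

print(hasattr(p, "ptype"))          # True
print(hasattr(copy_of_p, "ptype"))  # False
```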
To Reproduce
Steps to reproduce the behavior:
- Attach an arbitrary property to a neural network parameter, e.g.:
# in __init__ of the nn.Module
self.register_parameter('some_param', nn.Parameter(torch.tensor([0.1])))
self.some_param.ptype = True
- Run in DataParallel mode with more than one GPU
- In the validation step, build a list with a comprehension like so:
[param.ptype for param in self.model.parameters() if hasattr(param, 'ptype')]
- Notice that the list is empty in the replicas
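A self-contained, single-device version of the reproduction steps is sketched below. The class name `FlaggedNet` and the extra `nn.Linear` layer are hypothetical, added only so the module has a forward pass. On the original module the comprehension finds the flag; the report is that the same comprehension comes back empty inside the DataParallel replicas.

```python
import torch
import torch.nn as nn


class FlaggedNet(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        # Register a parameter and tag it with an arbitrary attribute.
        self.register_parameter("some_param", nn.Parameter(torch.tensor([0.1])))
        self.some_param.ptype = True

    def forward(self, x):
        # The scalar-like parameter broadcasts over the (N, 2) output.
        return self.linear(x) * self.some_param


model = FlaggedNet()

# On the original, non-replicated module the attribute is visible.
flags = [param.ptype for param in model.parameters() if hasattr(param, "ptype")]
print(flags)  # [True]
```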
Code sample
See above.
Expected behavior
The returned list should not be empty. On the other hand, since the metric is the same across replicas, it need not be computed in parallel at all.
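One possible workaround, sketched under assumptions that have not been checked against Lightning 0.7.5: keep the flag on the module as a plain Python set of parameter names and select tensors by name instead of by tensor attribute. `NamedFlagNet`, `ptype_names` and `flagged` are hypothetical names, and the per-device copies are only simulated with `detach().clone()`.

```python
import torch
import torch.nn as nn


class NamedFlagNet(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        self.some_param = nn.Parameter(torch.tensor([0.1]))
        # Store the flag as a plain Python set of parameter names on the
        # module, instead of as an attribute on the tensor object itself.
        self.ptype_names = {"some_param"}

    def flagged(self, named_params):
        # Select flagged tensors by name rather than by tensor attribute.
        return [t for name, t in named_params if name in self.ptype_names]


model = NamedFlagNet()

# Simulated per-device copies: new tensor objects with the same names
# and without any ad-hoc attributes.
copies = [(name, p.detach().clone()) for name, p in model.named_parameters()]

selected = model.flagged(copies)
print(len(selected))  # 1
```

The idea is that names are ordinary strings, so the lookup does not depend on which tensor object a replica holds; whether a replica still exposes its parameters by name under Lightning's DataParallel mode is an assumption to verify.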
Environment
PyTorch version: 1.6.0.dev20200403
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: CentOS Linux release 7.8.2003 (Core)
GCC version: (Homebrew GCC 5.5.0_7) 5.5.0
CMake version: version 3.13.0
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: Tesla P100-SXM2-16GB
GPU 1: Tesla P100-SXM2-16GB
GPU 2: Tesla P100-SXM2-16GB
GPU 3: Tesla P100-SXM2-16GB
Nvidia driver version: 440.33.01
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.18.1
[conda] blas 1.0 mkl
[conda] kmeans-pytorch 0.2 pypi_0 pypi
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] pytorch 1.6.0.dev20200403 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch-nightly
[conda] pytorch-lightning 0.7.5 pypi_0 pypi
[conda] pytorch-memlab 0.0.4 pypi_0 pypi
[conda] pytorch-pcen 0.0.1 pypi_0 pypi
[conda] torchvision 0.6.0.dev20200403 py37_cu101 pytorch-nightly