DP - Getting tensor properties #1649

@sebastienwood

Description

🐛 Bug

In the validation step, I compute a metric over parameter tensors that carry an attribute set when the network is initialized. For example, some parameters have a binary flag "ptype" attached like so: tensor.ptype = True.

It works well with PyTorch's native DataParallel or with a single GPU. However, when using Lightning's DataParallel, the validation step is run on a child process in which only the network replicas exist, and those replicas do not preserve arbitrary tensor attributes.
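A minimal CPU-only sketch of the suspected mechanism, assuming a replica holds freshly created tensors built from the original parameter values. Here detach().clone() is only a stand-in for that copy, not the actual replication code:

```python
import torch
from torch import nn

param = nn.Parameter(torch.tensor([0.1]))
param.ptype = True  # plain Python attribute, stored on this object only

# Stand-in for a replica's copy (assumption): a new tensor with the same values.
copy_of_param = param.detach().clone()

has_flag_original = hasattr(param, "ptype")
has_flag_copy = hasattr(copy_of_param, "ptype")
print(has_flag_original, has_flag_copy)
```

The flag lives on the original Python object, so any tensor created from it starts without the attribute.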

To Reproduce

Steps to reproduce the behavior:

  1. Attach an arbitrary attribute to a neural network parameter, e.g.:
     # in __init__ of the nn.Module
     self.register_parameter('some_param', nn.Parameter(torch.tensor([0.1])))
     self.some_param.ptype = True
  2. Run in DataParallel mode with more than one GPU.
  3. In the validation step, run a list comprehension like so: [param.ptype for param in self.model.parameters() if hasattr(param, 'ptype')]
  4. Notice that the list is empty for the replica processes.

Code sample

See above.

Expected behavior

The returned list shouldn't be empty. On the other hand, since the metric is constant across replicas, the function needn't be run in parallel.
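One possible workaround, sketched under assumptions: record the names of the flagged parameters once in __init__ as plain Python data, then look the tensors up by name instead of relying on the attribute. FlaggedNet, ptype_names, other_param and flagged_values are hypothetical names for illustration. Whether this lookup behaves the same inside DataParallel replicas is untested here, and it only handles top-level parameter names.

```python
import torch
from torch import nn


class FlaggedNet(nn.Module):
    """Hypothetical module that tracks flagged parameters by name."""

    def __init__(self):
        super().__init__()
        self.register_parameter("some_param", nn.Parameter(torch.tensor([0.1])))
        self.register_parameter("other_param", nn.Parameter(torch.tensor([0.2])))
        self.some_param.ptype = True
        # Computed once on the original module, stored as a plain list of strings.
        self.ptype_names = [
            name for name, p in self.named_parameters() if hasattr(p, "ptype")
        ]

    def flagged_values(self):
        # Look tensors up by name; does not depend on the tensor attribute.
        return [getattr(self, name) for name in self.ptype_names]


net = FlaggedNet()
print(net.ptype_names)
```

Because the metric is constant across replicas, the list of names only needs to be built once on the original module.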

Environment

PyTorch version: 1.6.0.dev20200403
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: CentOS Linux release 7.8.2003 (Core)
GCC version: (Homebrew GCC 5.5.0_7) 5.5.0
CMake version: version 3.13.0

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration: 
GPU 0: Tesla P100-SXM2-16GB
GPU 1: Tesla P100-SXM2-16GB
GPU 2: Tesla P100-SXM2-16GB
GPU 3: Tesla P100-SXM2-16GB

Nvidia driver version: 440.33.01
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.18.1
[conda] blas                      1.0                         mkl  
[conda] kmeans-pytorch            0.2                      pypi_0    pypi
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py37he904b0f_0  
[conda] mkl_fft                   1.0.15           py37ha843d7b_0  
[conda] mkl_random                1.1.0            py37hd6b4f25_0  
[conda] pytorch                   1.6.0.dev20200403 py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch-nightly
[conda] pytorch-lightning         0.7.5                    pypi_0    pypi
[conda] pytorch-memlab            0.0.4                    pypi_0    pypi
[conda] pytorch-pcen              0.0.1                    pypi_0    pypi
[conda] torchvision               0.6.0.dev20200403      py37_cu101    pytorch-nightly

Labels

feature, help wanted, priority: 0, won't fix
