Can't run torch.jit.ScriptModule based transforms like AmplitudeToDB with multi GPUs #432

@dhgrs

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. Run the following script (saved as debug.py):
import torch
import torchaudio

device = 'cuda'
to_db = torch.nn.DataParallel(torchaudio.transforms.AmplitudeToDB()).to(device)
x = torch.arange(1).float().to(device)
print(to_db(x))

Expected behavior

With a single GPU, the script produces the expected output:

$ CUDA_VISIBLE_DEVICES=0 python3 debug.py
tensor([-100.], device='cuda:0')

With multiple GPUs, it fails with the following error:

$ CUDA_VISIBLE_DEVICES=0,1 python3 debug.py
Traceback (most recent call last):
  File "debug.py", line 7, in <module>
    print(to_db(x))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1678, in __getattr__
    return super(RecursiveScriptModule, self).__getattr__(attr)
  File "/opt/conda/lib/python3.7/site-packages/torch/jit/__init__.py", line 1499, in __getattr__
    return super(ScriptModule, self).__getattr__(attr)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'RecursiveScriptModule' object has no attribute 'forward'

Environment

Collecting environment information...
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
GPU 4: Tesla V100-SXM2-32GB
GPU 5: Tesla V100-SXM2-32GB
GPU 6: Tesla V100-SXM2-32GB
GPU 7: Tesla V100-SXM2-32GB

Nvidia driver version: 418.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip] numpy==1.17.4
[pip] pytorch-ignite==0.3.0
[pip] torch==1.4.0
[pip] torchaudio==0.4.0
[pip] torchvision==0.5.0
[conda] blas                      1.0                         mkl
[conda] mkl                       2019.4                      243
[conda] mkl-service               2.3.0            py37he904b0f_0
[conda] mkl_fft                   1.0.15           py37ha843d7b_0
[conda] mkl_random                1.1.0            py37hd6b4f25_0
[conda] pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
[conda] torchvision               0.5.0                py37_cu101    pytorch

Additional context

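AmplitudeToDB is based on torch.jit.ScriptModule, which may be why the error occurs: the traceback shows that the replica created by DataParallel is a RecursiveScriptModule with no forward attribute.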
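
A possible workaround, not verified against torchaudio itself, is to do the dB conversion in a plain torch.nn.Module, which DataParallel can replicate normally. The sketch below is only an illustration: the class name PlainAmplitudeToDB is hypothetical, and the power-scale multiplier of 10, amin=1e-10, and ref_value=1.0 are assumptions chosen so that an input of 0 maps to roughly -100 dB, as in the single-GPU output above. It leaves out options such as top_db.

```python
import math

import torch
from torch import nn


class PlainAmplitudeToDB(nn.Module):
    """Hypothetical plain nn.Module stand-in for a power-to-dB transform."""

    def __init__(self, amin: float = 1e-10, ref_value: float = 1.0):
        super().__init__()
        self.multiplier = 10.0  # assumption: input is on the power scale
        self.amin = amin
        # Constant offset so that ref_value maps to 0 dB.
        self.db_offset = self.multiplier * math.log10(max(amin, ref_value))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clamp first so that log10(0) cannot produce -inf.
        x_db = self.multiplier * torch.log10(torch.clamp(x, min=self.amin))
        return x_db - self.db_offset


# Falls back to CPU when no GPU is visible, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
to_db = nn.DataParallel(PlainAmplitudeToDB()).to(device)
x = torch.arange(1).float().to(device)
result = to_db(x)
print(result)
```

Because the module holds no scripted code, DataParallel only has to copy an ordinary Python object to each device. The trade-off is that the module is no longer a ScriptModule unless it is scripted separately.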
