Very slow speed in complex_norm() function.

## 🐛 Bug

When I call `spectrogram()` in torchaudio 0.5.0, my training script runs very slowly. After downgrading to PyTorch 1.4 and torchaudio 0.4.0, the script runs fast again. I compared 0.4.0 and 0.5.0 and found that `complex_norm()` is the root cause of the slowdown:

https://github.com/pytorch/audio/blob/bc1df4882b967d55d8544e572ef16abeee9f45b5/torchaudio/functional.py#L171

If I install torchaudio 0.5.0 and change line 171 back to the 0.4.0 version:

```diff
- spec_f = complex_norm(spec_f, power=power)
+ spec_f = spec_f.pow(power).sum(-1)  # get power of "complex" tensor
```

then the speed is the same as with 0.4.0.
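The two expressions are not interchangeable for every `power`. Per its documentation, `complex_norm()` returns the norm of each complex value raised to `power`, i.e. (re² + im²)^(power/2), whereas the 0.4.0 expression computes re^power + im^power. The sketch below assumes the `(..., 2)` real/imaginary layout that `spectrogram()` uses in these versions; `norm_then_power` and `pow_then_sum` are hypothetical helpers that mirror the math, not torchaudio's actual source. They agree only for `power=2.0`, so the 0.4.0 line is a safe substitute only for power spectrograms.

```python
import torch


def norm_then_power(spec: torch.Tensor, power: float) -> torch.Tensor:
    """Hypothetical helper: |z| ** power for a (..., 2) real/imag tensor."""
    # sqrt(re^2 + im^2) ** power == (re^2 + im^2) ** (power / 2)
    return spec.pow(2.0).sum(-1).pow(0.5 * power)


def pow_then_sum(spec: torch.Tensor, power: float) -> torch.Tensor:
    """Hypothetical helper: the 0.4.0 expression re ** power + im ** power."""
    return spec.pow(power).sum(-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    spec = torch.randn(201, 100, 2)  # (freq, time, complex=2); illustrative shape

    # power=2.0: both give re^2 + im^2, so the results match.
    print(torch.allclose(norm_then_power(spec, 2.0), pow_then_sum(spec, 2.0)))
    # power=1.0: |z| versus re + im, so the results differ.
    print(torch.allclose(norm_then_power(spec, 1.0), pow_then_sum(spec, 1.0)))
```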

For example, this is the output of my training script with torchaudio 0.5.0:
```
--------- TRAINING - Epoch:     1/ 1000 ------------
| Batch:  2461/ 2461, 100.00%                      |
| Loss:   3.253154@lr=1.000000e-03                 |
| Speed:   69.64 files/sec, Elapsed Time: 00:58:53 |
----------------------------------------------------
```

That is about 70 files/sec. With the 0.4.0 code instead:
```
--------- TRAINING - Epoch:     1/ 1000 ------------
| Batch:  2461/ 2461, 100.00%                      |
| Loss:   3.275024@lr=1.000000e-03                 |
| Speed:  408.14 files/sec, Elapsed Time: 00:10:02 |
----------------------------------------------------
```

As you can see, the 0.4.0 code is about 5.9x faster than 0.5.0 (408.14 vs. 69.64 files/sec). I cannot share the script because the code belongs to my company. However, I think the speed difference depends on which process calls `complex_norm()`. For example, [voxceleb_trainer](https://github.com/clovaai/voxceleb_trainer) calls `spectrogram()` in the main process and shows no slowdown, while my script calls `spectrogram()` in a worker process forked by PyTorch's DataLoader.



## To Reproduce

Steps to reproduce the behavior:

1. Install torchaudio 0.5.0.
1. Run the script.
1. Observe the slow speed.
1. Change `spec_f = complex_norm(spec_f, power=power)` to `spec_f = spec_f.pow(power).sum(-1)`.
1. Run the script again.
1. Observe normal speed.
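Since the original script can't be shared, a standalone harness along these lines might help isolate the main-process vs. DataLoader-worker difference. It is a sketch under assumptions: the data is synthetic random noise, and the tensor shape, dataset length, batch size and worker count are arbitrary placeholders; `SyntheticSpecDataset` and `files_per_second` are hypothetical names. It has not been confirmed to reproduce the slowdown.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


def norm_then_power(spec, power=2.0):
    # Same math as complex_norm(): (re^2 + im^2) ** (power / 2).
    return spec.pow(2.0).sum(-1).pow(0.5 * power)


def pow_then_sum(spec, power=2.0):
    # The torchaudio 0.4.0 expression; equals |z|^2 only for power == 2.0.
    return spec.pow(power).sum(-1)


class SyntheticSpecDataset(Dataset):
    """Hypothetical dataset: reduces a random (freq, time, 2) tensor per item."""

    def __init__(self, reduce_fn, length=256, shape=(201, 400, 2)):
        self.reduce_fn = reduce_fn
        self.length = length
        self.shape = shape

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        spec = torch.randn(*self.shape)
        return self.reduce_fn(spec)


def files_per_second(reduce_fn, num_workers, length=256, batch_size=32):
    """Iterate once over the loader; return items/sec (includes worker start-up)."""
    loader = DataLoader(
        SyntheticSpecDataset(reduce_fn, length=length),
        batch_size=batch_size,
        num_workers=num_workers,
    )
    count = 0
    start = time.perf_counter()
    for batch in loader:
        count += batch.shape[0]
    return count / (time.perf_counter() - start)


if __name__ == "__main__":
    # num_workers=0 runs in the main process; >0 uses worker processes
    # (forked by default on Linux).
    for num_workers in (0, 4):
        for fn in (norm_then_power, pow_then_sum):
            rate = files_per_second(fn, num_workers)
            print(f"workers={num_workers} {fn.__name__}: {rate:.1f} items/sec")
```

Comparing the `workers=0` and `workers=4` rows for each function would show whether a gap appears only inside worker processes, as observed with the training script.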



## Expected behavior

- The script runs at normal speed with either torchaudio 0.4.0 or 0.5.0.




## Environment

 - What commands did you use to install torchaudio (conda/pip/build from source)?
    - pip
 - If you are building from source, which commit is it?
 - What does `torchaudio.__version__` print? (If applicable)
    - 0.5.0

Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).

```
Collecting environment information...
PyTorch version: 1.5.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: Tesla P40
GPU 1: Tesla P40
GPU 2: Tesla P40
GPU 3: Tesla P40
GPU 4: Tesla P40
GPU 5: Tesla P40
GPU 6: Tesla P40
GPU 7: Tesla P40

Nvidia driver version: 418.87.00
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.5.0+cu101
[pip3] torchaudio==0.5.0
[pip3] torchvision==0.6.0+cu101
[conda] Could not collect
```

Very slow speed in complex_norm() function. #740
