Inconsistent default and TS-incompatible lazy behavior in MelScale

1. All the spectrogram-related transforms default the frequency bin parameter to `n_fft: int = 400`, while `MelScale` and `InverseMelScale` use `n_stft: Optional[int] = None`.

2. In `MelScale`, when `n_stft=None`, `forward` tries to resize the buffer, but this causes the TorchScripted version (once saved and loaded from file) to fail.

https://github.com/pytorch/audio/blob/931555c1b4ebddf74bb5d439ea197dc1d1691a05/torchaudio/transforms.py#L287-L308

https://app.circleci.com/pipelines/github/pytorch/audio/5716/workflows/fe399658-a33f-47f2-8227-7750b2f0af2f/jobs/197223/tests#failed-test-0

```
>       return callable(*args, **kwargs)
E       RuntimeError: The following operation failed in the TorchScript interpreter.
E       Traceback of TorchScript, serialized code (most recent call last):
E         File "code/__torch__/torchaudio/transforms.py", line 20, in forward
E           if torch.eq(torch.numel(self.fb), 0):
E             tmp_fb = _0(torch.size(specgram0, 1), 0., 8000., 128, 16000, self.norm, self.mel_scale, )
E             _1 = torch.resize_(self.fb, torch.size(tmp_fb), memory_format=None)
E                  ~~~~~~~~~~~~~ <--- HERE
E             _2 = torch.copy_(self.fb, tmp_fb, False)
E           else:
E       
E       Traceback of TorchScript, original code (most recent call last):
E         File "/root/project/env/lib/python3.9/site-packages/torchaudio-0.9.0a0+bb886e7-py3.9-linux-x86_64.egg/torchaudio/transforms.py", line 302, in forward
E                                               self.mel_scale)
E                   # Attributes cannot be reassigned outside __init__ so workaround
E                   self.fb.resize_(tmp_fb.size())
E                   ~~~~~~~~~~~~~~~ <--- HERE
E                   self.fb.copy_(tmp_fb)
E           
E       RuntimeError: Trying to resize storage that is not resizable at /opt/conda/conda-bld/pytorch_1617951974812/work/aten/src/TH/THStorageFunctions.cpp:87
```

To reproduce:
1. Construct `MelScale` with `n_stft=None`.
2. Script the transform and save it to a file.
3. Load the transform from the file and feed it a spectrogram Tensor.
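
The steps above can be sketched with a small stand-in module. `LazyFilterBank` and `lazy_resize_fails_after_reload` are hypothetical names written for illustration, not torchaudio code; the module only imitates the empty-buffer-then-`resize_` pattern, and the filter bank values are placeholders. Whether the final call raises may depend on the PyTorch version, so the function just reports what happened.

```python
import io

import torch


class LazyFilterBank(torch.nn.Module):
    """Hypothetical stand-in for a transform that builds its buffer lazily."""

    def __init__(self, n_mels: int = 4) -> None:
        super().__init__()
        self.n_mels = n_mels
        # Empty buffer, analogous to constructing MelScale with n_stft=None.
        self.register_buffer("fb", torch.empty(0))

    def forward(self, specgram: torch.Tensor) -> torch.Tensor:
        if self.fb.numel() == 0:
            # Placeholder filter bank of shape (freq, n_mels).
            tmp_fb = torch.ones([specgram.size(-2), self.n_mels])
            self.fb.resize_(tmp_fb.size())
            self.fb.copy_(tmp_fb)
        return torch.matmul(specgram.transpose(-1, -2), self.fb).transpose(-1, -2)


def lazy_resize_fails_after_reload() -> bool:
    """Script, save, reload, then run; return True if the run raises."""
    scripted = torch.jit.script(LazyFilterBank())
    buffer = io.BytesIO()
    torch.jit.save(scripted, buffer)
    buffer.seek(0)
    loaded = torch.jit.load(buffer)
    try:
        loaded(torch.rand(1, 5, 3))  # (channel, freq, time)
    except RuntimeError:
        return True
    return False
```

Calling `lazy_resize_fails_after_reload()` on the environment from the CI log would be expected to hit the `resize_` error shown above; other versions may behave differently.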

Once the transform is scripted and dumped, there is no way to fix this issue.
Library code should not rely on hacks that can leave a transform in such a stuck state.

As a fix, since `n_fft` defaults to `400` everywhere, `n_stft` should default to `201` (that is, `400 // 2 + 1`) as well.
This removes the need for the `resize_` hack above.
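
The value `201` follows from the one-sided STFT, which yields `n_fft // 2 + 1` frequency bins. A minimal sketch of that arithmetic; the helper name `default_n_stft` is made up for illustration and is not part of torchaudio:

```python
def default_n_stft(n_fft: int = 400) -> int:
    """Number of frequency bins of a one-sided STFT with the given n_fft."""
    return n_fft // 2 + 1


print(default_n_stft())  # 201
```

With a concrete default like this, the filter bank can be built once in `__init__`, so `forward` never has to touch the buffer's size.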

        specgram (Tensor): A spectrogram STFT of dimension (..., freq, time).

    Returns:
        Tensor: Mel frequency spectrogram of size (..., ``n_mels``, time).
    """

    # pack batch
    shape = specgram.size()
    specgram = specgram.reshape(-1, shape[-2], shape[-1])

    if self.fb.numel() == 0:
        tmp_fb = F.create_fb_matrix(specgram.size(1), self.f_min, self.f_max,
                                    self.n_mels, self.sample_rate, self.norm,
                                    self.mel_scale)
        # Attributes cannot be reassigned outside __init__ so workaround
        self.fb.resize_(tmp_fb.size())
        self.fb.copy_(tmp_fb)

    # (channel, frequency, time).transpose(...) dot (frequency, n_mels)
    # -> (channel, time, n_mels).transpose(...)
    mel_specgram = torch.matmul(specgram.transpose(1, 2), self.fb).transpose(1, 2)

Inconsistent default and TS-incompatible lazy behavior in MelScale #1454
