* Inverse Mel Scale Implementation
* Inverse Mel Scale Docs
* Better working version.
* GPU fix
* These shouldn't go on git..
* Even better one, but does not support JITability.
* Remove JITability test
* Flake8
* n_stft is a must
* minor clean up of initialization
* Add librosa consistency test
This PR follows up #366 and adds tests for `InverseMelScale` (and `MelScale`) that check compatibility with librosa.
The `MelScale` compatibility test works as follows:
1. Generate spectrogram
2. Feed the spectrogram to `torchaudio.transforms.MelScale` instance
3. Feed the spectrogram to `librosa.feature.melspectrogram` function.
4. Compare the results from 2 and 3 element-wise.
Element-wise numerical comparison is possible because, under the hood, the two implementations use the same algorithm.
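The reasoning above can be illustrated with a minimal, library-free sketch. It does not use torchaudio or librosa. It assumes that a Mel conversion amounts to multiplying a filterbank matrix with the spectrogram, and the helper names (`apply_filterbank_loops`, `apply_filterbank_zip`, `allclose_2d`), the toy filterbank, and the tolerances are all hypothetical. Two independently written routines that follow the same algorithm agree element-wise up to floating-point tolerance:

```python
def apply_filterbank_loops(fb, spec):
    """Multiply a (n_mels x n_freq) filterbank with a (n_freq x n_frames) spectrogram."""
    n_mels, n_freq, n_frames = len(fb), len(spec), len(spec[0])
    out = [[0.0] * n_frames for _ in range(n_mels)]
    for m in range(n_mels):
        for f in range(n_freq):
            for t in range(n_frames):
                out[m][t] += fb[m][f] * spec[f][t]
    return out


def apply_filterbank_zip(fb, spec):
    """Same matrix product, written independently with zip/sum."""
    frames = list(zip(*spec))  # transpose to (n_frames x n_freq)
    return [[sum(w * x for w, x in zip(row, frame)) for frame in frames] for row in fb]


def allclose_2d(actual, expected, rtol=1e-5, atol=1e-8):
    """Element-wise check: |a - e| <= atol + rtol * |e| for every pair."""
    if len(actual) != len(expected):
        return False
    for row_a, row_e in zip(actual, expected):
        if len(row_a) != len(row_e):
            return False
        for a, e in zip(row_a, row_e):
            if abs(a - e) > atol + rtol * abs(e):
                return False
    return True


# Toy data (hypothetical): 2 Mel bins, 4 frequency bins, 2 frames.
fb = [[1.0, 0.5, 0.0, 0.0],
      [0.0, 0.5, 1.0, 0.5]]
spec = [[0.1, 0.2],
        [0.3, 0.4],
        [0.5, 0.6],
        [0.7, 0.8]]

mel_a = apply_filterbank_loops(fb, spec)
mel_b = apply_filterbank_zip(fb, spec)
print(allclose_2d(mel_a, mel_b))
```

In the actual test the two sides would be the outputs of `torchaudio.transforms.MelScale` and `librosa.feature.melspectrogram`, and the comparison would be done on tensors rather than nested lists.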
The `InverseMelScale` compatibility test is more elaborate:
1. Generate the original spectrogram
2. Convert the original spectrogram to Mel scale using `torchaudio.transforms.MelScale` instance
3. Reconstruct spectrogram using torchaudio implementation
3.1. Feed the Mel spectrogram to `torchaudio.transforms.InverseMelScale` instance and get reconstructed spectrogram.
3.2. Compute the P1 distance (`p=1`, the sum of element-wise absolute differences) between the original spectrogram and the one from 3.1.
4. Reconstruct spectrogram using librosa
4.1. Feed the Mel spectrogram to `librosa.feature.inverse.mel_to_stft` function and get reconstructed spectrogram.
4.2. Compute the P1 distance (`p=1`, the sum of element-wise absolute differences) between the original spectrogram and the one from 4.1. (This is the reference.)
5. Check that the resulting P1 distances are in roughly the same value range.
Element-wise numerical comparison is not possible here because different algorithms are used to compute the inverse. Some values in the reconstructed spectrograms can differ in magnitude.
Therefore, the strategy is to check that the P1 distance (reconstruction loss) does not differ much from the value obtained using `librosa`. The threshold for this check was chosen empirically from the following measurements:
```
print('p1 dist (orig <-> ta):', torch.dist(spec_orig, spec_ta, p=1))
print('p1 dist (orig <-> lr):', torch.dist(spec_orig, spec_lr, p=1))
>>> p1 dist (orig <-> ta): tensor(1482.1917)
>>> p1 dist (orig <-> lr): tensor(1420.7103)
```
These values vary with the length and the kind of signal being processed, so the threshold was handpicked.
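As a rough illustration of this strategy, the sketch below computes a `p=1` distance on plain nested lists and checks whether two reconstruction losses fall in a similar range. It is not the test code from this PR: the helper names and the 10% relative tolerance are hypothetical, and only the two distance values are taken from the output above.

```python
def p1_distance(a, b):
    """Sum of absolute element-wise differences between two 2-D lists."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def losses_comparable(loss, loss_ref, rel_threshold=0.1):
    """True if `loss` is within `rel_threshold` (relative) of the reference loss.

    The default of 0.1 is a placeholder, not the threshold used in the PR.
    """
    return abs(loss - loss_ref) <= rel_threshold * abs(loss_ref)


# Distances printed in the description above.
dist_ta = 1482.1917  # original <-> torchaudio reconstruction
dist_lr = 1420.7103  # original <-> librosa reconstruction (reference)

relative_gap = abs(dist_ta - dist_lr) / dist_lr
print(f"relative gap: {relative_gap:.4f}")
print("comparable:", losses_comparable(dist_ta, dist_lr))
```

With these two measurements the relative gap is a little over 4%, so any tolerance chosen this way has to leave room for variation across signals.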
* Address review feedbacks
* Support arbitrary batch dimensions.
* Add batch test
* Use view for batch
* fix sgd
* Use negative indices and update docstring
* Update threshold
Co-authored-by: Charles J.Y. Yoon <[email protected]>