Hi all,
I think it's good timing to discuss a potential merging plan from torchaudio-contrib to here, especially because there are going to be new features and changes by @jamarshon @cpuhrsch.
Main idea
A lot of things are well summarized in https://github.com/keunwoochoi/torchaudio-contrib. In short, we wanted to re-design torch-based audio processing so that
- things can be `Layer`s, which are based on corresponding `Functional`s
- names for layers and arguments are carefully chosen
- all work for multi-channel
- complex numbers are supported when it makes sense (e.g., STFTs)
Review - layers
torchaudio-contrib already covers many of the functions that transforms.py covers now, but not all of them, and that's why I feel it's time to discuss this here.
Let me list the classes in transforms.py one by one with some notes.
1. Already in torchaudio-contrib. Hoping we'd replace.
- `class Spectrogram`: we have it in torchaudio-contrib. On top of this, we also have an `STFT` layer which outputs complex representations (same as `torch.stft`, since we're wrapping it).
- `class MelScale`: we have it and would like to suggest changing the name to something more general. We named it `class MelFilterbank`, assuming there can be other types of filterbanks, too. It also supports `htk` and non-`htk` mel filterbanks.
- `class SpectrogramToDB`: we would like to propose a more general approach -- `class AmplitudeToDb(ref=1.0, amin=1e-7)` and `class DbToAmplitude(ref=1.0)` -- because decibel scaling changes the input's unit, not its core content.
- `class MelSpectrogram`: we have it; it returns an `nn.Sequential` model consisting of a `Spectrogram` and a mel-scale filterbank.
- `class MuLawEncoding`, `class MuLawExpanding`: we have them -- actually a 99% copy of the implementation here.
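To make the `AmplitudeToDb` / `DbToAmplitude` proposal above concrete, here is a minimal sketch of how the pair could look. The signatures follow the proposal, but the implementation details (clamping with `amin`, the `20 * log10` amplitude convention) are my assumptions, not settled API:

```python
import torch


class AmplitudeToDb(torch.nn.Module):
    """Sketch: convert amplitudes to decibels as 20 * log10(x / ref)."""

    def __init__(self, ref=1.0, amin=1e-7):
        super().__init__()
        self.ref = ref
        self.amin = amin  # floor to avoid log(0)

    def forward(self, x):
        x = torch.clamp(x, min=self.amin)
        return 20.0 * torch.log10(x / self.ref)


class DbToAmplitude(torch.nn.Module):
    """Sketch: inverse of AmplitudeToDb, i.e. ref * 10^(x / 20)."""

    def __init__(self, ref=1.0):
        super().__init__()
        self.ref = ref

    def forward(self, x):
        return self.ref * torch.pow(10.0, x / 20.0)
```

The point of splitting the pair out of `SpectrogramToDB` is that the round trip `DbToAmplitude(AmplitudeToDb(x))` works on any amplitude-like tensor, not just spectrograms.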
2. Wouldn't need these
- `class Compose`: we wouldn't need it, because once things are based on `Layer`s people can simply build an `nn.Sequential()`.
- `class Scale`: it converts `int16` to `float`. I think we should deprecate this, because if we really need it, it should come with a more intuitive and precise name, and it should probably support other conversions as well.
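For illustration, a minimal sketch of what would stand in for `Compose` and `Scale` under this plan. The names `int16_to_float` and the toy `Abs` layer are hypothetical, chosen only to show the pattern:

```python
import torch


def int16_to_float(waveforms):
    """Hypothetical, better-named replacement for `Scale`:
    map int16 PCM samples to floats in [-1.0, 1.0)."""
    return waveforms.to(torch.float32) / 32768.0


class Abs(torch.nn.Module):
    """Toy layer standing in for Spectrogram, MelFilterbank, etc."""

    def forward(self, x):
        return x.abs()


# Once transforms are nn.Modules, Compose becomes unnecessary:
pipeline = torch.nn.Sequential(Abs(), Abs())
```

Chaining via `nn.Sequential` also means pipelines move to GPU and serialize like any other model, which `Compose` lists do not.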
3. To-be-added
- `class DownmixMono`: I would like to have one. But we are also considering a time-frequency representation-based downmix (an energy-preserving operation) (@faroit). I'm open to discussion. Personally I'd prefer to have separate classes, `DownmixWaveform()` and `DownmixSpecgram()`. Maybe until we have a better one, we should keep it as it is.
- `class MFCC`: we currently don't have it. The current torch/audio implementation uses `s2db` (`SpectrogramToDB`), but this class seems a little arbitrary to me, so we might want to re-implement it.
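A rough sketch of the waveform half of the `DownmixWaveform()` / `DownmixSpecgram()` split mentioned above. The class name comes from the proposal, but the channels-first `(batch, channel, time)` shape and the plain mean are my assumptions about the semantics:

```python
import torch


class DownmixWaveform(torch.nn.Module):
    """Sketch of a time-domain downmix: average over the channel axis.

    Assumes channels-first input of shape (batch, channel, time).
    The energy-preserving, spectrogram-domain variant would live in a
    separate DownmixSpecgram class, as discussed above.
    """

    def __init__(self, dim=1):
        super().__init__()
        self.dim = dim  # channel axis

    def forward(self, waveforms):
        # keepdim=True keeps a singleton channel axis, so downstream
        # layers see the same rank as for multi-channel input.
        return waveforms.mean(dim=self.dim, keepdim=True)
```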
4. Not sure about these
- `class PadTrim`: I don't actually know why we need it exactly; I would love to hear about this!
- `class LC2CL`: so far, torchaudio-contrib code hasn't considered channels-first tensors. If it's a thing, we'd i) update our code to make them compatible and ii) have the same or a similar class to this. But... do we really need this?
- `class BLC2CBL`: same as `LC2CL` -- I'd like to know its use cases.
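For what it's worth, the `LC2CL`-style layout swap reduces to a single transpose, which is part of why I wonder whether dedicated classes are needed. `lc2cl` here is just an illustrative helper, not proposed API:

```python
import torch


def lc2cl(waveforms):
    """Hypothetical one-liner covering LC2CL: swap the last two axes,
    turning (..., length, channel) into (..., channel, length)."""
    return waveforms.transpose(-1, -2)
```

Because it only touches the last two axes, the same helper covers both the unbatched `LC2CL` case and the batched `BLC2CBL` case.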
Review - argument and variable names
As summarised in keunwoochoi/torchaudio-contrib#46, we'd like to use
- `waveforms` for a batch of waveforms
- `real_specgrams` for magnitude spectrograms
- `complex_specgrams` for complex spectrograms
(This is relatively less discussed.)
Audio loading
@faroit has been working on replacing Sox with others. But here in this issue, I'd like to focus on the topics above.
So,
- Any opinion on this?
- Any answers to the questions I raised above?
- If it looks good, what else would you like to have in the one-shot PR that would replace the current transforms.py?