torchaudio-contrib: some augmentations #285

ksanjeevan · 2019-09-17T17:48:54Z

Adding augmentations: TimeStretch, TimeMasking and FrequencyMasking.

On complex STFT

We had discussions about this in torchaudio-contrib, about how it's useful to have a transform that outputs the complex stft. So basically the same as the current Spectrogram but without normalizing the output. In our previous PR we added the complex_norm to the functionals so then the Spectrogram transform could just call the complex stft functional (which will basically just wrap torch.stft plus whatever padding we're currently doing, batching (?)) and then pass the output to complex_norm functional.
One place where this is needed is the new TimeStretch layer. It takes a complex spectrogram as input, so a user who wants a module that computes a spectrogram and stretches it could write something like:

model = nn.Sequential(
    STFT(n_fft=n_fft, hop_length=hop_length),
    TimeStretch(freq=num_freqs, hop_length=hop_length, fixed_rate=1.3),
    ComplexNorm(power=2.0),
    AmplitudeToDB(),
)

I have added a STFT transform as well as a stft functional to be able to showcase the TimeStretch augmentation (we have something like this in torchaudio-contrib), but of course this should be reworked once we get some feedback!
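To make the proposed split concrete, here is a minimal, framework-free sketch in which plain Python lists stand in for tensors. It is not the code from this PR: the names `stft`, `complex_norm` and `spectrogram` mirror the ones discussed above, but the signatures, the periodic Hann window and the absence of padding are assumptions made for illustration.

```python
import cmath
import math

def stft(signal, n_fft, hop_length):
    """Naive complex STFT: returns frames[t][k] as Python complex numbers.

    Assumptions for this sketch: periodic Hann window, no padding or centering,
    one-sided spectrum with n_fft // 2 + 1 bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop_length):
        chunk = [signal[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            for k in range(n_bins)
        ])
    return frames

def complex_norm(frames, power=1.0):
    """Magnitude of every complex bin, raised to `power`."""
    return [[abs(z) ** power for z in frame] for frame in frames]

def spectrogram(signal, n_fft, hop_length, power=2.0):
    """The coupled version: the STFT followed immediately by the norm."""
    return complex_norm(stft(signal, n_fft, hop_length), power)

# A sine whose frequency falls exactly on bin 4 of a 16-point FFT.
signal = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
complex_spec = stft(signal, n_fft=16, hop_length=8)       # phase still available
power_spec = spectrogram(signal, n_fft=16, hop_length=8)  # phase discarded
```

With the two steps exposed separately, anything that needs phase can operate on `complex_spec` before the norm is taken.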

On batching

I've made STFT be able to handle a batch dimension, and so can the augmentation layers. I can't seem to find it but I remember a discussion on how the layers should handle batch of inputs. Should the layers apply transforms only to single inputs for now? I can make changes if that is the case.

On the augmentations

In #259 we discussed adding TimeStretch, TimeMasking, FrequencyMasking and PitchShift, but I've only included the first three in this PR, since I think pitch shifting will require a bit more discussion and this can be a solid first step. I haven't included tests for them yet, since I figured many things might change after feedback is given. A quick summary of what they do:

Time Stretching

Simply wraps the phase_vocoder functional introduced in the last PR and allows for batching. A fixed rate can be passed when initializing, or a rate (randomly generated if doing augmentation) can be given to the forward method on each call. Looks like: [image not preserved]
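As a rough illustration of the idea, the sketch below implements a textbook-style phase vocoder on nested Python lists and wraps it in a small class that accepts either a fixed rate or a per-call rate. Whether it matches the `phase_vocoder` functional in detail is an assumption; the class name `TimeStretch` is borrowed from the PR, while its `rate` call argument and the list-based data layout are hypothetical.

```python
import cmath
import math
import random

def phase_vocoder(frames, rate, hop_length):
    """Time-stretch complex STFT frames[t][k] by `rate` (rate > 1 speeds up)."""
    n_frames, n_freq = len(frames), len(frames[0])
    # Phase each bin is expected to advance over one hop.
    advance = [math.pi * hop_length * k / (n_freq - 1) for k in range(n_freq)]
    padded = frames + [[0j] * n_freq, [0j] * n_freq]  # so frame t + 1 always exists
    phase_acc = [cmath.phase(z) for z in frames[0]]   # start from the first frame's phase
    stretched = []
    for i in range(math.ceil(n_frames / rate)):
        position = i * rate
        t = int(position)
        alpha = position - t
        prev_frame, next_frame = padded[t], padded[t + 1]
        # Interpolate magnitudes linearly, but use the accumulated phase.
        stretched.append([
            cmath.rect((1 - alpha) * abs(a) + alpha * abs(b), phase)
            for a, b, phase in zip(prev_frame, next_frame, phase_acc)
        ])
        for k in range(n_freq):
            # Deviation of the measured phase step from the expected one, wrapped to [-pi, pi].
            deviation = cmath.phase(next_frame[k]) - cmath.phase(prev_frame[k]) - advance[k]
            deviation -= 2 * math.pi * round(deviation / (2 * math.pi))
            phase_acc[k] += advance[k] + deviation
    return stretched

class TimeStretch:
    """Hypothetical wrapper: fixed rate at construction, optional override per call."""

    def __init__(self, hop_length, fixed_rate=None):
        self.hop_length = hop_length
        self.fixed_rate = fixed_rate

    def __call__(self, frames, rate=None):
        rate = self.fixed_rate if rate is None else rate
        if rate is None:
            raise ValueError("pass fixed_rate at construction or rate at call time")
        return phase_vocoder(frames, rate, self.hop_length)

rng = random.Random(0)
# 10 frames x 5 bins of random complex values with non-zero magnitude.
spec = [[cmath.rect(rng.random() + 0.1, rng.uniform(-3, 3)) for _ in range(5)] for _ in range(10)]

stretch = TimeStretch(hop_length=4, fixed_rate=1.3)
faster = stretch(spec)                       # uses the fixed rate
random_rate = rng.uniform(0.8, 1.25)
augmented = stretch(spec, rate=random_rate)  # random rate, as for augmentation
identity = stretch(spec, rate=1.0)           # rate 1.0 should reproduce the input
```

The number of output frames is `ceil(n_frames / rate)`, so a rate above 1 shortens the spectrogram and a rate below 1 lengthens it.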

Time and Frequency masking

From SpecAugment: apply masks of a desired value to an input spectrogram. I've included a flag for the case where the input is batched and the user wants the masks to be independent of each other. Examples:
Same mask for batch: [image not preserved]

Independent mask for batch: [image not preserved]
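The difference between a shared mask and independent masks can be sketched without any framework. In the snippet below, nested lists stand in for a batch of spectrograms laid out as `spec[freq][time]`; the helper names, the way the band is sampled, and the requirement that `mask_param` not exceed the axis length are assumptions for illustration, with only `iid_masks` taken from the PR.

```python
import random

def sample_band(size, mask_param, rng):
    """Draw one band [start, end) with 0 <= width < mask_param (assumes mask_param <= size)."""
    width = int(rng.random() * mask_param)
    start = int(rng.random() * (size - width))
    return start, start + width

def apply_band(spec, band, mask_value, axis):
    """Return a copy of spec with the band masked; axis=0 is frequency, axis=1 is time."""
    start, end = band
    masked = []
    for f, row in enumerate(spec):
        masked.append([
            mask_value if start <= (f if axis == 0 else t) < end else x
            for t, x in enumerate(row)
        ])
    return masked

def mask_batch(batch, mask_param, mask_value, axis, iid_masks, rng):
    size = len(batch[0]) if axis == 0 else len(batch[0][0])
    if iid_masks:
        # One independent draw per example.
        bands = [sample_band(size, mask_param, rng) for _ in batch]
    else:
        # A single draw shared by the whole batch.
        bands = [sample_band(size, mask_param, rng)] * len(batch)
    return [apply_band(spec, band, mask_value, axis) for spec, band in zip(batch, bands)]

def masked_columns(spec, mask_value):
    """Time indices where every frequency bin equals mask_value."""
    return [t for t in range(len(spec[0])) if all(row[t] == mask_value for row in spec)]

rng = random.Random(0)
batch = [[[1.0] * 10 for _ in range(4)] for _ in range(3)]  # 3 examples, 4 freq x 10 time

shared = mask_batch(batch, mask_param=4, mask_value=0.0, axis=1, iid_masks=False, rng=rng)
independent = mask_batch(batch, mask_param=4, mask_value=0.0, axis=1, iid_masks=True, rng=rng)
freq_masked = mask_batch(batch, mask_param=3, mask_value=0.0, axis=0, iid_masks=True, rng=rng)
```

With `iid_masks=False` every example in the batch has the same columns masked; with `iid_masks=True` each example gets its own draw, which may or may not coincide.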

Some demo code:

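import librosa
import torch
import torch.nn as nn

# STFT, TimeStretch, ComplexNorm, FrequencyMasking, TimeMasking and AmplitudeToDB
# are the transforms discussed in this PR; plot_heatmap is a plotting helper not shown here.
num_freqs, hop_length = 400, 512
model = nn.Sequential(
    STFT(n_fft=(num_freqs - 1) * 2, hop_length=hop_length),
    TimeStretch(freq=num_freqs, hop_length=hop_length, fixed_rate=1.3),
    ComplexNorm(power=2.0),
    FrequencyMasking(freq_mask_param=60, iid_masks=False),
    TimeMasking(time_mask_param=30, iid_masks=False),
    AmplitudeToDB(),
)
# librosa.load returns (samples, sample_rate); keep the samples as a (1, time) tensor.
inp = torch.as_tensor(librosa.load('file.wav')[0]).view(1, -1)
out = model(inp)
plot_heatmap(out)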

gives: [image not preserved]

cpuhrsch · 2019-09-17T18:58:37Z

torchaudio/functional.py

+
+
+@torch.jit.script
+def stft(waveform, pad, window, n_fft, hop_length, win_length):


stft is already in pytorch core. I know that not having batching is annoying but replicating a function across both libraries is too. We're working on a more principled abstraction around batching, but we'd like to avoid having a reimplementation of this. Maybe we can do this in a separate PR?

We're working on a more principled abstraction around batching, but we'd like to avoid having a reimplementation of this. Maybe we can do this in a separate PR?

👍

Gotcha, so I can take this out no problem and then have the STFT layer just use torch.stft? I think the bigger point for this was to have the non-normalized output of stft as a functional/layer.

Are you saying there's an operation provided by torchaudio that normalizes the output when you would like it not to?

Yeah, so Spectrogram calls the spectrogram functional, which basically computes the stft and then takes the power of the complex tensor (saying "normalizing" might have been confusing on my part, sorry).

But in order to use the phase vocoder, the TimeStretch transform has to be applied to the output of the stft before the complex norm (i.e. before doing spec_f = spec_f.pow(power).sum(-1)).

So that's why I was kinda wanting torchaudio to have a transform that gives the complex STFT like we have in contrib, because if someone then wants to work with randomly stretched spectrograms, they can simply do:

nn.Sequential(STFT, TimeStretch, ComplexNorm, ...)

whereas currently STFT and ComplexNorm are "coupled" together as Spectrogram, and we can't put the time stretching in between (i.e. nn.Sequential(Spectrogram, TimeStretch, ...) won't work).
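A small numeric sketch of why the order matters: once the power is taken, two frames with different phases become indistinguishable, while a phase vocoder relies on the phase change between frames. The function names below are hypothetical, and `power_spectrum` covers only the `power=2` case of the expression quoted above.

```python
import cmath
import math

def power_spectrum(frame):
    """Squared magnitude per bin: re**2 + im**2 (the power=2 case)."""
    return [z.real ** 2 + z.imag ** 2 for z in frame]

def phase_increment(prev_frame, next_frame):
    """Per-bin phase change between consecutive frames."""
    return [cmath.phase(b) - cmath.phase(a) for a, b in zip(prev_frame, next_frame)]

# Two frames with identical magnitudes but different phases.
frame_a = [cmath.rect(1.0, 0.0), cmath.rect(2.0, 0.5)]
frame_b = [cmath.rect(1.0, math.pi / 2), cmath.rect(2.0, -1.0)]

power_a = power_spectrum(frame_a)
power_b = power_spectrum(frame_b)
increments = phase_increment(frame_a, frame_b)  # only computable before the norm
```

Here `power_a` and `power_b` agree bin for bin, so nothing downstream of the norm could recover `increments`.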

vincentqb · 2019-09-17T19:04:50Z

I've made STFT be able to handle a batch dimension, and so can the augmentation layers. I can't seem to find it but I remember a discussion on how the layers should handle batch of inputs. Should the layers apply transforms only to single inputs for now? I can make changes if that is the case.

In this PR, we can focus on single inputs in order to avoid adding STFT here again, as mentioned by @cpuhrsch. For batching, I would prefer waiting on a standardized approach. If that is so important, I'd open a separate PR for that so we don't block the rest :)

vincentqb · 2019-09-17T19:12:51Z

We had discussions about this in torchaudio-contrib, about how it's useful to have a transform that outputs the complex stft. So basically the same as the current Spectrogram but without normalizing the output. In our previous PR we added the complex_norm to the functionals so then the Spectrogram transform could just call the complex stft functional (which will basically just wrap torch.stft plus whatever padding we're currently doing, batching (?)) and then pass the output to complex_norm functional.

Quick thought: I'm not necessarily suggesting to change something, but maybe Spectrogram should have been two operations then :)

vincentqb

Thanks for working on this! Eventually, it might be nice to add the snippet as a quick test to ensure everything works well together in the future.

ksanjeevan · 2019-09-17T19:16:21Z

Quick thought: I'm not necessarily suggesting to change something, but maybe Spectrogram should have been two operations then :)

Yeah that's actually how we have it in -contrib. The layers are STFT, ApplyFilterbank, TimeStretch, etc. So then if say you want a db melspectrogram, simply chain nn.Sequential(STFT, ComplexNorm, ApplyFilterbank, AmplitudeToDb). So we don't have an explicit Spectrogram layer, although I guess having both options can be ok?
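The chaining style described here can be sketched with plain callables. This is only an illustration: `sequential` imitates what `nn.Sequential` does for modules, the STFT stage is replaced by hand-written complex frames, and the two-band filterbank is a toy stand-in rather than a real mel filterbank. All names and signatures are assumptions.

```python
import math
from functools import reduce

def sequential(*stages):
    """Compose callables left to right, similar in spirit to nn.Sequential."""
    return lambda x: reduce(lambda value, stage: stage(value), stages, x)

def complex_norm(power=2.0):
    return lambda frames: [[abs(z) ** power for z in frame] for frame in frames]

def apply_filterbank(filterbank):
    """filterbank[b][k] is the weight of frequency bin k in band b."""
    return lambda frames: [
        [sum(w * x for w, x in zip(band, frame)) for band in filterbank]
        for frame in frames
    ]

def amplitude_to_db(amin=1e-10):
    """Power values to decibels: 10 * log10(max(x, amin))."""
    return lambda frames: [
        [10.0 * math.log10(max(x, amin)) for x in frame] for frame in frames
    ]

# Toy input: two frames of a 4-bin complex STFT (the STFT stage itself is omitted).
complex_frames = [[1 + 0j, 0 + 1j, 3 + 4j, 0j], [0j, 10 + 0j, 0j, 0j]]
toy_filterbank = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # 2 bands, not a mel bank

pipeline = sequential(complex_norm(2.0), apply_filterbank(toy_filterbank), amplitude_to_db())
db_banded = pipeline(complex_frames)
```

Because each stage is separate, a stage that needs complex input could be slotted in before `complex_norm` without touching the others.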

vincentqb · 2019-09-17T19:35:57Z

Yeah that's actually how we have it in -contrib. The layers are STFT, ApplyFilterbank, TimeStretch, etc. So then if say you want a db melspectrogram, simply chain nn.Sequential(STFT, ComplexNorm, ApplyFilterbank, AmplitudeToDb). So we don't have an explicit Spectrogram layer, although I guess having both options can be ok?

I'd say we write the implementations here as if Spectrogram does not exist. We can then decide to axe it, keep it, or turn it into a wrapper, keeping in mind backward compatibility.

ksanjeevan · 2019-09-18T22:13:56Z

So I've removed the complex (and batched) STFT functional and transform, per @cpuhrsch's and @vincentqb's comments. Also added some tests for the masking functionals. Are the checks I wrote enough? I can add comments and/or more checks if needed. Should the next steps be writing some preliminary tests for the augmentation layers? I can also put a gist together showing how to chain these transforms.

vincentqb

Also added some tests for the masking functionals. Are the checks I wrote enough? I can add comments and/or more checks if needed. Should the next steps be writing some preliminary tests for the augmentation layers?

Is there something in SpecAugment we could compare against? Otherwise, the tests and the demo you provided here seem good to me.

I can also put a gist together showing how to chain these transforms.

If you have a demo of some code that could be fun to show, it could be shown as a new torchaudio tutorial, e.g. here, or example. :) These are useful since they also provide an integration test. This can be done as a separate PR.

test/test_functional.py

vincentqb

Thanks for working on this. LGTM!

ksanjeevan · 2019-09-19T22:05:17Z

Is there something in SpecAugment we could compare against? Otherwise, the tests and the demo you provided here seem good to me.

Hmm I'll look into it! I think the check we have, which makes sure the number of masked columns is < the mask parameter, is a good start, but there may be other things to compare with.
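For illustration, a check along these lines could look like the sketch below. It is not the test from this PR: the function names are hypothetical, the data are hand-built nested lists laid out as `spec[freq][time]`, and the contiguity and "nothing else changed" checks are extra properties one might also want to verify.

```python
def check_time_mask(original, masked, mask_param, mask_value):
    """Return a list of problems with a time mask; an empty list means it looks valid.

    Assumes `original` has no column that already equals mask_value everywhere.
    """
    n_time = len(original[0])
    masked_cols = [t for t in range(n_time) if all(row[t] == mask_value for row in masked)]
    problems = []
    if len(masked_cols) >= mask_param:
        problems.append("mask is not narrower than mask_param")
    if masked_cols and masked_cols != list(range(masked_cols[0], masked_cols[-1] + 1)):
        problems.append("masked columns are not contiguous")
    for orig_row, masked_row in zip(original, masked):
        for t in range(n_time):
            if t not in masked_cols and masked_row[t] != orig_row[t]:
                problems.append("value outside the mask changed")
                return problems
    return problems

def with_columns_masked(spec, columns, mask_value):
    """Helper for building test inputs by hand."""
    return [[mask_value if t in columns else x for t, x in enumerate(row)] for row in spec]

original = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]]
valid = with_columns_masked(original, {2, 3}, 0.0)
too_wide = with_columns_masked(original, {1, 2, 3, 4}, 0.0)
split = with_columns_masked(original, {1, 4}, 0.0)

report_valid = check_time_mask(original, valid, mask_param=3, mask_value=0.0)
report_wide = check_time_mask(original, too_wide, mask_param=3, mask_value=0.0)
report_split = check_time_mask(original, split, mask_param=3, mask_value=0.0)
```

In a real test the `masked` argument would come from the masking functional run over many random seeds rather than from hand-built inputs.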

If you have a demo of some code that could be fun to show, it could be shown as a new torchaudio tutorial, e.g. here, or example. :) These are useful since they also provide an integration test. This can be done as a separate PR.

Yeah absolutely. Happy to do a separate PR with a demo/tutorial once we've sorted out the STFT stuff (just so it's cleaner for the user).

vincentqb · 2019-09-20T14:29:54Z

PS: let's not worry about this in this PR, but we try to make the transforms be thin wrappers around functionals as much as possible. :)
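The "thin wrapper" pattern can be sketched as follows, with a plain class standing in for an `nn.Module` and nested lists standing in for tensors. The names echo `complex_norm` and `ComplexNorm` from the discussion, but the signatures here are assumptions.

```python
def complex_norm(frames, power=1.0):
    """Functional: stateless, every parameter is passed explicitly."""
    return [[abs(z) ** power for z in frame] for frame in frames]

class ComplexNorm:
    """Transform: stores the parameters and delegates the computation to the functional."""

    def __init__(self, power=1.0):
        self.power = power

    def __call__(self, frames):
        return complex_norm(frames, self.power)

frames = [[3 + 4j, 1j], [0j, -2 + 0j]]
via_transform = ComplexNorm(power=2.0)(frames)
via_functional = complex_norm(frames, power=2.0)
```

Keeping the logic in the functional means it can be tested and reused on its own, while the transform only carries configuration.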


ksanjeevan added 5 commits September 16, 2019 15:28

TimeStretch and Masking

8db50c7

Some refactoring

a47dc28

Fixed masking value for iid

0a5b106

Doc stuff and naming

d578225

Typos

4da2771

cpuhrsch requested a review from vincentqb September 17, 2019 18:57

cpuhrsch reviewed Sep 17, 2019

vincentqb reviewed Sep 17, 2019

ksanjeevan added 2 commits September 18, 2019 14:23

+ mask functional tests, - complex stft stuff

45fb34d

Merge

332d9e9

vincentqb added 3 commits September 19, 2019 16:51

typo

07c6259

typos

d7b0a18

Merge branch 'master' into contrib-aug1

05ba543

vincentqb reviewed Sep 19, 2019

Typo

6c72e8c

vincentqb approved these changes Sep 19, 2019

vincentqb self-assigned this Sep 19, 2019

vincentqb merged commit 5c0773f into pytorch:master Sep 20, 2019

vincentqb mentioned this pull request Nov 1, 2019

Complex STFT transform from spectrogram #327

Merged

vincentqb added a commit to vincentqb/audio that referenced this pull request Nov 6, 2019

STFT transform and function from pytorch#285

008791c

vincentqb added a commit to vincentqb/audio that referenced this pull request Nov 6, 2019

Revert "STFT transform and function from pytorch#285"

ed4033c

This reverts commit 008791c.

vincentqb added a commit to vincentqb/audio that referenced this pull request Nov 18, 2019

STFT transform and function from pytorch#285

0a72dca

vincentqb added a commit to vincentqb/audio that referenced this pull request Nov 18, 2019

Revert "STFT transform and function from pytorch#285"

3156ac5

This reverts commit 008791c.

vincentqb added a commit to vincentqb/audio that referenced this pull request Nov 18, 2019

STFT transform and function from pytorch#285

1f87602

vincentqb added a commit to vincentqb/audio that referenced this pull request Nov 18, 2019

Revert "STFT transform and function from pytorch#285"

18e55d0

This reverts commit 008791c.

vincentqb added a commit that referenced this pull request Nov 18, 2019

Complex STFT transform from spectrogram (#327)

1500d4e

* STFT transform and function from #285 * merge options in existing functionality. * remove dimension 2 check. add test. * using ... * update spectrogram test.

vincentqb mentioned this pull request Nov 21, 2019

Move augmentations in transforms #348

Merged

keunwoochoi mentioned this pull request Dec 7, 2019

Archive the repo keunwoochoi/torchaudio-contrib#71

Open

vincentqb mentioned this pull request Dec 20, 2019

Update audio preprocessing tutorial pytorch/tutorials#797

Merged

vincentqb mentioned this pull request Jan 10, 2020

standardizing freq/time axis #401

Closed

vincentqb mentioned this pull request Oct 21, 2020

add image for specaugment docstring #980

Closed


