[WIP] apply codec-based data augmentation #1194

AzizCode92 · 2021-01-21T22:29:56Z

This PR addresses the issue #1183

AzizCode92 · 2021-01-21T22:43:48Z

Hi @mthrok, I have couple of questions:

where should I include the testing of this code? inside test/torchaudio_unittest/functional/functional_impl.py ?

For the test, I have done something like this

class CodecTestBase:
    @parameterized.expand(list(itertools.product(
        ["mp3", "wav", "flac"],
        [96, 128, 160, 192, 224, 256, 320],
    )), name_func=name_func)
    def test_codec(self, format, compression):
        torch.random.manual_seed(42)
        waveform = torch.rand(2, 44100 * 1)
        augmented_waveform = F.apply_codec(waveform, format, channels_first=True, compression=compression)
        # TODO: maybe check the channels (number of frames can change depending on format like mp3)

Is there a way I can verify the channels given the output waveform??
should we also include the other parameters like channels_first and compression to the parameterized decorator?

mthrok · 2021-01-22T16:38:17Z

Hi @mthrok, I have couple of questions:

where should I include the testing of this code? inside test/torchaudio_unittest/functional/functional_impl.py ?

For the test, I have done something like this
class CodecTestBase:
    @parameterized.expand(list(itertools.product(
        ["mp3", "wav", "flac"],
        [96, 128, 160, 192, 224, 256, 320],
    )), name_func=name_func)
    def test_codec(self, format, compression):
        torch.random.manual_seed(42)
        waveform = torch.rand(2, 44100 * 1)
        augmented_waveform = F.apply_codec(waveform, format, channels_first=True, compression=compression)
        # TODO: maybe check the channels (number of frames can change depending on format like mp3)
Is there a way I can verify the channels given the output waveform??

should we also include the other parameters like channels_first and compression to the parameterized decorator?

Hi @AzizCode92

Thanks for working on this. The implementation looks good so far.
To answer your question,

the test should be placed in test/torchaudio_unittest/functional/functional_cpu_test.py because it's CPU only feature.
For the parameters, it's nice to have them covered, but we do not need to cover all the combination of formats and parameters. I suggest to add anther method that does the same thing as what you are describing above for format="wav" and channels_first=False.

By the way, looking at the docstring you added, I am wondering if we should make this feature require sox_io backend (means not supported on Windows) or allow the use of soundfile backend. With soundfile backend, the number of formats supported are very limited and the compression parameter has no effect. What do you think?

AzizCode92 · 2021-01-22T22:42:20Z

Hi @mthrok, Thank you for your feedback.

By the way, looking at the docstring you added, I am wondering if we should make this feature require sox_io backend (means not supported on Windows) or allow the use of soundfile backend. With soundfile backend, the number of formats supported are very limited and the compression parameter has no effect. What do you think?

Hmm, do the two parameters compression and format have an effect on the data-augmentation we are trying to make? if yes, I guess we should make it require the sox_io backend. Otherwise, we would limit the parameters we can finetune to do the data-augmentation. Or we can include both sox_io and soundfile backends?

AzizCode92 · 2021-01-23T14:45:43Z

Now I am working on the unit-test. So for the smoke test, my idea is to save the augmented data, load it again and assert the sample_rate, the expected and the found data. Is it the proper way to do it??
Also inside the smoke test, what format should I use? what save/load methods should I use?

thanks

mthrok · 2021-01-25T17:37:49Z

Hi @AzizCode92

Hmm, do the two parameters compression and format have an effect on the data-augmentation we are trying to make? if yes, I guess we should make it require the sox_io backend. Otherwise, we would limit the parameters we can finetune to do the data-augmentation. Or we can include both sox_io and soundfile backends?

compression and format are the key ingredients for this augmentation. For example, when you want to train an ASR model for telephone conversation (say, for a call center application) and you do not have data collected from the particular environment you want to deploy the application, you need to train your initial model with something. If you have a clean dataset (like collected conversation from television) you can apply codecs (also background noise and reverberation) to make them sound like the conversation is happening over the phone.

Now I think that we can restrict the function to "sox_io" backend only although the code itself is compatible with "soundfile" backend, the lack of support for compression parameter make it not much useful for "soundfile" backend. You can use this decorator to achieve that.

Now I am working on the unit-test. So for the smoke test, my idea is to save the augmented data, load it again and assert the sample_rate, the expected and the found data. Is it the proper way to do it??
Also inside the smoke test, what format should I use? what save/load methods should I use?

My view is that since load and save functions that are used inside of the implementation are separately and rigorously tested, and the new function does not do much more than save and load, we can just check that the function does not crush. (and some very cheap check like sample_rate as you suggested). Although the implementation detail around the use of save and load functions can change anytime, so it's not the best approach but I think for now that's enough for us to getting started.

mthrok · 2021-01-25T17:44:49Z

BTW, regarding the sample rate, I realized that some codecs require specific sample rate. (like AMR NB requires 8kHz) So we will need to incorporate this into the design later. (which will be a separate work than this PR)

For example, if I convert a WAV file into AMR NB with sox command, I get the following;

sox -V3 foo.wav foo.amr-nb

...

sox WARN formats: amr-nb can't encode at 32000Hz; using 8000Hz

...

sox INFO sox: effects chain: input        32000Hz  1 channels
sox INFO sox: effects chain: rate          8000Hz  1 channels
sox INFO sox: effects chain: dither        8000Hz  1 channels
sox INFO sox: effects chain: output        8000Hz  1 channels

facebook-github-bot added the CLA Signed label Jan 21, 2021

AzizCode92 marked this pull request as draft January 21, 2021 22:30

AzizCode92 closed this Jan 26, 2021

AzizCode92 force-pushed the codec_augmentation branch from f7c6a2e to b33c539 Compare January 26, 2021 21:05

AzizCode92 deleted the codec_augmentation branch January 26, 2021 21:16

AzizCode92 restored the codec_augmentation branch January 26, 2021 21:16

AzizCode92 mentioned this pull request Jan 27, 2021

Apply codec-based data augmentation #1200

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] apply codec-based data augmentation #1194

[WIP] apply codec-based data augmentation #1194

Uh oh!

AzizCode92 commented Jan 21, 2021

Uh oh!

AzizCode92 commented Jan 21, 2021 •

edited

Loading

Uh oh!

mthrok commented Jan 22, 2021

Uh oh!

AzizCode92 commented Jan 22, 2021 •

edited

Loading

Uh oh!

AzizCode92 commented Jan 23, 2021 •

edited

Loading

Uh oh!

mthrok commented Jan 25, 2021

Uh oh!

mthrok commented Jan 25, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[WIP] apply codec-based data augmentation #1194

[WIP] apply codec-based data augmentation #1194

Uh oh!

Conversation

AzizCode92 commented Jan 21, 2021

Uh oh!

AzizCode92 commented Jan 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mthrok commented Jan 22, 2021

Uh oh!

AzizCode92 commented Jan 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AzizCode92 commented Jan 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mthrok commented Jan 25, 2021

Uh oh!

mthrok commented Jan 25, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AzizCode92 commented Jan 21, 2021 •

edited

Loading

AzizCode92 commented Jan 22, 2021 •

edited

Loading

AzizCode92 commented Jan 23, 2021 •

edited

Loading