Skip to content

Conversation

@AzizCode92
Copy link
Contributor

This PR addresses the issue #1183

@AzizCode92 AzizCode92 marked this pull request as draft January 21, 2021 22:30
@AzizCode92
Copy link
Contributor Author

AzizCode92 commented Jan 21, 2021

Hi @mthrok, I have couple of questions:

  1. where should I include the testing of this code? inside test/torchaudio_unittest/functional/functional_impl.py ?

For the test, I have done something like this

class CodecTestBase:
    @parameterized.expand(list(itertools.product(
        ["mp3", "wav", "flac"],
        [96, 128, 160, 192, 224, 256, 320],
    )), name_func=name_func)
    def test_codec(self, format, compression):
        torch.random.manual_seed(42)
        waveform = torch.rand(2, 44100 * 1)
        augmented_waveform = F.apply_codec(waveform, format, channels_first=True, compression=compression)
        # TODO: maybe check the channels (number of frames can change depending on format like mp3)
  1. Is there a way I can verify the channels given the output waveform??
  2. should we also include the other parameters like channels_first and compression to the parameterized decorator?

@mthrok
Copy link
Contributor

mthrok commented Jan 22, 2021

Hi @mthrok, I have couple of questions:

  1. where should I include the testing of this code? inside test/torchaudio_unittest/functional/functional_impl.py ?

For the test, I have done something like this

class CodecTestBase:
    @parameterized.expand(list(itertools.product(
        ["mp3", "wav", "flac"],
        [96, 128, 160, 192, 224, 256, 320],
    )), name_func=name_func)
    def test_codec(self, format, compression):
        torch.random.manual_seed(42)
        waveform = torch.rand(2, 44100 * 1)
        augmented_waveform = F.apply_codec(waveform, format, channels_first=True, compression=compression)
        # TODO: maybe check the channels (number of frames can change depending on format like mp3)
  1. Is there a way I can verify the channels given the output waveform??
  2. should we also include the other parameters like channels_first and compression to the parameterized decorator?

Hi @AzizCode92

Thanks for working on this. The implementation looks good so far.
To answer your question,

  1. the test should be placed in test/torchaudio_unittest/functional/functional_cpu_test.py because it's CPU only feature.
  2. For the parameters, it's nice to have them covered, but we do not need to cover all the combination of formats and parameters. I suggest to add anther method that does the same thing as what you are describing above for format="wav" and channels_first=False.

By the way, looking at the docstring you added, I am wondering if we should make this feature require sox_io backend (means not supported on Windows) or allow the use of soundfile backend. With soundfile backend, the number of formats supported are very limited and the compression parameter has no effect. What do you think?

@AzizCode92
Copy link
Contributor Author

AzizCode92 commented Jan 22, 2021

Hi @mthrok, Thank you for your feedback.

By the way, looking at the docstring you added, I am wondering if we should make this feature require sox_io backend (means not supported on Windows) or allow the use of soundfile backend. With soundfile backend, the number of formats supported are very limited and the compression parameter has no effect. What do you think?

Hmm, do the two parameters compression and format have an effect on the data-augmentation we are trying to make? if yes, I guess we should make it require the sox_io backend. Otherwise, we would limit the parameters we can finetune to do the data-augmentation. Or we can include both sox_io and soundfile backends?

@AzizCode92
Copy link
Contributor Author

AzizCode92 commented Jan 23, 2021

Now I am working on the unit-test. So for the smoke test, my idea is to save the augmented data, load it again and assert the sample_rate, the expected and the found data. Is it the proper way to do it??
Also inside the smoke test, what format should I use? what save/load methods should I use?

thanks

@mthrok
Copy link
Contributor

mthrok commented Jan 25, 2021

Hi @AzizCode92

Hmm, do the two parameters compression and format have an effect on the data-augmentation we are trying to make? if yes, I guess we should make it require the sox_io backend. Otherwise, we would limit the parameters we can finetune to do the data-augmentation. Or we can include both sox_io and soundfile backends?

compression and format are the key ingredients for this augmentation. For example, when you want to train an ASR model for telephone conversation (say, for a call center application) and you do not have data collected from the particular environment you want to deploy the application, you need to train your initial model with something. If you have a clean dataset (like collected conversation from television) you can apply codecs (also background noise and reverberation) to make them sound like the conversation is happening over the phone.

Now I think that we can restrict the function to "sox_io" backend only although the code itself is compatible with "soundfile" backend, the lack of support for compression parameter make it not much useful for "soundfile" backend. You can use this decorator to achieve that.

Now I am working on the unit-test. So for the smoke test, my idea is to save the augmented data, load it again and assert the sample_rate, the expected and the found data. Is it the proper way to do it??
Also inside the smoke test, what format should I use? what save/load methods should I use?

My view is that since load and save functions that are used inside of the implementation are separately and rigorously tested, and the new function does not do much more than save and load, we can just check that the function does not crush. (and some very cheap check like sample_rate as you suggested). Although the implementation detail around the use of save and load functions can change anytime, so it's not the best approach but I think for now that's enough for us to getting started.

@mthrok
Copy link
Contributor

mthrok commented Jan 25, 2021

BTW, regarding the sample rate, I realized that some codecs require specific sample rate. (like AMR NB requires 8kHz) So we will need to incorporate this into the design later. (which will be a separate work than this PR)

For example, if I convert a WAV file into AMR NB with sox command, I get the following;

sox -V3 foo.wav foo.amr-nb

...

sox WARN formats: amr-nb can't encode at 32000Hz; using 8000Hz

...

sox INFO sox: effects chain: input        32000Hz  1 channels
sox INFO sox: effects chain: rate          8000Hz  1 channels
sox INFO sox: effects chain: dither        8000Hz  1 channels
sox INFO sox: effects chain: output        8000Hz  1 channels

@AzizCode92 AzizCode92 closed this Jan 26, 2021
@AzizCode92 AzizCode92 deleted the codec_augmentation branch January 26, 2021 21:16
@AzizCode92 AzizCode92 restored the codec_augmentation branch January 26, 2021 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants