Apply codec-based data augmentation #1200

AzizCode92 · 2021-01-26T21:33:45Z

This PR addresses the issue #1183

This reverts commit 9165a4d.

AzizCode92 · 2021-01-27T12:22:44Z

Hi @mthrok, this is the baseline code I have made.
I apologize beforehand for accidentally closing the previous PR.

mthrok

Hi @AzizCode92

Thanks for working on this. It is looking good.

mthrok · 2021-01-29T15:54:45Z

torchaudio/functional/functional.py

+    Applies codecs as a form of augmentation
+
+    Args:
+        waveform (Tensor): Tensor of audio of dimension (..., time)


The description of the dimension is not accurate, as channels_first switches the dimensions.
Something like "Audio data. Must be 2 dimensional. See also ``channels_first``" would do.

mthrok · 2021-01-29T16:24:06Z

test/torchaudio_unittest/functional/functional_cpu_test.py

+from torchaudio._internal import (
+    module_utils as _mod_utils,
+)
+from torchaudio.backend import sox_io_backend


Is it your IDE that is sorting the import statements?
My interpretation of PEP8 here is that torchaudio_unittest is the local library, so it should be the third group, while torch, torchaudio and other libraries used for testing are the second group.
So I would prefer that torchaudio come before torchaudio_unittest. But if this is something your IDE does automatically, I won't argue.

https://www.python.org/dev/peps/pep-0008/#imports

Imports should be grouped in the following order:

1. Standard library imports.
2. Related third party imports.
3. Local application/library specific imports.

You should put a blank line between each group of imports.
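
To make the grouping concrete, here is a minimal sketch that sorts module names into the three PEP 8 groups. The helper `import_group` and the `LOCAL_PACKAGES` set are hypothetical and exist only for this illustration; they are not part of torchaudio or of any linter.

```python
import sys

# Assumption for this sketch: the test package is the only "local" library.
LOCAL_PACKAGES = {"torchaudio_unittest"}


def import_group(module_name, local_packages=LOCAL_PACKAGES):
    """Return 1 (standard library), 2 (third party) or 3 (local)."""
    top_level = module_name.split(".")[0]
    if top_level in local_packages:
        return 3
    # sys.stdlib_module_names exists on Python 3.10+; fall back to a tiny set.
    stdlib = getattr(sys, "stdlib_module_names", frozenset({"io", "itertools", "os", "sys"}))
    if top_level in stdlib:
        return 1
    return 2


modules = [
    "torchaudio_unittest.common_utils",
    "torch",
    "itertools",
    "torchaudio.functional",
]
# Sort by group first, then alphabetically within each group.
ordered = sorted(modules, key=lambda name: (import_group(name), name))
```

Under this reading, torch and torchaudio land in the second group and torchaudio_unittest in the third, which is the ordering preferred in the comment above.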

mthrok · 2021-01-29T16:26:27Z

test/torchaudio_unittest/functional/functional_cpu_test.py

+
+
+@skipIfNoExec('sox')
+class ApplyCodecTestBase(TempDirMixin, TorchaudioTestCase):


Can you rename this to TestApplyCodec? First, this is not a base class; second, TestXXX is the common pattern used in torchaudio's test suite.

mthrok · 2021-01-29T16:41:08Z

test/torchaudio_unittest/functional/functional_cpu_test.py

+                                       compression=compression)
+        save_wav(path, augmented_data, sample_rate)
+        info = sox_io_backend.info(path)
+        assert info.sample_rate == sample_rate


Since the data is saved with the given sample_rate, this assertion always passes.
Since apply_codec works in memory, it is unnecessary to save data to the file system. (TempDirMixin is not necessary.)
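
Both points can be illustrated without torchaudio. The sketch below uses the standard library's `wave` module as a stand-in for the save/info pair (an assumption made only so that the example is self-contained): the sample rate read back is whatever was written, and an `io.BytesIO` buffer replaces the temporary file.

```python
import io
import wave

sample_rate = 8000
num_frames = 3 * sample_rate
pcm = bytes(2 * num_frames)  # 16-bit mono silence, 2 bytes per frame

# Write the audio into an in-memory buffer instead of a temporary file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(sample_rate)
    writer.writeframes(pcm)

# Rewind before reading, otherwise the reader starts at the end of the data.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    read_rate = reader.getframerate()
    read_frames = reader.getnframes()

# The rate read back is the one that was just written, so comparing the two
# says nothing about whether the codec was applied correctly.
rate_matches = read_rate == sample_rate
```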

Since it is not simple to check if the resulting data are correct, I suggest that you check the shape (number of channels and conditionally number of frames).

    def _smoke_test(self, format, compression, *, check_num_frames):
        torch.random.manual_seed(42)
        sample_rate = 8000
        num_frames = 3 * sample_rate
        num_channels = 2
        waveform = torch.rand(num_channels, num_frames)

        augmented = F.apply_codec(
            waveform, sample_rate, format, channels_first=True, compression=compression)
        assert augmented.dtype == waveform.dtype
        assert augmented.shape[0] == num_channels
        if check_num_frames:
            assert augmented.shape[1] == num_frames

Then, for each format:

    def test_wave(self):
        self._smoke_test("wav", compression=None, check_num_frames=True)

    @parameterized.expand([96, ...])
    def test_mp3(self, compression):
        self._smoke_test("mp3", compression, check_num_frames=False)

Note that WAVE is an uncompressed format, so providing the compression argument makes no difference.

Try to add other formats that are tested in the save_test file, like "opus". They have different valid value ranges for the compression argument.

I have added vorbis and flac. When I added opus, I got RuntimeError: Unsupported file type: opus.

I also got an error inside the unit test for the mp3 cases.
Something like RuntimeError: Error loading audio file: failed to open file.

Oh sorry, I had forgotten that torchaudio can read OPUS but cannot write it.

mthrok · 2021-01-29T16:42:43Z

test/torchaudio_unittest/functional/functional_cpu_test.py

+        info = sox_io_backend.info(path)
+        assert info.sample_rate == sample_rate
+
+    @_mod_utils.requires_module('torchaudio._torchaudio')


You can use @skipIfNoExtension instead of @_mod_utils.requires_module, and you can put it on the class definition so that you do not need to do it on every method.
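
A minimal sketch of the class-level skip, using only the standard library's `unittest.skipIf`. The helper `skip_if_missing` and the module name are hypothetical stand-ins; torchaudio's own skipIfNoExtension may be implemented differently.

```python
import importlib.util
import unittest


def skip_if_missing(module_name):
    """Hypothetical helper: skip the decorated test or class if a module is absent."""
    available = importlib.util.find_spec(module_name) is not None
    return unittest.skipIf(not available, f"{module_name} is not available")


# Decorating the class once covers every test method inside it.
@skip_if_missing("a_module_that_does_not_exist_12345")
class TestNeedsExtension(unittest.TestCase):
    def test_one(self):
        self.fail("not reached when the class is skipped")

    def test_two(self):
        self.fail("not reached when the class is skipped")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNeedsExtension)
result = unittest.TestResult()
suite.run(result)
```

Both methods are reported as skipped, so no per-method decorator is needed.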

mthrok · 2021-01-29T16:43:39Z

test/torchaudio_unittest/functional/functional_cpu_test.py

        assert (num_masked_columns < mask_param).sum() == num_masked_columns.numel()
+
+
+@skipIfNoExec('sox')


The sox command should not be required. This is a smoke test that does not require any external tool.

mthrok · 2021-01-29T16:46:35Z

torchaudio/functional/functional.py

    return (freqs * specgram).sum(dim=freq_dim) / specgram.sum(dim=freq_dim)
+
+
+def apply_codec(waveform, sample_rate, format, channels_first=True, compression=None) -> Tensor:


Can you reorder the arguments like waveform, sample_rate, compression=None, format, channels_first=True?

The expected usage will change the value for compression, while channels_first will not be changed (at all).

mthrok · 2021-01-29T16:47:44Z

torchaudio/functional/functional.py

+            See the detail at http://sox.sourceforge.net/soxformat.html.
+
+    Returns:
+        Tensor: Dimension (..., time)


Similar to the above, (..., time) is not accurate. Can you also mention that the number of frames might change for certain codecs?

Sorry, I didn't understand this comment correctly.
Since the number of frames is not used inside this method, I didn't understand how to add it to the Returns section of the docstring. Is it part of the shape of the output tensor? Is our output tensor something like (..., num_frames)?

You can use something similar to the docstring of the I/O functions: if channels_first=True, it has shape
[channel, time], else [time, channel].

mthrok · 2021-01-29T16:50:48Z

torchaudio/functional/functional.py

+        cmn_window: int = 600,
+        min_cmn_window: int = 100,
+        center: bool = False,
+        norm_vars: bool = False,


(In general, it is a common practice not to touch the part unrelated to the main goal of the PR. You might be asked to revert it.)

Yes, you're totally right.
I think my IDE did it when I reformatted the code.

AzizCode92 · 2021-02-02T00:36:27Z

Hi @mthrok, thanks a lot for your detailed feedback.
The tests are currently failing for mp3 files.
Is it a bug related to how we load mp3 files using the sox_io_backend or something that I should adjust from my side?

mthrok · 2021-02-03T05:40:50Z

torchaudio/functional/functional.py

    return (freqs * specgram).sum(dim=freq_dim) / specgram.sum(dim=freq_dim)
+
+
+def apply_codec(waveform, sample_rate, compression, format, channels_first=True) -> Tensor:


Can you annotate the signature so that it is TorchScript compatible?

Can you reorder compression and format? compression depends on format.

Can you make compression parameter optional so that it defaults to None (and update the docstring)?

Can you decorate the function with requires_module so that this will be only available to torchaudio installation with C++ extension, as we discussed?

    @_mod_utils.requires_module('torchaudio._torchaudio')
    def apply_codec(
        waveform: Tensor,
        sample_rate: int,
        format: str,
        compression: Optional[float] = None,
        channels_first: bool = True,
    ) -> Tensor:

mthrok · 2021-02-03T05:41:57Z

torchaudio/functional/functional.py

+        Tensor
+    """
+    bytes = io.BytesIO()
+    torchaudio.save(bytes, waveform, sample_rate, channels_first, compression=compression, format=format)


I realized that if the backend is set to soundfile it will use soundfile.
Can you use torchaudio.backend.sox_io_backend.save instead?

mthrok · 2021-02-03T05:45:45Z

torchaudio/functional/functional.py

+    bytes = io.BytesIO()
+    torchaudio.save(bytes, waveform, sample_rate, channels_first, compression=compression, format=format)
+    bytes.seek(0)
+    waveform, _ = torchaudio.load(bytes, channels_first=channels_first)


I realized that certain formats have limitations on sample rate.

Can you replace load with sox_effects.apply_effects_file, and provide [["rate", f"{sample_rate}"]] so that they are re-sampled to original sample rate if necessary?

    augmented, _ = torchaudio.sox_effects.apply_effects_file(
        bytes, effects=[["rate", f"{sample_rate}"]], channels_first=channels_first, format=format)
    return augmented

mthrok · 2021-02-03T05:50:25Z

torchaudio/functional/functional.py

+        sample_rate (int): Sample rate of the audio waveform
+        format (str): file format
+        channels_first (bool):
+            When True, the returned Tensor has dimension ``[channel, time]``.


Actually, the channels_first value applies to both the input and the returned Tensor. It tells which layout the input Tensor uses, and the resulting Tensor is returned in the same layout.
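
The contract can be sketched with plain nested lists instead of Tensors. `passthrough_codec` is a hypothetical identity function used only to show the layout handling; it does not encode or decode anything.

```python
def transpose(rows):
    """Swap the two axes of a rectangular list of lists."""
    return [list(column) for column in zip(*rows)]


def passthrough_codec(waveform, channels_first=True):
    """Hypothetical stand-in: accept either layout, return the same layout."""
    # Normalize to [time][channel] for the (omitted) processing step.
    frames = transpose(waveform) if channels_first else waveform
    processed = [list(frame) for frame in frames]  # identity in this sketch
    # Return the result in the layout the caller passed in.
    return transpose(processed) if channels_first else processed


channel_major = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 channels, 3 frames
time_major = transpose(channel_major)                # 3 frames, 2 channels

same_layout_a = passthrough_codec(channel_major, channels_first=True)
same_layout_b = passthrough_codec(time_major, channels_first=False)
```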

mthrok · 2021-02-03T05:55:10Z

If you have time, adding tests for TorchScript consistency (like this one) would be nice.

mthrok · 2021-02-03T05:57:13Z

Can you incorporate the changes from the latest master? I think the test failures will be fixed by #1181.

AzizCode92 · 2021-02-11T21:29:00Z

Hi @mthrok.
The unit tests for apply_codec are green now.
I have also added a test for TorchScript consistency.
The PR is red due to a RuntimeError in the TorchScript consistency tests.

mthrok

Hi @AzizCode92

Thanks for the work. Please refer to the comments. Once the tests pass, I think we can merge this.

mthrok · 2021-02-12T04:25:04Z

test/torchaudio_unittest/functional/torchscript_consistency_impl.py

+        def func(tensor):
+            sample_rate = 8000,
+            format = "wav",
+            compression = None,


The trailing commas here make the TorchScript compiler treat these values as tuples, so the test is failing. Removing the commas should make the test work.
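
The same thing happens in plain Python, independent of TorchScript: an assignment that ends with a comma builds a one-element tuple. A short illustration (the variable names mirror the diff above; `format_name` is used here to avoid shadowing the built-in `format`):

```python
# With trailing commas, each right-hand side is a one-element tuple.
sample_rate = 8000,
format_name = "wav",
compression = None,

# Without the commas, the values have the intended types.
fixed_sample_rate = 8000
fixed_format_name = "wav"
fixed_compression = None

sample_rate_is_tuple = isinstance(sample_rate, tuple)
```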

mthrok · 2021-02-12T04:31:02Z

torchaudio/functional/functional.py

+                  | ``8`` is default and highest compression.
+                * | ``OGG/VORBIS``: number from ``-1`` to ``10``; ``-1`` is the highest compression
+                  | and lowest quality. Default: ``3``.
+            See the detail at http://sox.sourceforge.net/soxformat.html.


I am making updates to the original docstring of compression parameter. Instead of copy-pasting, can you redirect? Something like See :py:func:`torchaudio.backend.sox_io_backend.save`. should do.

Also, the order of arguments in docstring does not match with the actual order.

mthrok · 2021-02-12T04:32:06Z

test/torchaudio_unittest/functional/functional_cpu_test.py

+    def test_wave(self):
+        self._smoke_test("wav", compression=None, check_num_frames=True)
+
+    @parameterized.expand(list(itertools.product(


I think list(itertools.product( is redundant here.

mthrok · 2021-02-12T23:51:08Z

Hi @AzizCode92

I have added encoding and bits_per_sample options to the save function. Can you also add the same parameters to the apply_codec function and pass them through?

AzizCode92 · 2021-02-13T09:54:39Z

Hi @mthrok,
Sure, I am just having issues with building torchaudio from source for the moment.
By the way, I opened an issue for that.
Once solved, I can continue working on my PR again

AzizCode92 · 2021-02-13T22:08:32Z

Hi @mthrok, I need some help here please regarding the torchscript_consistency_impl.py.
There is a RuntimeError caused by the torchaudio.backend.sox_io_backend.save.
Here is a copy of the Error message:

E           RuntimeError: 
E           
E           save(str filepath, Tensor src, int sample_rate, bool channels_first=True, float? compression=None, str? format=None, str? encoding=None, int? bits_per_sample=None) -> (None):
E           Expected a value of type 'str' for argument 'filepath' but instead found type 'Tensor'.
E           :
E             File "/root/project/env/lib/python3.7/site-packages/torchaudio-0.8.0a0+ec8326e-py3.7-linux-x86_64.egg/torchaudio/functional/functional.py", line 1035
E               """
E               bytes = io.BytesIO()
E               torchaudio.backend.sox_io_backend.save(bytes,
E               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
E                                                      waveform,
E                                                      sample_rate,
E           'apply_codec' is being compiled since it was called from 'func'
E             File "/root/project/test/torchaudio_unittest/functional/torchscript_consistency_impl.py", line 559
E                       compression = None
E                       channels_first = True
E                       return F.apply_codec(tensor,
E                              ~~~~~~~~~~~~~~~~~~~~~
E                                            sample_rate,
E                                            ~~~~~~~~~~~~
E                                            format,
E                                            ~~~~~~~
E                                            compression,
E                                            ~~~~~~~~~~~~
E                                            channels_first)
E                                            ~~~~~~~~~~~~~~ <--- HERE

../env/lib/python3.7/site-packages/torch/jit/_script.py:995: RuntimeError

Any hints please?

mthrok · 2021-02-14T06:24:08Z

Hi @AzizCode92

I am sorry, what I asked of you was totally wrong. The function in question is not compatible with TorchScript. (The TorchScript compiler does not support Python-specific things like file-like objects.) So adding a TorchScript compatibility test is an impossible task. So sorry about that.

mthrok · 2021-02-14T06:26:25Z

torchaudio/functional/functional.py

+    compression: Optional[float] = None,
+    channels_first: bool = True,
+    encoding: Optional[str] = None,
+    bits_per_sample: Optional[int] = None,


Can you reorder the parameters as waveform, sample_rate, channels_first, format, encoding, bits_per_sample?

Sorry I am reverting what I originally asked (moving channels_first after format) but grouping format related parameters together looks nicer.

Moving channels_first (a parameter with a default) before format (a parameter without a default) raises SyntaxError: non-default argument follows default argument.
Also, what about the compression parameter?

Sorry, I made an invalid suggestion again.
Let's do waveform, sample_rate, format, channels_first, compression, encoding, bits_per_sample.
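
A small sketch of the constraint behind this exchange. The first signature is compiled from a string so that the SyntaxError can be caught at run time. `apply_codec_stub` is a hypothetical, signature-only stand-in that follows the order agreed on above; it is not the torchaudio implementation.

```python
import inspect
from typing import Optional

# A parameter without a default (format) after one with a default is rejected.
rejected = (
    "def apply_codec(waveform, sample_rate, channels_first=True, format, "
    "encoding=None, bits_per_sample=None): pass"
)
try:
    compile(rejected, "<sketch>", "exec")
    rejected_compiles = True
except SyntaxError:
    rejected_compiles = False


def apply_codec_stub(
    waveform,
    sample_rate: int,
    format: str,
    channels_first: bool = True,
    compression: Optional[float] = None,
    encoding: Optional[str] = None,
    bits_per_sample: Optional[int] = None,
):
    """Hypothetical stub: required parameters first, then the optional ones."""
    return waveform


parameter_order = list(inspect.signature(apply_codec_stub).parameters)
```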

mthrok

Please remove the TorchScript compatibility test and re-order the parameters (and docstring). Sorry about flip-flopping on the direction.
Other than that, I think it's good.

AzizCode92 · 2021-02-15T09:38:18Z

@mthrok, no worries :)
Thanks a lot for your feedback.
I guess my PR is ready to be merged now.

mthrok

Looks good. Thank you for your contribution.
FYI, there is a good chance that we will have to exclude this feature from the upcoming release because of an underlying instability in file-like object support, which needs attention (#1229). I will try my best to hunt down the cause.

mthrok · 2021-02-15T15:18:48Z

Thanks!

mthrok · 2021-03-05T03:52:25Z

Hi @AzizCode92

Check out the new tutorial, which demonstrates how to use the apply_codec function. Try it out with Colab.

https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html#data-augmentation

AzizCode92 · 2021-03-07T20:27:20Z

Hi @mthrok, Sure I will do it asap.
Thanks and I am very happy that this feature is out there.

add initial commit

3677399

facebook-github-bot added the CLA Signed label Jan 26, 2021

AzizCode92 changed the title ~~add initial commit~~ Apply codec-based data augmentation Jan 26, 2021

fix linting

9165a4d

AzizCode92 changed the title ~~Apply codec-based data augmentation~~ [WIP] Apply codec-based data augmentation Jan 26, 2021

AzizCode92 added 9 commits January 26, 2021 23:56

update unit-test

4767017

Revert "fix linting"

4f004a8

This reverts commit 9165a4d.

update decorator

329e28b

update unit-test

cd5f077

fix unit-test

5caf1a8

fix typo

cd2688e

fix windows unit-test

9cccf45

fix linting of the unit-test

93115b5

add docstring to apply_codec

f97241e

AzizCode92 changed the title ~~[WIP] Apply codec-based data augmentation~~ Apply codec-based data augmentation Jan 28, 2021

mthrok reviewed Jan 29, 2021

Update docs + unittest

fa56682

mthrok reviewed Feb 3, 2021

AzizCode92 added 7 commits February 4, 2021 22:11

Merge branch 'master' into codec_augmentation

9a7a575

feat: fix unit-test + update docs

f6f9b8d

feat: add test for torchScript consistency

f470175

Merge branch 'master' into codec_augmentation

c887d25

fix: remove torchscript consistency test

917dd63

Merge branch 'master' into codec_augmentation

7ccd317

feat: add apply_codec to consistency_impl

1576215

fix: fix linting issues

e8ace1d

mthrok reviewed Feb 12, 2021

Merge branch 'master' into codec_augmentation

ba9c4aa

feat: update docstring and unit-test

ec8326e

mthrok reviewed Feb 14, 2021

AzizCode92 added 2 commits February 14, 2021 20:19

fix: remove unecessary test

f858dd7

fix: update docstring

43b2e65

mthrok approved these changes Feb 15, 2021

mthrok merged commit 6854020 into pytorch:master Feb 15, 2021

mthrok mentioned this pull request Feb 25, 2021

Codec-based augmentation #1183

Closed


