Skip to content

Conversation

@AzizCode92
Copy link
Contributor

@AzizCode92 AzizCode92 commented Jan 26, 2021

This PR addresses the issue #1183

@AzizCode92 AzizCode92 changed the title add initial commit Apply codec-based data augmentation Jan 26, 2021
@AzizCode92 AzizCode92 changed the title Apply codec-based data augmentation [WIP] Apply codec-based data augmentation Jan 26, 2021
@AzizCode92
Copy link
Contributor Author

Hi @mthrok, this the baseline code I have made.
I apologize beforehand for accidentally closing the previous PR.

@AzizCode92 AzizCode92 changed the title [WIP] Apply codec-based data augmentation Apply codec-based data augmentation Jan 28, 2021
Copy link
Contributor

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @AzizCode92

Thanks for working on this. It is looking good.

Applies codecs as a form of augmentation
Args:
waveform (Tensor): Tensor of audio of dimension (..., time)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description of the dimension is is not accurate as channels_first switch them.
Something like Audio data. Must be 2 dimensional. See also ``channels_first`` would do.

from torchaudio._internal import (
module_utils as _mod_utils,
)
from torchaudio.backend import sox_io_backend
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it your IDE that is sorting the import statements?
My interpretation of PEP8 here is that torchaudio_unittest is the local library so it should be the third group and torch, torchaudio and other libraries used for testing are the second group.
So I prefer torchaudio comes before torchaudio_unittest. But if this is something your IDE does automatically, I won't argue.

https://www.python.org/dev/peps/pep-0008/#imports

Imports should be grouped in the following order:

  1. Standard library imports.
  2. Related third party imports.
  3. Local application/library specific imports.
    You should put a blank line between each group of imports.



@skipIfNoExec('sox')
class ApplyCodecTestBase(TempDirMixin, TorchaudioTestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this to TestApplyCodec? one thing, this is not a base class, the second, TestXXX is the common pattern used in torchaudio's test suite.

compression=compression)
save_wav(path, augmented_data, sample_rate)
info = sox_io_backend.info(path)
assert info.sample_rate == sample_rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the data is saved with the given sample_rate. This assertion always passes.
Since the apply_codec works in-memory manner, it is unnecessary to save data on file system. (TempDirMixin is not necessary)

Since it is not simple to check if the resulting data are correct, I suggest that you check the shape (number of channels and conditionally number of frames).

def _smoke_test(self, format, compression, *, check_num_frames):
    torch.random.manual_seed(42)
    sample_rate = 8000
    num_frames = 3.0 * sample_rate
    num_channels = 2
    waveform = torch.rand(num_channels, num_frames)
    augmented = F.apply_codec(
        waveform, sample_rate, format, channels_first=True, compression=compression)
    assert augmented.dtype == waveform.dtype
    assert augmented.shape[0] == num_channels
    if check_num_frames:
        assert augmented.shape[1] == num_frames

then for each formats,

def test_wave(self):
    self._smoke_test("wav", compression=None, check_num_frames=True)

@parameterized.expand([96, ...])
def test_mp3(self, compression):
    self._smoke_test("mp3", compression, check_num_frames=False)

Note that WAVE is uncompressed format, so providing compression argument makes no difference.

Try to add other formats that are tested in save_test file, like "opus". They have different valid value range for compression argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added vorbis and flac. By adding opus i got RuntimeError: Unsupported file type: opus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also got an error inside the unit-test for the cases of mp3 files.
Something like this RuntimeError: Error loading audio file: failed to open file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I have forgotten that torchaudio can only read OPUS but cannot write.

info = sox_io_backend.info(path)
assert info.sample_rate == sample_rate

@_mod_utils.requires_module('torchaudio._torchaudio')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use @skipIfNoExtension instead of @_mod_utils.requires_module, and you can put it on the class definition so that you do not need to do it on every method.

assert (num_masked_columns < mask_param).sum() == num_masked_columns.numel()


@skipIfNoExec('sox')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sox command should not be required. This is a smoke test that does not require any external tool.

return (freqs * specgram).sum(dim=freq_dim) / specgram.sum(dim=freq_dim)


def apply_codec(waveform, sample_rate, format, channels_first=True, compression=None) -> Tensor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reorder the arguments like waveform, sample_rate, compression=None, format, channels_first=True?

The expected usage will change the value for compression, while channels_first will not be changed (at all).

See the detail at http://sox.sourceforge.net/soxformat.html.
Returns:
Tensor: Dimension (..., time)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above (..., time) is not accurate. Can you also mention that the number of frames might change for certain codecs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't get this comment correctly.
Since number of frames was not used inside this method, I didn't understand how to add it in the return statement inside the docstrings. Is it part of the shape of the output tensor? is our output tensor something like (..., num_frames)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the something similar to the docstring of I/O function. If channels_first=True, it has
[channel, time] else [time, channel].

cmn_window: int = 600,
min_cmn_window: int = 100,
center: bool = False,
norm_vars: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(In general, it is a common practice not to touch the part unrelated to the main goal of the PR. You might be asked to revert it.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're totally right.
I think my IDE did it when I make code reformat.

@AzizCode92
Copy link
Contributor Author

AzizCode92 commented Feb 2, 2021

Hi @mthrok, thanks a lot for your detailed feedback.
The tests are for the moment failing for the case of mp3 files.
Is it a bug related to how we load mp3 files using the sox_io_backend or something that I should adjust from my side?

return (freqs * specgram).sum(dim=freq_dim) / specgram.sum(dim=freq_dim)


def apply_codec(waveform, sample_rate, compression, format, channels_first=True) -> Tensor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can you annotate the signature so that it is TorchScript compatible?
  2. Can you reorder compression and format? compression depends on format.
  3. Can you make compression parameter optional so that it defaults to None (and update the docstring)?
  4. Can you decorate the function with requires_module so that this will be only available to torchaudio installation with C++ extension, as we discussed?
@_mod_utils.requires_module('torchaudio._torchaudio')
def apply_codec(
    waveform: Tensor,
    sample_rate: int,
    format: str,
    compression: Optional[float] = None,
    channels_first: bool = True,
) -> Tensor:

Tensor
"""
bytes = io.BytesIO()
torchaudio.save(bytes, waveform, sample_rate, channels_first, compression=compression, format=format)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that if the backend is set to soundfile it will use soundfile.
Can you use torchaudio.backend.sox_io_backend.save instead?

bytes = io.BytesIO()
torchaudio.save(bytes, waveform, sample_rate, channels_first, compression=compression, format=format)
bytes.seek(0)
waveform, _ = torchaudio.load(bytes, channels_first=channels_first)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that there are certain formats that has limitation on sample rate.

Can you replace load with sox_effects.apply_effects_file, and provide [["rate", f"{sample_rate}"]] so that they are re-sampled to original sample rate if necessary?

augmented, _ = torchaudio.sox_effect.apply_effects_file(
   bytes, effects=[["rate", f"{sample_rate}"]], channels_first=channels_first, format=format)
return augmented

sample_rate (int): Sample rate of the audio waveform
format (str): file format
channels_first (bool):
When True, the returned Tensor has dimension ``[channel, time]``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, channels_first value is for both input and returned Tensor. It tells which format the input Tensor is, then return the resulting Tensor in the same manner.

@mthrok
Copy link
Contributor

mthrok commented Feb 3, 2021

If you have time, adding tests for TorchScript consistency (like this one) would be nice.

@mthrok
Copy link
Contributor

mthrok commented Feb 3, 2021

Can you incorporate the changes from the latest master? I think the test failures will be fixed by #1181.

@AzizCode92
Copy link
Contributor Author

AzizCode92 commented Feb 11, 2021

Hi @mthrok.
The unit-test for apply_codec are green now.
I have added also a test for TorchScript consistency.
The PR is red due to a RuntimeError of the TorchScript consistency tests.

Copy link
Contributor

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @AzizCode92

Thanks for the work. Please refer to the comments. Once the tests pass, I think we can merge this.

def func(tensor):
sample_rate = 8000,
format = "wav",
compression = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trailing commas here make the TorchScript compiler think that they are tuple type, thus the test is failing. Removing the commas should make the test work.

| ``8`` is default and highest compression.
* | ``OGG/VORBIS``: number from ``-1`` to ``10``; ``-1`` is the highest compression
| and lowest quality. Default: ``3``.
See the detail at http://sox.sourceforge.net/soxformat.html.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am making updates to the original docstring of compression parameter. Instead of copy-pasting, can you redirect? Something like See :py:func:`torchaudio.backend.sox_io_backend.save`. should do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the order of arguments in docstring does not match with the actual order.

def test_wave(self):
self._smoke_test("wav", compression=None, check_num_frames=True)

@parameterized.expand(list(itertools.product(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think list(itertools.product( is redundant here.

@mthrok
Copy link
Contributor

mthrok commented Feb 12, 2021

Hi @AzizCode92

I have added encoding and bits_per_sample options to save function. Can you also add the same parameters and pass them to apply_codec function?

@AzizCode92
Copy link
Contributor Author

Hi @mthrok,
Sure, I am just having issues with building torchaudio from source for the moment.
By the way, I opened an issue for that.
Once solved, I can continue working on my PR again

@AzizCode92
Copy link
Contributor Author

AzizCode92 commented Feb 13, 2021

Hi @mthrok, I need some help here please regarding the torchscript_consistency_impl.py.
There is a RuntimeError caused by the torchaudio.backend.sox_io_backend.save.
Here is a copy of the Error message:

E           RuntimeError: 
E           
E           save(str filepath, Tensor src, int sample_rate, bool channels_first=True, float? compression=None, str? format=None, str? encoding=None, int? bits_per_sample=None) -> (None):
E           Expected a value of type 'str' for argument 'filepath' but instead found type 'Tensor'.
E           :
E             File "/root/project/env/lib/python3.7/site-packages/torchaudio-0.8.0a0+ec8326e-py3.7-linux-x86_64.egg/torchaudio/functional/functional.py", line 1035
E               """
E               bytes = io.BytesIO()
E               torchaudio.backend.sox_io_backend.save(bytes,
E               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
E                                                      waveform,
E                                                      sample_rate,
E           'apply_codec' is being compiled since it was called from 'func'
E             File "/root/project/test/torchaudio_unittest/functional/torchscript_consistency_impl.py", line 559
E                       compression = None
E                       channels_first = True
E                       return F.apply_codec(tensor,
E                              ~~~~~~~~~~~~~~~~~~~~~
E                                            sample_rate,
E                                            ~~~~~~~~~~~~
E                                            format,
E                                            ~~~~~~~
E                                            compression,
E                                            ~~~~~~~~~~~~
E                                            channels_first)
E                                            ~~~~~~~~~~~~~~ <--- HERE

../env/lib/python3.7/site-packages/torch/jit/_script.py:995: RuntimeError

Any hints please?

@mthrok
Copy link
Contributor

mthrok commented Feb 14, 2021

Hi @mthrok, I need some help here please regarding the torchscript_consistency_impl.py.
There is a RuntimeError caused by the torchaudio.backend.sox_io_backend.save.
Here is a copy of the Error message:

E           RuntimeError: 
E           
E           save(str filepath, Tensor src, int sample_rate, bool channels_first=True, float? compression=None, str? format=None, str? encoding=None, int? bits_per_sample=None) -> (None):
E           Expected a value of type 'str' for argument 'filepath' but instead found type 'Tensor'.
E           :
E             File "/root/project/env/lib/python3.7/site-packages/torchaudio-0.8.0a0+ec8326e-py3.7-linux-x86_64.egg/torchaudio/functional/functional.py", line 1035
E               """
E               bytes = io.BytesIO()
E               torchaudio.backend.sox_io_backend.save(bytes,
E               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
E                                                      waveform,
E                                                      sample_rate,
E           'apply_codec' is being compiled since it was called from 'func'
E             File "/root/project/test/torchaudio_unittest/functional/torchscript_consistency_impl.py", line 559
E                       compression = None
E                       channels_first = True
E                       return F.apply_codec(tensor,
E                              ~~~~~~~~~~~~~~~~~~~~~
E                                            sample_rate,
E                                            ~~~~~~~~~~~~
E                                            format,
E                                            ~~~~~~~
E                                            compression,
E                                            ~~~~~~~~~~~~
E                                            channels_first)
E                                            ~~~~~~~~~~~~~~ <--- HERE

../env/lib/python3.7/site-packages/torch/jit/_script.py:995: RuntimeError

Any hints please?

Hi @AzizCode92

I am sorry, what I asked you was totally wrong. The said function is not compatible with TorchScript. (TorchScript compiler does not support Python-specific thing like file-like object). So adding TorchScript compatibility test is an impossible task. So sorry about that.

compression: Optional[float] = None,
channels_first: bool = True,
encoding: Optional[str] = None,
bits_per_sample: Optional[int] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reorder the parameters as waveform, sample_rate, channels_first, format, encoding, bits_per_sample?

Sorry I am reverting what I originally asked (moving channels_first after format) but grouping format related parameters together looks nicer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the channels_first (default_parameter) before format (non default_parameter) throws a syntax-error SyntaxError: non-default argument follows default argument.
Also what about the compression parameter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I made an invalid suggestion again.
let's do waveform, sample_rate, format, channels_first, compression, encoding, bits_per_sample

Copy link
Contributor

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove TorchScript compatibility test and re-order the parameters (and docstring). Sorry about the flip-flopping the direction.
Other than that, I think it's good.

@AzizCode92
Copy link
Contributor Author

@mthrok, no worries :)
Thanks a lot for your feedback.
I guess my PR is ready to be merged now.

Copy link
Contributor

@mthrok mthrok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you for your contribution.
FYI, there is a good chance that we have to exclude this feature from the upcoming release because of the underlying instability in file-like object which needs attention. (#1229) I will try my best to hunt down the cause.

@mthrok mthrok merged commit 6854020 into pytorch:master Feb 15, 2021
@mthrok
Copy link
Contributor

mthrok commented Feb 15, 2021

Thanks!

@mthrok mthrok mentioned this pull request Feb 25, 2021
mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021
* [iOS][GPU] Add iOS GPU workflow (pytorch#1200)

* pt mobile script and optimize recipe (pytorch#1193)

* pt mobile script and optimize recipe

* 1 pt mobile new recipes summary and 5 recipes

* updated recipes_index.rst

* thumbnail png fix for ios recipe in recipes_index.rst

* edits based on feedback

* Updating 1.7 branch (pytorch#1205)

* Update event tracking (pytorch#1188)

* Update beginner_source/audio_preprocessing_tutorial.py (pytorch#1199)

* Typo in beginner_source/audio_preprocessing_tutorial.py

Typo in beginner_source/audio_preprocessing_tutorial.py

fron > from

* update title.

* fix file access.

Co-authored-by: JuHyuk Park <[email protected]>

* Update audio_preprocessing_tutorial.py (pytorch#1202)

Adds a comment for running this tutorial in Google Colab.

Co-authored-by: Pat Mellon <[email protected]>
Co-authored-by: Vincent QB <[email protected]>
Co-authored-by: JuHyuk Park <[email protected]>

Co-authored-by: Tao Xu <[email protected]>
Co-authored-by: Jeff Tang <[email protected]>
Co-authored-by: Pat Mellon <[email protected]>
Co-authored-by: Vincent QB <[email protected]>
Co-authored-by: JuHyuk Park <[email protected]>
@mthrok
Copy link
Contributor

mthrok commented Mar 5, 2021

Hi @AzizCode92

Checkout the new tutorial which demonstrates how to use apply_codec function. Try it out with Colab.

https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html#data-augmentation

@AzizCode92
Copy link
Contributor Author

Hi @mthrok, Sure I will do it asap.
Thanks and I am very happy that this feature is out there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants