-
Notifications
You must be signed in to change notification settings - Fork 738
Apply codec-based data augmentation #1200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AzizCode92
Thanks for working on this. It is looking good.
torchaudio/functional/functional.py
Outdated
| Applies codecs as a form of augmentation | ||
| Args: | ||
| waveform (Tensor): Tensor of audio of dimension (..., time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description of the dimension is is not accurate as channels_first switch them.
Something like Audio data. Must be 2 dimensional. See also ``channels_first`` would do.
| from torchaudio._internal import ( | ||
| module_utils as _mod_utils, | ||
| ) | ||
| from torchaudio.backend import sox_io_backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it your IDE that is sorting the import statements?
My interpretation of PEP8 here is that torchaudio_unittest is the local library so it should be the third group and torch, torchaudio and other libraries used for testing are the second group.
So I prefer torchaudio comes before torchaudio_unittest. But if this is something your IDE does automatically, I won't argue.
https://www.python.org/dev/peps/pep-0008/#imports
Imports should be grouped in the following order:
- Standard library imports.
- Related third party imports.
- Local application/library specific imports.
You should put a blank line between each group of imports.
|
|
||
|
|
||
| @skipIfNoExec('sox') | ||
| class ApplyCodecTestBase(TempDirMixin, TorchaudioTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this to TestApplyCodec? one thing, this is not a base class, the second, TestXXX is the common pattern used in torchaudio's test suite.
| compression=compression) | ||
| save_wav(path, augmented_data, sample_rate) | ||
| info = sox_io_backend.info(path) | ||
| assert info.sample_rate == sample_rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the data is saved with the given sample_rate. This assertion always passes.
Since the apply_codec works in-memory manner, it is unnecessary to save data on file system. (TempDirMixin is not necessary)
Since it is not simple to check if the resulting data are correct, I suggest that you check the shape (number of channels and conditionally number of frames).
def _smoke_test(self, format, compression, *, check_num_frames):
torch.random.manual_seed(42)
sample_rate = 8000
num_frames = 3.0 * sample_rate
num_channels = 2
waveform = torch.rand(num_channels, num_frames)
augmented = F.apply_codec(
waveform, sample_rate, format, channels_first=True, compression=compression)
assert augmented.dtype == waveform.dtype
assert augmented.shape[0] == num_channels
if check_num_frames:
assert augmented.shape[1] == num_framesthen for each formats,
def test_wave(self):
self._smoke_test("wav", compression=None, check_num_frames=True)
@parameterized.expand([96, ...])
def test_mp3(self, compression):
self._smoke_test("mp3", compression, check_num_frames=False)Note that WAVE is uncompressed format, so providing compression argument makes no difference.
Try to add other formats that are tested in save_test file, like "opus". They have different valid value range for compression argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added vorbis and flac. By adding opus i got RuntimeError: Unsupported file type: opus?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also got an error inside the unit-test for the cases of mp3 files.
Something like this RuntimeError: Error loading audio file: failed to open file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I have forgotten that torchaudio can only read OPUS but cannot write.
| info = sox_io_backend.info(path) | ||
| assert info.sample_rate == sample_rate | ||
|
|
||
| @_mod_utils.requires_module('torchaudio._torchaudio') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use @skipIfNoExtension instead of @_mod_utils.requires_module, and you can put it on the class definition so that you do not need to do it on every method.
| assert (num_masked_columns < mask_param).sum() == num_masked_columns.numel() | ||
|
|
||
|
|
||
| @skipIfNoExec('sox') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sox command should not be required. This is a smoke test that does not require any external tool.
torchaudio/functional/functional.py
Outdated
| return (freqs * specgram).sum(dim=freq_dim) / specgram.sum(dim=freq_dim) | ||
|
|
||
|
|
||
| def apply_codec(waveform, sample_rate, format, channels_first=True, compression=None) -> Tensor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reorder the arguments like waveform, sample_rate, compression=None, format, channels_first=True?
The expected usage will change the value for compression, while channels_first will not be changed (at all).
torchaudio/functional/functional.py
Outdated
| See the detail at http://sox.sourceforge.net/soxformat.html. | ||
| Returns: | ||
| Tensor: Dimension (..., time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above (..., time) is not accurate. Can you also mention that the number of frames might change for certain codecs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't get this comment correctly.
Since number of frames was not used inside this method, I didn't understand how to add it in the return statement inside the docstrings. Is it part of the shape of the output tensor? is our output tensor something like (..., num_frames)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the something similar to the docstring of I/O function. If channels_first=True, it has
[channel, time] else [time, channel].
torchaudio/functional/functional.py
Outdated
| cmn_window: int = 600, | ||
| min_cmn_window: int = 100, | ||
| center: bool = False, | ||
| norm_vars: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(In general, it is a common practice not to touch the part unrelated to the main goal of the PR. You might be asked to revert it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're totally right.
I think my IDE did it when I make code reformat.
|
Hi @mthrok, thanks a lot for your detailed feedback. |
torchaudio/functional/functional.py
Outdated
| return (freqs * specgram).sum(dim=freq_dim) / specgram.sum(dim=freq_dim) | ||
|
|
||
|
|
||
| def apply_codec(waveform, sample_rate, compression, format, channels_first=True) -> Tensor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can you annotate the signature so that it is TorchScript compatible?
- Can you reorder
compressionandformat?compressiondepends onformat. - Can you make compression parameter optional so that it defaults to
None(and update the docstring)? - Can you decorate the function with
requires_moduleso that this will be only available to torchaudio installation with C++ extension, as we discussed?
@_mod_utils.requires_module('torchaudio._torchaudio')
def apply_codec(
waveform: Tensor,
sample_rate: int,
format: str,
compression: Optional[float] = None,
channels_first: bool = True,
) -> Tensor:
torchaudio/functional/functional.py
Outdated
| Tensor | ||
| """ | ||
| bytes = io.BytesIO() | ||
| torchaudio.save(bytes, waveform, sample_rate, channels_first, compression=compression, format=format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that if the backend is set to soundfile it will use soundfile.
Can you use torchaudio.backend.sox_io_backend.save instead?
torchaudio/functional/functional.py
Outdated
| bytes = io.BytesIO() | ||
| torchaudio.save(bytes, waveform, sample_rate, channels_first, compression=compression, format=format) | ||
| bytes.seek(0) | ||
| waveform, _ = torchaudio.load(bytes, channels_first=channels_first) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that there are certain formats that has limitation on sample rate.
Can you replace load with sox_effects.apply_effects_file, and provide [["rate", f"{sample_rate}"]] so that they are re-sampled to original sample rate if necessary?
augmented, _ = torchaudio.sox_effect.apply_effects_file(
bytes, effects=[["rate", f"{sample_rate}"]], channels_first=channels_first, format=format)
return augmented
torchaudio/functional/functional.py
Outdated
| sample_rate (int): Sample rate of the audio waveform | ||
| format (str): file format | ||
| channels_first (bool): | ||
| When True, the returned Tensor has dimension ``[channel, time]``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, channels_first value is for both input and returned Tensor. It tells which format the input Tensor is, then return the resulting Tensor in the same manner.
|
If you have time, adding tests for TorchScript consistency (like this one) would be nice. |
|
Can you incorporate the changes from the latest master? I think the test failures will be fixed by #1181. |
|
Hi @mthrok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AzizCode92
Thanks for the work. Please refer to the comments. Once the tests pass, I think we can merge this.
| def func(tensor): | ||
| sample_rate = 8000, | ||
| format = "wav", | ||
| compression = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trailing commas here make the TorchScript compiler think that they are tuple type, thus the test is failing. Removing the commas should make the test work.
torchaudio/functional/functional.py
Outdated
| | ``8`` is default and highest compression. | ||
| * | ``OGG/VORBIS``: number from ``-1`` to ``10``; ``-1`` is the highest compression | ||
| | and lowest quality. Default: ``3``. | ||
| See the detail at http://sox.sourceforge.net/soxformat.html. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am making updates to the original docstring of compression parameter. Instead of copy-pasting, can you redirect? Something like See :py:func:`torchaudio.backend.sox_io_backend.save`. should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the order of arguments in docstring does not match with the actual order.
| def test_wave(self): | ||
| self._smoke_test("wav", compression=None, check_num_frames=True) | ||
|
|
||
| @parameterized.expand(list(itertools.product( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think list(itertools.product( is redundant here.
|
Hi @AzizCode92 I have added |
|
Hi @mthrok, I need some help here please regarding the Any hints please? |
Hi @AzizCode92 I am sorry, what I asked you was totally wrong. The said function is not compatible with TorchScript. (TorchScript compiler does not support Python-specific thing like file-like object). So adding TorchScript compatibility test is an impossible task. So sorry about that. |
| compression: Optional[float] = None, | ||
| channels_first: bool = True, | ||
| encoding: Optional[str] = None, | ||
| bits_per_sample: Optional[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reorder the parameters as waveform, sample_rate, channels_first, format, encoding, bits_per_sample?
Sorry I am reverting what I originally asked (moving channels_first after format) but grouping format related parameters together looks nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the channels_first (default_parameter) before format (non default_parameter) throws a syntax-error SyntaxError: non-default argument follows default argument.
Also what about the compression parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I made an invalid suggestion again.
let's do waveform, sample_rate, format, channels_first, compression, encoding, bits_per_sample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove TorchScript compatibility test and re-order the parameters (and docstring). Sorry about the flip-flopping the direction.
Other than that, I think it's good.
|
@mthrok, no worries :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thank you for your contribution.
FYI, there is a good chance that we have to exclude this feature from the upcoming release because of the underlying instability in file-like object which needs attention. (#1229) I will try my best to hunt down the cause.
|
Thanks! |
* [iOS][GPU] Add iOS GPU workflow (pytorch#1200) * pt mobile script and optimize recipe (pytorch#1193) * pt mobile script and optimize recipe * 1 pt mobile new recipes summary and 5 recipes * updated recipes_index.rst * thumbnail png fix for ios recipe in recipes_index.rst * edits based on feedback * Updating 1.7 branch (pytorch#1205) * Update event tracking (pytorch#1188) * Update beginner_source/audio_preprocessing_tutorial.py (pytorch#1199) * Typo in beginner_source/audio_preprocessing_tutorial.py Typo in beginner_source/audio_preprocessing_tutorial.py fron > from * update title. * fix file access. Co-authored-by: JuHyuk Park <[email protected]> * Update audio_preprocessing_tutorial.py (pytorch#1202) Adds a comment for running this tutorial in Google Colab. Co-authored-by: Pat Mellon <[email protected]> Co-authored-by: Vincent QB <[email protected]> Co-authored-by: JuHyuk Park <[email protected]> Co-authored-by: Tao Xu <[email protected]> Co-authored-by: Jeff Tang <[email protected]> Co-authored-by: Pat Mellon <[email protected]> Co-authored-by: Vincent QB <[email protected]> Co-authored-by: JuHyuk Park <[email protected]>
|
Hi @AzizCode92 Checkout the new tutorial which demonstrates how to use https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html#data-augmentation |
|
Hi @mthrok, Sure I will do it asap. |
This PR addresses the issue #1183