Added GSM format support to save function #1271

imaginary-person · 2021-02-16T06:20:18Z

SUMMARY

Adds the changes listed in #1264 toward supporting the GSM format in the save function.

CAVEATS

http://sox.sourceforge.net/soxformat.html doesn't document any GSM-specific parameters, so I only added a general note to the documentation stating that GSM doesn't support custom encodings, bit depths, or compression levels. I can revise it further if required.
I also restructured some of the code for readability, mostly to conform to the Google C++ Style Guide.
In addition, I added curly braces to the case blocks in switch statements; I can roll these changes back if required, since the Google style guide doesn't mandate them.
I submitted this PR because the issue has the "good first issue" label, but if someone else is still working on it, please disregard this PR.

Thanks!
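To illustrate the GSM caveat (no bit-depth or compression knob), here is a minimal sketch of GSM 06.10 full-rate framing arithmetic. It assumes the headerless `.gsm` layout used by libgsm, where every 160-sample frame of 8 kHz mono audio is stored as a fixed 33-byte block, and it assumes a trailing partial frame is zero-padded. The function names are hypothetical and this is not sox's or torchaudio's code.

```python
import math

# Assumed GSM 06.10 full-rate constants for a headerless .gsm file
# (the raw-frame layout used by libgsm).
SAMPLE_RATE = 8000       # GSM full-rate is specified for 8 kHz mono speech
SAMPLES_PER_FRAME = 160  # 20 ms of audio per frame
BYTES_PER_FRAME = 33     # each 260-bit encoded frame is stored in 33 bytes


def gsm_num_frames(num_samples: int) -> int:
    """Frames needed for `num_samples`, assuming the last frame is zero-padded."""
    return math.ceil(num_samples / SAMPLES_PER_FRAME)


def gsm_encoded_size(num_samples: int) -> int:
    """Size in bytes of the encoded audio; depends only on the sample count."""
    return gsm_num_frames(num_samples) * BYTES_PER_FRAME


def gsm_bitrate() -> float:
    """Bits per second on disk; fixed by the codec, not chosen by the caller."""
    frames_per_second = SAMPLE_RATE / SAMPLES_PER_FRAME  # 50 frames per second
    return frames_per_second * BYTES_PER_FRAME * 8


if __name__ == "__main__":
    print(gsm_encoded_size(SAMPLE_RATE))  # one second of audio -> 1650
    print(gsm_bitrate())                  # -> 13200.0
```

Because the frame size and bit rate are fixed by the codec, the only free input is the number of samples, which is why a generic "no custom encoding/bit depth/compression" note is about all the documentation can say.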

mthrok · 2021-02-16T18:42:55Z

Thanks for the contribution. Let me talk with the original contributor first. If you would like, you can try the HTK format too; it should be practically the same process.

imaginary-person · 2021-02-16T20:00:05Z

Hello @mthrok, SOX_ENCODING_HTK doesn't exist in the sox source code. HTK encoding in sox actually uses SOX_ENCODING_SIGN2 under the hood, but since SOX_ENCODING_HTK can't simply be substituted with SOX_ENCODING_SIGN2, please advise whether some sort of workaround is possible. Basically, HTK is a format in sox, not an encoding. If I make Format::HTK correspond to SOX_ENCODING_SIGN2 in torchaudio/csrc/sox/utils.cpp, the save tests for HTK fail:

save_test.py:250: in test_save_htk
    self.assert_save_consistency("htk", test_mode=test_mode)
save_test.py:140: in assert_save_consistency
    self.assertEqual(found, expected)
/home/pytorch/torch/testing/_internal/common_utils.py:1183: in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
E   AssertionError: False is not true : Tensors failed to compare as equal!Attempted to compare equality of tensors with different sizes. Got sizes torch.Size([1, 48000]) and torch.Size([1, 24000]).

Thank you!

mthrok · 2021-02-17T04:20:42Z

> SOX_ENCODING_HTK doesn't exist in the sox source code. HTK encoding in sox actually uses SOX_ENCODING_SIGN2 under the hood

HTK format uses signed-integer encoding. [code]

> but since SOX_ENCODING_HTK can't simply be substituted with SOX_ENCODING_SIGN2

Why not? Other formats, such as amb and sph, also store raw samples the same way wav does. I expect that using SOX_ENCODING_SIGN2 should allow us to replicate the result sox produces.

> If I make Format::HTK correspond to SOX_ENCODING_SIGN2 in torchaudio/csrc/sox/utils.cpp, the save tests for HTK fail:

> save_test.py:250: in test_save_htk
>     self.assert_save_consistency("htk", test_mode=test_mode)
> save_test.py:140: in assert_save_consistency
>     self.assertEqual(found, expected)
> /home/pytorch/torch/testing/_internal/common_utils.py:1183: in assertEqual
>     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
> E   AssertionError: False is not true : Tensors failed to compare as equal!Attempted to compare equality of tensors with different sizes. Got sizes torch.Size([1, 48000]) and torch.Size([1, 24000]).

This is most likely because the underlying encoder discards the channel information. http://sox.sourceforge.net/soxformat.html says:

.htk Single channel 16-bit PCM format used by HTK, a toolkit for building Hidden Markov Model speech processing tools.

Looking at the code, this format seems to have its own header, but the samples are stored the same way as in the WAVE format. The source code also links to the references, if you are interested:

https://labrosa.ee.columbia.edu/doc/HTKBook21/node57.html
https://labrosa.ee.columbia.edu/doc/HTKBook21/node61.html
https://labrosa.ee.columbia.edu/doc/HTKBook21/node58.html

Anyway, you can pass the num_channels parameter to change the number of channels used in the test. Can you try that?
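The channel issue can be illustrated with a minimal sketch of an HTK waveform file. It assumes the header layout described in the HTK Book (12 bytes: nSamples, sampPeriod in 100 ns units, sampSize in bytes, parmKind, in HTK's default big-endian byte order, with parmKind 0 meaning a raw waveform) followed by 16-bit signed samples. This is not sox's implementation, and the function names are hypothetical. Because the header has no channel-count field, interleaved stereo samples written into it come back as a single channel with twice as many samples, which is consistent with (though not proof of) the [1, 48000] vs [1, 24000] mismatch above.

```python
import struct

# Assumed HTK header layout (HTK Book): four big-endian fields, 12 bytes total.
#   nSamples (int32), sampPeriod (int32, 100 ns units),
#   sampSize (int16, bytes per sample), parmKind (int16).
HTK_HEADER = struct.Struct(">iihh")
HTK_WAVEFORM = 0  # parmKind code for raw sampled waveform data


def write_htk_waveform(samples, sample_rate):
    """Return the bytes of an HTK waveform file holding 16-bit signed samples.

    There is no channel-count field: every value counts as one sample of a
    single channel.
    """
    samp_period = round(10_000_000 / sample_rate)  # sample period in 100 ns units
    header = HTK_HEADER.pack(len(samples), samp_period, 2, HTK_WAVEFORM)
    body = struct.pack(f">{len(samples)}h", *samples)
    return header + body


def read_htk_waveform(data):
    """Parse bytes from write_htk_waveform into (samples, sample_rate)."""
    n_samples, samp_period, samp_size, parm_kind = HTK_HEADER.unpack_from(data, 0)
    if parm_kind != HTK_WAVEFORM or samp_size != 2:
        raise ValueError("not a 16-bit HTK waveform file")
    samples = list(struct.unpack_from(f">{n_samples}h", data, HTK_HEADER.size))
    return samples, round(10_000_000 / samp_period)


if __name__ == "__main__":
    # Two channels with 4 frames each, interleaved as L0 R0 L1 R1 ...
    left, right = [1, 2, 3, 4], [-1, -2, -3, -4]
    interleaved = [s for frame in zip(left, right) for s in frame]
    samples, rate = read_htk_waveform(write_htk_waveform(interleaved, 16000))
    print(len(samples), rate)  # 8 16000: one channel, twice as many samples
```

With a single-channel input (num_channels=1 in the test), the number of samples written and read back is the same, so the sizes line up.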

imaginary-person · 2021-02-17T20:23:31Z

Thanks for your help, @mthrok. I hadn't noticed that the test uses 2 channels by default.
After passing the num_channels parameter, the HTK tests passed as well, as you expected.
I'll submit a separate PR for HTK so as to retain the existing code style (this PR changes the style in some files).

> Why not? Other formats, such as amb and sph, also store raw samples the same way wav does. I expect that using SOX_ENCODING_SIGN2 should allow us to replicate the result sox produces.

Yes, I just meant that I'd have to make some other change as well, such as the num_channels change you suggested.

imaginary-person · 2021-02-17T21:09:50Z

Closed this PR as the original contributor submitted another one.

Added gsm format

9b41ada

facebook-github-bot added the CLA Signed label Feb 16, 2021

imaginary-person added 3 commits February 16, 2021 00:30

Fix style

95e5540

Fix style again :/

a196aa5

Fix style yet again :/

dda85f5

Fix typo

932b453

Fixed typo

Remove newline to trigger CI

aadfc28

Some tests unrelated to the change had failed.

imaginary-person closed this Feb 17, 2021

imaginary-person mentioned this pull request Feb 17, 2021

Add HTK format support to sox_io's save & info #1276

Merged

mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021

Remove trailing comma (pytorch#1271)

5edc9f4

The comma is problematic as reported in meta-pytorch/captum#553

3 participants
