imaginary-person (Contributor) commented Feb 16, 2021

SUMMARY

Added the changes listed in #1264 towards supporting the GSM format in the save function.

CAVEATS

  1. http://sox.sourceforge.net/soxformat.html doesn't list the supported parameters for gsm, so I have added only a generic statement to the documentation saying that GSM does not support custom encodings, bit depths, or compression. I can revise it further if required.

  2. I also made some changes in the code-structure for readability, with most of them aimed at conformance with the Google style guide for C++.
    I even added curly braces to case blocks within switch statements, but I can roll back such changes, if required, as the Google style guide doesn't mandate one to do so.

  3. I'm submitting this PR since the issue has a good first issue label, but if it's still being worked on by someone, please disregard this PR.

Thanks!

mthrok (Contributor) commented Feb 16, 2021

Hi @imaginary-person

Thanks for the contribution. Let me talk with the original contributor first. If you would like, you can try the HTK format too. It should be practically the same process.

Fixed typo
imaginary-person (Contributor, Author) commented Feb 16, 2021

Hello @mthrok, SOX_ENCODING_HTK doesn't exist in the sox source-code. HTK encoding in sox actually uses SOX_ENCODING_SIGN2 under the hood, but since SOX_ENCODING_HTK can't simply be substituted with SOX_ENCODING_SIGN2, please advise if some sort of a workaround is possible. Basically, HTK is a format in sox but not an encoding. If I make Format::HTK correspond to SOX_ENCODING_SIGN2 in torchaudio/csrc/sox/utils.cpp, then the save tests for HTK fail:

save_test.py:250: in test_save_htk
    self.assert_save_consistency("htk", test_mode=test_mode)
save_test.py:140: in assert_save_consistency
    self.assertEqual(found, expected)
/home/pytorch/torch/testing/_internal/common_utils.py:1183: in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
E   AssertionError: False is not true : Tensors failed to compare as equal!Attempted to compare equality of tensors with different sizes. Got sizes torch.Size([1, 48000]) and torch.Size([1, 24000]).

Thank you!

Some tests unrelated to the change had failed.
mthrok (Contributor) commented Feb 17, 2021

> SOX_ENCODING_HTK doesn't exist in the sox source-code. HTK encoding in sox actually uses SOX_ENCODING_SIGN2 under the hood

HTK format uses signed-integer encoding. [code]

> but since SOX_ENCODING_HTK can't simply be substituted with SOX_ENCODING_SIGN2

Why not? There are other formats that save samples in raw form, as wav does, such as amb and sph. I expect that using SOX_ENCODING_SIGN2 should allow us to replicate the result sox produces.

> If I make Format::HTK correspond to SOX_ENCODING_SIGN2 in torchaudio/csrc/sox/utils.cpp, then the save tests for HTK fail:
>
> save_test.py:250: in test_save_htk
>     self.assert_save_consistency("htk", test_mode=test_mode)
> save_test.py:140: in assert_save_consistency
>     self.assertEqual(found, expected)
> /home/pytorch/torch/testing/_internal/common_utils.py:1183: in assertEqual
>     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
> E   AssertionError: False is not true : Tensors failed to compare as equal!Attempted to compare equality of tensors with different sizes. Got sizes torch.Size([1, 48000]) and torch.Size([1, 24000]).

This is most likely because the underlying encoder discards the channel information. http://sox.sourceforge.net/soxformat.html says:

> .htk Single channel 16-bit PCM format used by HTK, a toolkit for building Hidden Markov Model speech processing tools.

Looking at the code, this format seems to have a specific header, but the sample storage is the same as in the WAVE format. The source code also links to the following references, if you are interested:

https://labrosa.ee.columbia.edu/doc/HTKBook21/node57.html
https://labrosa.ee.columbia.edu/doc/HTKBook21/node61.html
https://labrosa.ee.columbia.edu/doc/HTKBook21/node58.html
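
For context, here is a minimal sketch of the header layout those HTK Book pages describe, as I understand it: a 12-byte big-endian header followed by raw 16-bit signed samples. The helper names `pack_htk` and `unpack_htk` are hypothetical and only for illustration; this is not how sox or torchaudio implement it. The point is that the header has no channel-count field, which would be consistent with the format being single channel.

```python
import struct

# Assumed layout (from the HTK Book): nSamples (int32), sampPeriod in 100 ns
# units (int32), sampSize in bytes (int16), parmKind (int16), big-endian.
# There is no field for the number of channels.
HTK_HEADER = struct.Struct(">iihh")
WAVEFORM = 0  # parmKind code for sampled waveform data


def pack_htk(samples, sample_rate):
    """Serialize mono 16-bit samples into an HTK-style byte string (hypothetical helper)."""
    samp_period = round(1e7 / sample_rate)  # sample period in 100 ns units
    header = HTK_HEADER.pack(len(samples), samp_period, 2, WAVEFORM)
    body = struct.pack(f">{len(samples)}h", *samples)
    return header + body


def unpack_htk(data):
    """Parse the byte string back into (samples, sample_rate) (hypothetical helper)."""
    n_samples, samp_period, _samp_size, _parm_kind = HTK_HEADER.unpack_from(data)
    samples = struct.unpack_from(f">{n_samples}h", data, HTK_HEADER.size)
    return list(samples), round(1e7 / samp_period)
```

Round-tripping a short list of samples through `pack_htk` and `unpack_htk` should return the same samples and sample rate, with 12 bytes of header in front of the sample data.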

Anyway, you can pass the num_channels parameter to change the number of channels used in the test. Can you try that?
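
To illustrate why the channel count could produce the size mismatch in the traceback, here is a small pure-Python sketch. It assumes one plausible reading of the failure: two channels of 24000 frames are interleaved on write and then read back by a mono-only format as a single channel of 48000 samples. The thread does not confirm which side produced which shape, and the helper names are hypothetical.

```python
def interleave(channels):
    """Flatten [channel][frame] samples into frame-major order, as raw PCM is stored."""
    return [sample for frame in zip(*channels) for sample in frame]


def read_as_mono(raw):
    """Interpret a raw sample stream as one channel, as a mono-only format would."""
    return [raw]


num_frames = 24000  # assumed frame count, chosen to match the sizes in the traceback
stereo = [[0] * num_frames, [1] * num_frames]  # shape [2, 24000]

# Without channel information, the interleaved stream comes back as one long
# channel: shape [1, 48000] rather than [2, 24000].
misread = read_as_mono(interleave(stereo))

# Starting from a single channel keeps the shape at [1, 24000].
mono = read_as_mono(interleave([[0] * num_frames]))
```

Under that assumption, running the test with a single channel makes both sides agree on the shape.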

imaginary-person (Contributor, Author)

Thanks for your help, @mthrok. I hadn't noticed that the test uses 2 channels by default.
After I provided a num_channels parameter, the tests for htk also passed, as you expected.
I'll submit a separate PR for htk so as to retain the existing code style (I changed the style in some files of this PR).

> Why not? There are other formats that save samples in raw form, as wav does, such as amb and sph. I expect that using SOX_ENCODING_SIGN2 should allow us to replicate the result sox produces.

Yes, I just meant that I'd have to make some other change as well, such as the num_channels one you suggested.

imaginary-person (Contributor, Author)

Closed this PR as the original contributor submitted another one.

mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021
The comma is problematic as reported in meta-pytorch/captum#553