Description
In the current release, the save function of the "soundfile" backend chooses the encoding of the audio file based on the dtype of the provided Tensor. For example, if the dtype is "float32", the file is saved as 32-bit floating-point PCM. This behavior was taken from SciPy's scipy.io.wavfile.write function. However, it was pointed out that this is inconvenient for torchaudio users, because most of torchaudio's functionality works on float32 Tensors, yet common audio formats typically retain only 16 bits, such as 16-bit signed integer PCM.
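The dtype-driven behavior described above can be pictured as a simple lookup. The following is only a sketch: dtype names are plain strings so that it runs without torch, the helper name `subtype_from_dtype` is hypothetical, and only the float32 entry is stated in this issue. The other entries are assumptions modeled on the subtype names that PySoundFile uses.

```python
# Sketch of a dtype -> WAV subtype lookup (assumed mapping, not the backend's code).
# Only "float32" -> 32-bit float PCM is confirmed by the description above.
DTYPE_TO_WAV_SUBTYPE = {
    "uint8": "PCM_U8",    # 8-bit unsigned integer PCM (assumption)
    "int16": "PCM_16",    # 16-bit signed integer PCM (assumption)
    "int32": "PCM_32",    # 32-bit signed integer PCM (assumption)
    "float32": "FLOAT",   # 32-bit floating-point PCM
    "float64": "DOUBLE",  # 64-bit floating-point PCM (assumption)
}


def subtype_from_dtype(dtype_name: str) -> str:
    """Return the WAV subtype implied by a tensor dtype name."""
    try:
        return DTYPE_TO_WAV_SUBTYPE[dtype_name]
    except KeyError:
        raise ValueError(f"Unsupported dtype for wav: {dtype_name}") from None
```

With such a lookup, a float32 Tensor is always written as 32-bit float unless the caller converts it first, which is the inconvenience this issue is about.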
To resolve the inconvenience while keeping the ability to support different encodings, we would like to:
- Add `encoding` and `bits_per_sample` parameters to the `save` function. For non-compressed formats (such as "wav"), it defaults to 16-bit signed integer PCM. (This is a BC-breaking change if users were saving Tensor objects without converting them to the matching dtype.)
See #1226 for the corresponding changes to the "sox_io" backend. (For the "soundfile" backend, the expected changes are much simpler.)
Steps
1. Add `encoding` and `bits_per_sample` options to the `save` function of the soundfile backend. Refer to #1226 (Add encoding and bits_per_sample option to save function) for the specification (valid values, fallback values, etc.). Note that soundfile does not support all the formats `libsox` does. (`wav` and `flac` are the ones that should be covered, matching the behavior of the `"sox_io"` backend as much as possible.)
2. Update the logic that determines the "subtype" argument so that `subtype` is determined by the `format`, `encoding` and `bits_per_sample` parameters.
   Note: To learn how PySoundFile internally expresses audio format, see here.
3. Update the tests.
   - Update the mocked test that checks what parameters are given to the underlying `soundfile` module. (The input parameter should be changed from `dtype` to `encoding` and `bits_per_sample` so that the logic added in step 2 is tested.)
   - Fix the rest of the tests, which will break because for wav format the function will now default to 16-bit PCM.
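The subtype logic from step 2 could be sketched roughly as follows. This is not the actual implementation: the function names are hypothetical, the encoding strings ("PCM_S", "PCM_U", "PCM_F", "ULAW", "ALAW") are assumed to follow #1226, and the fallback values other than the 16-bit default for wav are illustrative guesses. #1226 remains the authoritative specification.

```python
from typing import Optional


def _wav_subtype(encoding: Optional[str], bits_per_sample: Optional[int]) -> str:
    if encoding is None:
        if bits_per_sample is None:
            return "PCM_16"  # default stated in this issue
        if bits_per_sample == 8:
            return "PCM_U8"  # assumption: 8-bit wav is unsigned
        if bits_per_sample in (16, 24, 32):
            return f"PCM_{bits_per_sample}"
        raise ValueError(f"Unsupported bits_per_sample for wav: {bits_per_sample}")
    if encoding == "PCM_S":
        if bits_per_sample is None:
            return "PCM_16"  # assumed fallback
        if bits_per_sample in (16, 24, 32):
            return f"PCM_{bits_per_sample}"
        raise ValueError("wav signed PCM supports 16, 24 or 32 bits in this sketch")
    if encoding == "PCM_U":
        if bits_per_sample in (None, 8):
            return "PCM_U8"
        raise ValueError("wav unsigned PCM supports only 8 bits")
    if encoding == "PCM_F":
        if bits_per_sample in (None, 32):
            return "FLOAT"
        if bits_per_sample == 64:
            return "DOUBLE"
        raise ValueError("wav float PCM supports 32 or 64 bits")
    if encoding in ("ULAW", "ALAW"):
        if bits_per_sample in (None, 8):
            return encoding
        raise ValueError(f"{encoding} supports only 8 bits")
    raise ValueError(f"Unknown encoding: {encoding}")


def _flac_subtype(encoding: Optional[str], bits_per_sample: Optional[int]) -> str:
    if encoding is not None:
        raise ValueError("flac does not take an encoding in this sketch")
    if bits_per_sample is None:
        return "PCM_16"  # assumed fallback
    if bits_per_sample == 8:
        return "PCM_S8"
    if bits_per_sample in (16, 24):
        return f"PCM_{bits_per_sample}"
    raise ValueError(f"Unsupported bits_per_sample for flac: {bits_per_sample}")


def get_subtype(
    format: str,
    encoding: Optional[str] = None,
    bits_per_sample: Optional[int] = None,
) -> str:
    """Pick a PySoundFile subtype from format, encoding and bits_per_sample."""
    fmt = format.lower()
    if fmt == "wav":
        return _wav_subtype(encoding, bits_per_sample)
    if fmt == "flac":
        return _flac_subtype(encoding, bits_per_sample)
    raise ValueError(f"Unsupported format: {format}")
```

Keeping one small function per format makes it easy to test each fallback in isolation and to compare the results against the "sox_io" backend.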
Build and test
Refer to CONTRIBUTING for the development setup.
To run the tests:
pytest test/torchaudio_unittest/backend/soundfile/save_test.py
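A mocked test of the kind mentioned in the steps might look like the sketch below. Everything here is a stand-in: the `save` function is a simplified, hypothetical version that receives the soundfile-like module through an `sf` argument so that the example runs without torchaudio or soundfile installed, and it handles only two cases. The keyword names passed to `write` follow `soundfile.write` (`file`, `data`, `samplerate`, `subtype`, `format`).

```python
from unittest import mock


def save(filepath, data, sample_rate, format="wav",
         encoding=None, bits_per_sample=None, *, sf):
    """Hypothetical, simplified save; `sf` is a soundfile-like module."""
    if encoding is None and bits_per_sample is None:
        subtype = "PCM_16"  # new default for wav described in this issue
    elif encoding == "PCM_F" and bits_per_sample in (None, 32):
        subtype = "FLOAT"
    else:
        raise ValueError("combination not covered by this sketch")
    sf.write(file=filepath, data=data, samplerate=sample_rate,
             subtype=subtype, format=format.upper())


def check_default_wav_is_16bit_pcm():
    fake_sf = mock.Mock()  # stands in for the soundfile module
    data = [[0.0, 0.5]]
    save("out.wav", data, 8000, sf=fake_sf)
    # The test inspects what was handed to the underlying write call.
    fake_sf.write.assert_called_once_with(
        file="out.wav", data=data, samplerate=8000,
        subtype="PCM_16", format="WAV")


def check_float_encoding_is_forwarded():
    fake_sf = mock.Mock()
    save("out.wav", [[0.0]], 8000, encoding="PCM_F", bits_per_sample=32, sf=fake_sf)
    assert fake_sf.write.call_args.kwargs["subtype"] == "FLOAT"
```

In the real test suite the `soundfile` module would more likely be patched where the backend imports it, rather than injected as an argument.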