Adding encoding and bits_per_sample options to save #1258

@mthrok

Description

In the current release, the save function of the "soundfile" backend chooses the encoding of the audio file from the dtype of the provided Tensor. For example, a "float32" Tensor is saved as 32-bit floating-point PCM. This behavior was taken from SciPy's scipy.io.wavfile.write function. However, it was pointed out that this is inconvenient for torchaudio users: most of torchaudio's functionality works on float32 Tensors, yet common audio formats typically retain only 16 bits per sample, such as 16-bit signed integer PCM.
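To make the current dtype-driven behavior concrete, here is a minimal sketch of the kind of lookup it implies. The function name and the table are hypothetical and are not copied from the torchaudio source; dtypes are written as plain strings so the snippet runs without torch. The subtype strings are the names libsndfile uses for these encodings.

```python
# Illustrative only: the real backend works on torch.dtype objects,
# plain strings are used here so the sketch has no dependencies.
_DTYPE_TO_WAV_SUBTYPE = {
    "uint8": "PCM_U8",    # 8-bit unsigned integer PCM
    "int16": "PCM_16",    # 16-bit signed integer PCM
    "int32": "PCM_32",    # 32-bit signed integer PCM
    "float32": "FLOAT",   # 32-bit floating-point PCM
    "float64": "DOUBLE",  # 64-bit floating-point PCM
}


def wav_subtype_from_dtype(dtype_name: str) -> str:
    """Return the subtype a dtype-driven save would pick for wav (hypothetical helper)."""
    try:
        return _DTYPE_TO_WAV_SUBTYPE[dtype_name]
    except KeyError:
        raise ValueError(f"Unsupported dtype for wav: {dtype_name}") from None


print(wav_subtype_from_dtype("float32"))  # FLOAT
print(wav_subtype_from_dtype("int16"))    # PCM_16
```

With a lookup like this, a float32 Tensor always ends up as a 32-bit float file unless the caller converts it to int16 first, which is the inconvenience described above.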

To resolve this inconvenience while keeping support for different encodings, we would like to make the following changes:

  1. Add encoding and bits_per_sample parameters to the save function.
  2. For non-compressed formats (such as "wav"), default to 16-bit signed integer PCM. (This is a BC-breaking change for users who were saving Tensor objects without converting them to the matching dtype.)

See #1226 for the corresponding changes to the "sox_io" backend. (For the "soundfile" backend, the expected changes are much simpler.)

Steps

  1. Add encoding and bits_per_sample options to the save function of the soundfile backend. Refer to #1226 (Add encoding and bits_per_sample option to save function) for the specification (valid values, fallback values, etc.). Note that soundfile does not support all the formats libsox does. (wav and flac are the ones that should be covered, and their behavior should match the "sox_io" backend as much as possible.)
  2. Update the logic that determines the "subtype" argument so that the subtype is determined by the format, encoding and bits_per_sample parameters. Note: to learn how PySoundFile internally expresses audio formats, see here.
  3. Update the tests.
    1. Update the mocked test that checks which parameters are passed to the underlying soundfile module. (The input parameter should be changed from dtype to encoding and bits_per_sample so that the logic added in step 2 is tested.)
    2. Fix the rest of the tests, which will break because the function will now default to 16-bit PCM for the wav format.
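The subtype logic from step 2 could be sketched roughly as follows. This is not the final implementation: the function name, the encoding names ("PCM_S", "PCM_U", "PCM_F", "ULAW", "ALAW"), the per-encoding default bit depths and the flac rules are all assumptions modeled on the "sox_io" proposal, and #1226 remains the authority for valid and fallback values. Only the wav default of 16-bit signed integer PCM comes directly from this issue.

```python
from typing import Optional

# (encoding, bits_per_sample) -> libsndfile subtype name, for wav.
_WAV_SUBTYPES = {
    ("PCM_S", 16): "PCM_16",
    ("PCM_S", 24): "PCM_24",
    ("PCM_S", 32): "PCM_32",
    ("PCM_U", 8): "PCM_U8",
    ("PCM_F", 32): "FLOAT",
    ("PCM_F", 64): "DOUBLE",
    ("ULAW", 8): "ULAW",
    ("ALAW", 8): "ALAW",
}
# Assumed fallback bit depth when only the encoding is given.
_WAV_DEFAULT_BITS = {"PCM_S": 16, "PCM_U": 8, "PCM_F": 32, "ULAW": 8, "ALAW": 8}
# flac stores signed integers only (assumed mapping).
_FLAC_SUBTYPES = {8: "PCM_S8", 16: "PCM_16", 24: "PCM_24"}


def resolve_subtype(
    format: str,
    encoding: Optional[str] = None,
    bits_per_sample: Optional[int] = None,
) -> str:
    """Pick a subtype from format, encoding and bits_per_sample (hypothetical helper)."""
    if format == "wav":
        if encoding is None:
            # Assumption: 8-bit wav is unsigned, everything else is signed PCM.
            encoding = "PCM_U" if bits_per_sample == 8 else "PCM_S"
        if bits_per_sample is None:
            bits_per_sample = _WAV_DEFAULT_BITS.get(encoding)
        key = (encoding, bits_per_sample)
        if key not in _WAV_SUBTYPES:
            raise ValueError(f"wav does not support {encoding!r} with {bits_per_sample} bits")
        return _WAV_SUBTYPES[key]
    if format == "flac":
        if encoding is not None:
            raise ValueError("flac does not take an encoding")
        bits = 16 if bits_per_sample is None else bits_per_sample
        if bits not in _FLAC_SUBTYPES:
            raise ValueError(f"flac does not support {bits} bits per sample")
        return _FLAC_SUBTYPES[bits]
    raise ValueError(f"Unsupported format: {format!r}")


print(resolve_subtype("wav"))                       # PCM_16
print(resolve_subtype("wav", encoding="PCM_F"))     # FLOAT
print(resolve_subtype("flac", bits_per_sample=24))  # PCM_24
```

Keeping the valid combinations in a table makes it easy to compare them with the "sox_io" specification and to reject unsupported combinations with a clear error instead of passing them on to libsndfile.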

Build and test

Refer to CONTRIBUTING for the development setup.

To run the tests:

pytest test/torchaudio_unittest/backend/soundfile/save_test.py
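For the mocked test in step 3.1, the shape of the check might look like the sketch below. It is deliberately self-contained: save_wav is a hypothetical stand-in for the backend's save function, the small subtype table is an assumption, and the writer is injected as an argument rather than patched, so the snippet runs without torchaudio or soundfile installed. The keyword names passed to the writer (file, data, samplerate, subtype, format) follow soundfile.write.

```python
from unittest import mock

# Assumed subset of (encoding, bits_per_sample) -> subtype, for illustration.
_SUBTYPES = {
    ("PCM_S", 16): "PCM_16",
    ("PCM_S", 32): "PCM_32",
    ("PCM_F", 32): "FLOAT",
}


def save_wav(filepath, data, sample_rate, encoding="PCM_S", bits_per_sample=16, *, write):
    """Hypothetical stand-in for the backend's save function.

    `write` plays the role of soundfile.write and is injected so that
    a test can pass a mock instead of touching the file system.
    """
    subtype = _SUBTYPES[(encoding, bits_per_sample)]
    write(file=filepath, data=data, samplerate=sample_rate, subtype=subtype, format="WAV")


def check_subtype(encoding, bits_per_sample, expected_subtype):
    """Assert that save_wav forwards the expected subtype to the writer."""
    fake_write = mock.Mock()
    data = [0.0, 0.5, -0.5]
    save_wav("foo.wav", data, 8000, encoding, bits_per_sample, write=fake_write)
    fake_write.assert_called_once_with(
        file="foo.wav", data=data, samplerate=8000, subtype=expected_subtype, format="WAV"
    )


# Parameterize over encoding and bits_per_sample instead of dtype.
for case in [("PCM_S", 16, "PCM_16"), ("PCM_S", 32, "PCM_32"), ("PCM_F", 32, "FLOAT")]:
    check_subtype(*case)
```

In the actual test suite the same idea would more likely be expressed by patching soundfile.write and parameterizing the existing test case over encoding and bits_per_sample.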
