Codec-based augmentation

With file-like object support in the I/O functions, torchaudio can apply codecs as a form of augmentation.
Adding this feature to torchaudio requires two components.

1. Python frontend
    The user-facing function.
2. Extended format support for the `save` function
    The "sox_io" `save` function currently supports only a limited set of formats (wave, mp3, flac, opus, amb, amr-nb, sphere, ogg/vorbis), while the underlying libsox can handle more. We need to extend the supported formats.

We welcome contributions from the open source community. If you are interested in working on this, please read the following description and leave a comment saying which part you would like to work on, so that others do not duplicate the work. If you are working on extended format support for the `save` function, open one PR per format.

## 1. Interface

### Implementation

The gist of the Python frontend looks like this: save the given waveform in memory with the codec and compression applied, then load it back.

```python
# in torchaudio/functional/functional.py
import io

import torchaudio


def apply_codec(waveform, sample_rate, format, channels_first=True, compression=None):
    # Encode the waveform into an in-memory buffer with the codec applied.
    buffer = io.BytesIO()
    torchaudio.save(buffer, waveform, sample_rate, channels_first, compression=compression, format=format)
    # Rewind so that `load` reads from the start of the encoded data.
    buffer.seek(0)
    augmented, _ = torchaudio.load(buffer, channels_first=channels_first)
    return augmented
```
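
The save-to-memory-then-load pattern can be tried out without torchaudio. The following is a minimal sketch that uses only the standard library `wave` module, with 16-bit PCM quantization standing in for a real codec. The function name `roundtrip_wav` is hypothetical and not part of any proposed API.

```python
import io
import math
import struct
import wave


def roundtrip_wav(samples, sample_rate):
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV in memory, then decode."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        # Clip to [-1, 1] and quantize to signed 16-bit integers.
        pcm = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
        writer.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    buffer.seek(0)  # rewind before reading, as apply_codec does
    with wave.open(buffer, "rb") as reader:
        raw = reader.readframes(reader.getnframes())
    decoded = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [value / 32767 for value in decoded]


# Usage: a 440 Hz tone, 0.1 second at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
degraded = roundtrip_wav(tone, 8000)
```

The returned samples differ slightly from the input because of quantization, which is the kind of codec-induced distortion the augmentation is meant to expose a model to.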

### Testing

To start, we need a smoke test that verifies that the function does not crash for a variety of formats. Since we want this function to work on Windows, where libsox is not available, we need to test both the "sox_io" backend and the "soundfile" backend with the new interface. Therefore, a base class that implements the test logic, extended once for the "sox_io" backend and once for the "soundfile" backend, would do.

```python
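import torch
import torchaudio.functional as F


class ApplyCodecTestBase:
    backend = None  # set by each subclass

    def _test_codec(self, format, compression):
        # Run the function on one second of stereo noise.
        sample_rate, num_channels = 8000, 2
        waveform = torch.rand(num_channels, sample_rate) * 2 - 1
        augmented = F.apply_codec(waveform, sample_rate, format, compression=compression)
        # Check the channels only; the number of frames can change
        # depending on the format (e.g. mp3).
        assert augmented.size(0) == num_channels


class ApplyCodecSoxIOTest(ApplyCodecTestBase):
    backend = "sox_io"

    # parameterize the compression
    def test_mp3(self, compression):
        self._test_codec("mp3", compression)


class ApplyCodecSoundfileTest(ApplyCodecTestBase):
    backend = "soundfile-new"
    # similar to the above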
```

For details on which parameters to parameterize and how, see the existing test:
https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/test/torchaudio_unittest/sox_io_backend/info_test.py#L56-L153
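
The base-class layout described in this section can be sketched with plain `unittest`, independent of torchaudio. In this sketch, `fake_apply_codec` is a hypothetical stand-in for the real function, and the format/compression pairs are placeholders rather than a verified list of what each backend supports.

```python
import unittest


def fake_apply_codec(waveform, fmt, compression=None):
    """Hypothetical stand-in for apply_codec: returns a copy of each channel."""
    return [list(channel) for channel in waveform]


class CodecTestMixin:
    """Shared test logic; concrete classes only declare backend and formats."""

    backend = None  # name of the backend the concrete class exercises
    formats = ()    # (format, compression) pairs to smoke-test

    def test_codec_preserves_channels(self):
        waveform = [[0.0, 0.5, -0.5], [0.1, 0.2, 0.3]]  # 2 channels, 3 frames
        for fmt, compression in self.formats:
            with self.subTest(backend=self.backend, format=fmt, compression=compression):
                out = fake_apply_codec(waveform, fmt, compression)
                # Only the channel count is checked; frame counts may differ.
                self.assertEqual(len(out), len(waveform))


class SoxIOCodecTest(CodecTestMixin, unittest.TestCase):
    backend = "sox_io"
    formats = (("mp3", 128), ("flac", 5))  # placeholder values


class SoundfileCodecTest(CodecTestMixin, unittest.TestCase):
    backend = "soundfile"
    formats = (("flac", None),)  # placeholder values
```

Keeping the logic in a mixin that does not itself inherit from `unittest.TestCase` means the test runner collects only the concrete per-backend classes.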

## 2. Extend format support for `save` function

### Implementation
The `save` function is implemented in the C++ backend. The following is the call stack.

`torchaudio.save`
 -> [`torchaudio.backend.sox_io_backend.save`](https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/torchaudio/backend/sox_io_backend.py#L156-L215)
 -> [`torchaudio.backend.sox_io_backend._save`](https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/torchaudio/backend/sox_io_backend.py#L137-L153)
 -> [`torchaudio/csrc/sox/io.cpp::save_audio_fileobj`](https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/torchaudio/csrc/sox/io.cpp#L176-L222)

Inside the `save_audio_fileobj` function, the target signal and encoding are determined by the [`get_signalinfo`](https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/torchaudio/csrc/sox/utils.cpp#L279-L289) and [`get_encodinginfo`](https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/torchaudio/csrc/sox/utils.cpp#L303-L314) functions. These functions call the [`get_encoding` function](https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/torchaudio/csrc/sox/utils.cpp#L221-L246) to find the corresponding [`sox_encoding_t` type](https://fossies.org/dox/sox-14.4.2/sox_8h.html#ab0a659b0258d1989c751ba5415e6a4cd) and the [`get_precision` function](https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/torchaudio/csrc/sox/utils.cpp#L248-L277) to find the bit depth.

To extend format support, we need to add the correct mapping from format string to `sox_encoding_t` in `get_encoding`, and the mapping from format string to bit depth in `get_precision`.
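
The shape of that change can be sketched in Python, even though the real mappings live in C++. This is only an illustration of the two lookups: the table contents, the `register_format` helper, and the `gsm` entry are assumptions for the example, not the actual contents of `get_encoding` or `get_precision`.

```python
# Illustrative lookup tables. The encoding values are written as strings that
# mirror libsox enum names; the real code returns sox_encoding_t values in C++.
_ENCODINGS = {"flac": "SOX_ENCODING_FLAC"}
_PRECISIONS = {"flac": 24}  # bit depth, following the 24-bit note in this issue


def get_encoding(fmt):
    """Map a format string to an encoding name, rejecting unknown formats."""
    try:
        return _ENCODINGS[fmt]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt}") from None


def get_precision(fmt):
    """Map a format string to a bit depth, rejecting unknown formats."""
    try:
        return _PRECISIONS[fmt]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt}") from None


def register_format(fmt, encoding, precision):
    """Hypothetical helper: supporting a new format needs an entry in BOTH tables."""
    _ENCODINGS[fmt] = encoding
    _PRECISIONS[fmt] = precision


# Placeholder values for illustration only; verify against libsox before use.
register_format("gsm", "SOX_ENCODING_GSM", 16)
```

The point of the sketch is that a format is only usable once both lookups know about it, which is why each new format touches both functions.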

### Testing

Check out [this](https://github.com/pytorch/audio/blob/f1d8d1e0da44f6503c01ddcae2e40772b400ea2d/test/torchaudio_unittest/sox_io_backend/save_test.py#L32-L84) for how the correctness of the `save` function is currently tested. The corresponding test method can be found [here](https://github.com/pytorch/audio/blob/master/test/torchaudio_unittest/sox_io_backend/save_test.py#L298-L305).

Add a similar test with the necessary parameterization. Note that certain formats have specific restrictions (for example, the SPHERE and flac formats are 24 bit).
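
One way to build such a parameterization is to take the full product of parameters and drop the combinations a format cannot represent. The sketch below assumes a hypothetical restriction table (`ALLOWED_BIT_DEPTHS`) that takes the 24-bit note above at face value; the helper name `build_cases` is also made up for the example.

```python
import itertools

# Hypothetical restriction table: formats limited to specific bit depths.
# Formats that are not listed are treated as unrestricted.
ALLOWED_BIT_DEPTHS = {"flac": {24}, "sph": {24}}


def build_cases(formats, sample_rates, channel_counts, bit_depths):
    """Return (format, sample_rate, num_channels, bit_depth) tuples that respect restrictions."""
    cases = []
    for fmt, rate, channels, bits in itertools.product(
        formats, sample_rates, channel_counts, bit_depths
    ):
        allowed = ALLOWED_BIT_DEPTHS.get(fmt)
        if allowed is not None and bits not in allowed:
            continue  # skip combinations the format cannot represent
        cases.append((fmt, rate, channels, bits))
    return cases


cases = build_cases(["wav", "flac"], [8000, 16000], [1, 2], [16, 24])
```

The resulting list can be passed to a decorator such as `parameterized.expand`, which the existing tests already use.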

### Building and testing locally

To work on this, `torchaudio` needs to be built from source. Using a `conda` environment (anaconda/miniconda) is highly recommended.

The build also requires `cmake` and a nightly build of PyTorch. Refer to pytorch.org for PyTorch installation instructions.
To install `cmake`, run `pip install cmake`.

Once the environment is set up, the following commands will build torchaudio and run the corresponding tests:

```bash
BUILD_SOX=1 python setup.py develop
(cd test && pytest torchaudio_unittest/sox_io_backend/save_test.py -v)
```

The existing test linked from the Testing part of section 1 (`info_test.py`) contains the following methods, reproduced here for reference:

```python
    @parameterized.expand(list(itertools.product(
        [8000, 16000],
        [1, 2],
        [96, 128, 160, 192, 224, 256, 320],
    )), name_func=name_func)
    def test_mp3(self, sample_rate, num_channels, bit_rate):
        """`sox_io_backend.info` can check mp3 file correctly"""
        duration = 1
        path = self.get_temp_path('data.mp3')
        sox_utils.gen_audio_file(
            path, sample_rate, num_channels,
            compression=bit_rate, duration=duration,
        )
        info = sox_io_backend.info(path)
        assert info.sample_rate == sample_rate
        # mp3 does not preserve the number of samples
        # assert info.num_frames == sample_rate * duration
        assert info.num_channels == num_channels

    @parameterized.expand(list(itertools.product(
        [8000, 16000],
        [1, 2],
        list(range(9)),
    )), name_func=name_func)
    def test_flac(self, sample_rate, num_channels, compression_level):
        """`sox_io_backend.info` can check flac file correctly"""
        duration = 1
        path = self.get_temp_path('data.flac')
        sox_utils.gen_audio_file(
            path, sample_rate, num_channels,
            compression=compression_level, duration=duration,
        )
        info = sox_io_backend.info(path)
        assert info.sample_rate == sample_rate
        assert info.num_frames == sample_rate * duration
        assert info.num_channels == num_channels

    @parameterized.expand(list(itertools.product(
        [8000, 16000],
        [1, 2],
        [-1, 0, 1, 2, 3, 3.6, 5, 10],
    )), name_func=name_func)
    def test_vorbis(self, sample_rate, num_channels, quality_level):
        """`sox_io_backend.info` can check vorbis file correctly"""
        duration = 1
        path = self.get_temp_path('data.vorbis')
        sox_utils.gen_audio_file(
            path, sample_rate, num_channels,
            compression=quality_level, duration=duration,
        )
        info = sox_io_backend.info(path)
        assert info.sample_rate == sample_rate
        assert info.num_frames == sample_rate * duration
        assert info.num_channels == num_channels

    @parameterized.expand(list(itertools.product(
        [8000, 16000],
        [1, 2],
    )), name_func=name_func)
    def test_sphere(self, sample_rate, num_channels):
        """`sox_io_backend.info` can check sph file correctly"""
        duration = 1
        path = self.get_temp_path('data.sph')
        sox_utils.gen_audio_file(path, sample_rate, num_channels, duration=duration)
        info = sox_io_backend.info(path)
        assert info.sample_rate == sample_rate
        assert info.num_frames == sample_rate * duration
        assert info.num_channels == num_channels

    @parameterized.expand(list(itertools.product(
        ['float32', 'int32', 'int16', 'uint8'],
        [8000, 16000],
        [1, 2],
    )), name_func=name_func)
    def test_amb(self, dtype, sample_rate, num_channels):
        """`sox_io_backend.info` can check amb file correctly"""
        duration = 1
        path = self.get_temp_path('data.amb')
        sox_utils.gen_audio_file(
            path, sample_rate, num_channels,
            bit_depth=sox_utils.get_bit_depth(dtype), duration=duration)
        info = sox_io_backend.info(path)
        assert info.sample_rate == sample_rate
        assert info.num_frames == sample_rate * duration
        assert info.num_channels == num_channels

    def test_amr_nb(self):
        """`sox_io_backend.info` can check amr-nb file correctly"""
        duration = 1
        num_channels = 1
        sample_rate = 8000
        path = self.get_temp_path('data.amr-nb')
        sox_utils.gen_audio_file(
            path, sample_rate=sample_rate, num_channels=num_channels, bit_depth=16, duration=duration)
        info = sox_io_backend.info(path)
        assert info.sample_rate == sample_rate
        assert info.num_frames == sample_rate * duration
        assert info.num_channels == num_channels
```
