RFC: Applying codecs as data augmentation


In #1108 and #1141, I am adding in-memory decoding and encoding. This lets us apply codecs to an audio Tensor, as follows.

```python
import io
import torchaudio

# Encode the waveform to MP3 into an in-memory buffer
fileobj = io.BytesIO()
torchaudio.save(fileobj, waveform, …, format="mp3", compression=9)
fileobj.seek(0)
waveform, _ = torchaudio.load(fileobj)
# Note: depending on the format, the size of the tensor could be different,
# so some post processing might be necessary
```

This gives practically the same result as:

```shell
sox input.wav -C 9 temp.mp3
sox temp.mp3 output.wav
```
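The note about post-processing in the Python snippet can be made concrete. Below is a minimal sketch, not a proposed API: `match_length` and `codec_round_trip` are hypothetical names introduced here for illustration. It assumes the time axis is the last dimension, and that trimming or zero-padding at the end is an acceptable fix. It does not compensate for any delay the codec introduces at the start of the signal, and it assumes the decoded sample rate equals the input one.

```python
import io

import torch
import torch.nn.functional as F


def match_length(waveform: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Trim or zero-pad the last (time) dimension to exactly num_frames."""
    current = waveform.size(-1)
    if current >= num_frames:
        return waveform[..., :num_frames]
    # Pad only on the right side of the last dimension.
    return F.pad(waveform, (0, num_frames - current))


def codec_round_trip(waveform: torch.Tensor, sample_rate: int, format: str,
                     **save_kwargs) -> torch.Tensor:
    """Hypothetical helper: encode and decode in memory, then restore length.

    save_kwargs is forwarded to torchaudio.save unchanged; which options are
    accepted depends on the installed torchaudio version and backend.
    """
    # Imported here so that match_length can be used without torchaudio.
    import torchaudio

    fileobj = io.BytesIO()
    torchaudio.save(fileobj, waveform, sample_rate, format=format, **save_kwargs)
    fileobj.seek(0)
    decoded, _ = torchaudio.load(fileobj, format=format)
    return match_length(decoded, waveform.size(-1))
```

With this, `codec_round_trip(waveform, sample_rate, "mp3")` would return a tensor with the same number of frames as the input, provided the installed backend supports in-memory MP3 encoding and decoding.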

I am thinking of adding codec application as a torchaudio feature. Before I start working on the API specification and engineering roadmap, I would like to hear from the community what kind of functionality would help your use case.

If you would like to use codecs as data augmentation, or if you know of papers that use this kind of technique, please leave a comment.

cc @mravanelli @sw005320 @pzelasko @faroit @mpariente @danpovey 

RFC: Applying codecs as data augmentation #1146
