RFC: Applying codecs as data augmentation #1146

@mthrok

In #1108 and #1141, I am adding in-memory decoding and encoding. This allows us to apply codecs to an audio Tensor, as in the following.

fileobj = io.BytesIO()
torchaudio.save(fileobj, waveform, …, format="mp3", compression=9)
fileobj.seek(0)
waveform, _ = torchaudio.load(fileobj)
# Note: depending on the format, the size of the tensor could be different,
# so some post processing might be necessary

This gives practically the same result as

sox input.wav -C 9 temp.mp3
sox temp.mp3 output.wav
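
To make the round-trip pattern and the post-processing note concrete without depending on any particular torchaudio version, here is a minimal standard-library sketch. It is not torchaudio's API: 8-bit PCM WAV quantization stands in for a lossy codec such as mp3, plain Python lists stand in for Tensors, and the names `roundtrip_8bit_wav` and `match_length` are hypothetical helpers invented for this illustration.

```python
import io
import wave


def roundtrip_8bit_wav(samples, sample_rate):
    """Encode mono float samples in [-1.0, 1.0] as 8-bit PCM WAV in memory,
    then decode them back to floats. Stand-in for a lossy codec round trip."""
    # Quantize to unsigned 8-bit PCM (the WAV convention for 8-bit audio).
    pcm = bytes(min(255, max(0, round((s + 1.0) * 127.5))) for s in samples)

    # Encode into an in-memory buffer instead of a file on disk.
    fileobj = io.BytesIO()
    with wave.open(fileobj, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(1)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)

    # Rewind before decoding, exactly as in the torchaudio snippet above.
    fileobj.seek(0)
    with wave.open(fileobj, "rb") as reader:
        decoded = reader.readframes(reader.getnframes())

    # Map the unsigned bytes back to floats in [-1.0, 1.0].
    return [b / 127.5 - 1.0 for b in decoded]


def match_length(samples, num_frames):
    """Trim or zero-pad so the result has exactly num_frames samples.
    Assumed post-processing for codecs that change the signal length."""
    if len(samples) >= num_frames:
        return samples[:num_frames]
    return samples + [0.0] * (num_frames - len(samples))


# Usage: degrade a short ramp signal, then restore the original length.
original = [n / 50 - 1.0 for n in range(101)]
degraded = match_length(roundtrip_8bit_wav(original, 8000), len(original))
```

WAV itself preserves the number of frames, so `match_length` is a no-op here; it is included because formats that add padding would need this kind of step, as the comment in the snippet above warns.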

I am thinking of adding this application of codecs as a torchaudio feature. Before I start working on the API specification and engineering roadmap, I would like to hear from the community what kind of feature would be helpful for your use cases.

If you would like to use codecs for data augmentation, or if you know of papers that use this kind of technique, please leave a comment.

cc @mravanelli @sw005320 @pzelasko @faroit @mpariente @danpovey
