Support file-like object in load function #1158

mthrok · 2021-01-06T18:17:05Z

This PR adds file-like object support to the load function.

From stdin

import sys
import torchaudio
data, sr = torchaudio.load(sys.stdin.buffer)

From an archive, without extracting it to the local filesystem

import tarfile
import torchaudio

# tarfile.open also handles compressed archives transparently
with tarfile.open(archive_path, 'r') as tarobj:
    data, sample_rate = torchaudio.load(tarobj.extractfile(member_name))
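To experiment with the archive case without a real dataset, here is a minimal sketch that builds a one-member tar in memory and hands the member's file-like object to a loader. The helper names `load_member` and `make_tar_bytes` are hypothetical and not part of torchaudio; the `loader` argument is assumed to be a callable such as `torchaudio.load`.

```python
import io
import tarfile


def load_member(archive, member_name, loader):
    """Open `archive` (a binary file-like object holding a tar) and pass the
    member's file-like object to `loader`, e.g. torchaudio.load."""
    with tarfile.open(fileobj=archive, mode="r") as tarobj:
        fileobj = tarobj.extractfile(member_name)
        if fileobj is None:
            # extractfile() returns None for members that are not regular files
            raise ValueError(f"{member_name!r} is not a regular file")
        return loader(fileobj)


def make_tar_bytes(member_name, payload):
    """Build a one-member tar archive in memory (for experimentation only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tarobj:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tarobj.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

With torchaudio installed, a call would look like `load_member(open(archive_path, "rb"), member_name, torchaudio.load)`. The loader has to finish reading inside the `with` block, because the member object is only valid while the archive is open.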

Streaming over the network
With requests

import requests
import torchaudio

with requests.get(url, stream=True) as resp:
    data, sample_rate = torchaudio.load(resp.raw)

With boto3

import boto3
import torchaudio

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=key)
data, sample_rate = torchaudio.load(obj['Body'])

For the sox_io backend, a new loading function, load_audio_fileobj, is added and bound via PyBind11. It works on any Python object with a read method that returns bytes.
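As an illustration of the "read method that returns bytes" requirement, here is a minimal sketch of a custom source object that exposes nothing but `read`. The class `ReadOnlySource` and the helpers `make_wav_bytes` and `load_with_torchaudio` are hypothetical names made up for this example. Whether such an object is sufficient for every format is an assumption based on the description above.

```python
import io
import wave


class ReadOnlySource:
    """A deliberately minimal file-like object: only read(n) -> bytes."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0

    def read(self, n=-1):
        # Mimic the usual file semantics: a negative or missing n reads the rest.
        if n is None or n < 0:
            n = len(self._payload) - self._pos
        chunk = self._payload[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk  # empty bytes signals end of data


def make_wav_bytes(num_frames=800, sample_rate=8000):
    """Create a short silent mono 16-bit WAV in memory using the stdlib."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


def load_with_torchaudio(source):
    # Imported lazily so the rest of the sketch runs without torchaudio.
    import torchaudio
    return torchaudio.load(source)
```

Calling `load_with_torchaudio(ReadOnlySource(make_wav_bytes()))` would exercise the file-object code path with the sox_io backend.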

For the soundfile backend, PySoundFile natively supports file-like objects.

Closes #754, #800
See also #1115

torchaudio/backend/sox_io_backend.py

torchaudio/csrc/sox/effects.cpp

torchaudio/csrc/sox/effects_chain.cpp

torchaudio/csrc/sox/utils.cpp

torchaudio/sox_effects/sox_effects.py

rbracco · 2021-01-22T19:03:02Z

Thank you, this is an awesome and much needed feature. Your work is appreciated!

mthrok · 2021-01-22T21:16:10Z

> Thank you, this is an awesome and much needed feature. Your work is appreciated!

Hi @rbracco

Thanks :). Please try it out and give feedback.
Please consider this a prototype feature.
I am observing sporadic test failures in our CI.
I am trying to figure out which conditions cause the failures.

https://app.circleci.com/pipelines/github/pytorch/audio/4658/workflows/c810196e-4e45-4d27-8089-9844eb19adf6/jobs/148952

rbracco · 2021-01-28T14:27:05Z

I have tried it out by sending audio from a React-based frontend to a FastAPI backend that receives a file-like object as an upload and then reads it into torchaudio. It worked great with wav files, but when I tried to simplify by moving to mp3 (smaller, more efficient), load started raising "RuntimeError: Error loading audio file: failed to open file."

All of the following work:
-Loading a normal MP3 in torchaudio (on the backend server)
-Downloading the MP3 generated by my frontend and opening/playing in an audio player.
-Sending the blob to backend server, writing it to a .mp3 file, and then loading it in torchaudio as a normal file (code below)

For this reason I think it is likely an issue with the new file-like object loading, but unfortunately I have very shallow knowledge of audio formats, C++, and how this works under the hood. I'm happy to keep helping with testing, though. Please let me know if you have any ideas for a fix, or if this is worthy of a separate issue. Thanks again, I'm really excited for this feature and I'm in an audio ML chat on Telegram with many others who feel the same.

# Workaround: write the uploaded bytes to disk, then load by path.
with open("audios/example.mp3", "wb") as f:
    f.write(file.file.read())
y, sr = torchaudio.load("audios/example.mp3")

mthrok · 2021-01-28T16:04:11Z

Hi @rbracco

Thanks for trying it out and for the feedback. I am happy to hear that this addition is helpful for you.

For your issue,

I merged "Fix load from file object for small files and shorter bytes" (#1181) with some tweaks. I am not sure which nightly you tried, but if you tested with yesterday's nightly, can you try again with today's?
Can you try with format="mp3"? For MP3, libsox does not detect the format from the header, so it needs to be told.
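For an upload handler like the one described earlier, the format hint could be derived from the uploaded file's name. The following is a minimal sketch, assuming a torchaudio build whose `load` accepts the `format` keyword. `format_hint` and `load_upload` are hypothetical helper names, and `filename` is assumed to come from the upload (for example FastAPI's `UploadFile.filename`).

```python
import os


def format_hint(filename):
    """Return a lowercase format string such as "mp3" taken from the file
    extension, or None when the name has no extension."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext or None


def load_upload(fileobj, filename):
    """Load audio from a file-like object, telling the backend the format
    explicitly instead of relying on header detection."""
    # Imported lazily so format_hint can be used without torchaudio installed.
    import torchaudio
    return torchaudio.load(fileobj, format=format_hint(filename))
```

In the FastAPI case this would be called as `load_upload(file.file, file.filename)`. Trusting a client-supplied extension is a simplification; a production handler may want to validate it against an allow-list.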

If you still see an issue, please open an issue. I am more than happy to look at it.
I also want to polish this feature, but I am seeing a test failure that happens sporadically (like this) and I am trying to figure out the cause.

* Fix fileobj I/O undeterministic behavior Ever since the file-like object support was added in #1158, the test was occasionally failing in CI. This PR fixes this.

facebook-github-bot added the CLA Signed label Jan 6, 2021

mthrok requested a review from cpuhrsch January 6, 2021 18:17