@mthrok (Contributor) commented Jan 6, 2021

This PR adds file-like object support to the load function.

  • From stdin
    import sys
    import torchaudio
    data, sr = torchaudio.load(sys.stdin.buffer)
  • From an archive without extracting it in the local filesystem
    import tarfile
    import torchaudio
    
    with tarfile.TarFile(archive_path, 'r') as tarobj:
        data, sample_rate = torchaudio.load(tarobj.extractfile(member_name))
  • Streaming over the network
    With requests
    import requests
    import torchaudio
    
    with requests.get(url, stream=True) as resp:
        data, sample_rate = torchaudio.load(resp.raw)
    With boto3
    import boto3
    import torchaudio
    
    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket=bucket, Key=key)
    data, sample_rate = torchaudio.load(obj['Body'])
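All of the examples above pass load an object that only needs to behave like a binary file. To try this out without an audio file on disk, a short WAV clip can be synthesized in memory with Python's standard-library wave module and wrapped in io.BytesIO. The helper name make_sine_wav and its parameters are hypothetical, chosen for illustration; only the final torchaudio.load call (left as a comment so the sketch runs without torchaudio) relies on this PR.

```python
import io
import math
import struct
import wave


def make_sine_wav(freq_hz: float = 440.0, sample_rate: int = 16000,
                  duration_s: float = 0.1) -> bytes:
    """Hypothetical helper: return a mono 16-bit PCM WAV file as bytes."""
    n_frames = round(sample_rate * duration_s)
    # Half-amplitude sine wave, packed as little-endian signed 16-bit samples.
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(frames)
    # Closing the wave writer finalizes the header but leaves buf open.
    return buf.getvalue()


# Usage with this PR (requires torchaudio, not run here):
#   waveform, sr = torchaudio.load(io.BytesIO(make_sine_wav()))
```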

For the sox_io backend, a new loading function, load_audio_fileobj, is added and bound via PyBind11. It works on any Python object with a read method that returns bytes.
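Because the sox_io path only requires a read method that returns bytes, audio that arrives as an iterator of byte chunks (for example a generator, or requests' Response.iter_content) can be adapted into a read-only stream. The sketch below is a hypothetical helper, not part of torchaudio, and assumes the consumer needs only read(), not seek().

```python
import io
from typing import Iterable, Iterator


class IterableReader(io.RawIOBase):
    """Hypothetical adapter: expose an iterable of byte chunks as a raw stream."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks: Iterator[bytes] = iter(chunks)
        self._pending = b""  # leftover bytes from the most recent chunk

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Skip empty chunks until there is data to copy or the source is exhausted.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # signals EOF to the caller
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


def reader_from_chunks(chunks: Iterable[bytes]) -> io.BufferedReader:
    # BufferedReader builds read(n) and read() on top of readinto();
    # the result is not seekable, like a network stream.
    return io.BufferedReader(IterableReader(chunks))
```

The returned object could then be passed to torchaudio.load in place of resp.raw. Subclassing io.RawIOBase and wrapping it in io.BufferedReader lets the standard library handle partial reads, so read(n) keeps pulling chunks until it has n bytes or reaches EOF.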

For the soundfile backend, PySoundFile natively supports file-like objects.

closes #754 #800
See also #1115

@mthrok mthrok force-pushed the file-like-obj-load branch from 73db470 to fc9b189 on January 7, 2021 20:04
@rbracco commented Jan 22, 2021

Thank you, this is an awesome and much needed feature. Your work is appreciated!

@mthrok (Contributor, Author) commented Jan 22, 2021

Hi @rbracco

Thanks :) Please try it out and give feedback.
Please consider this a prototype feature.
I am observing sporadic test failures in our CI,
and I am trying to figure out what conditions cause them.

https://app.circleci.com/pipelines/github/pytorch/audio/4658/workflows/c810196e-4e45-4d27-8089-9844eb19adf6/jobs/148952

@rbracco commented Jan 28, 2021

I have tried it out by sending audio from a React-based frontend to a FastAPI backend that receives a file-like object as an upload and then reads it into torchaudio. It worked great with WAV files, but when I tried to simplify by moving to MP3 (smaller, more efficient), load started raising RuntimeError: Error loading audio file: failed to open file.

All of the following work:
- Loading a normal MP3 file in torchaudio (on the backend server)
- Downloading the MP3 generated by my frontend and opening/playing it in an audio player
- Sending the blob to the backend server, writing it to a .mp3 file, and then loading it in torchaudio as a normal file (code below)

For this reason, I think it is likely an issue with the new file-like object loading, but unfortunately I have very shallow knowledge of audio formats, C++, and how this works under the hood. I'm happy to keep helping with testing, though. Please let me know if you have any ideas for a fix, or if this is worth a separate issue. Thanks again, I'm really excited for this feature, and I'm in an audio ML chat on Telegram with many others who feel the same.

import torchaudio

# `file` is assumed to be the FastAPI UploadFile received by the endpoint
with open("audios/example.mp3", "wb") as f:
    f.write(file.file.read())
y, sr = torchaudio.load("audios/example.mp3")
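The workaround above writes every upload to the same fixed path, so two concurrent requests could overwrite each other's file. A sketch of the same idea using a per-call temporary directory follows. load_via_tempfile is a hypothetical helper, and loader stands in for a path-based loader such as torchaudio.load (it is passed in so the sketch runs without torchaudio installed).

```python
import os
import shutil
import tempfile
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def load_via_tempfile(
    fileobj: BinaryIO,
    loader: Callable[[str], T],
    suffix: str = ".mp3",
) -> T:
    """Hypothetical helper: spill a file-like object to disk, then load it by path.

    `loader` must finish reading (and close the file) before returning,
    because the temporary directory is deleted right afterwards.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Keep the extension, mirroring the "example.mp3" file in the workaround.
        path = os.path.join(tmpdir, "upload" + suffix)
        with open(path, "wb") as f:
            # Copy in chunks rather than reading the whole upload into memory.
            shutil.copyfileobj(fileobj, f)
        return loader(path)


# Usage in the FastAPI handler (requires torchaudio, not run here):
#   y, sr = load_via_tempfile(file.file, torchaudio.load)
```

TemporaryDirectory is used instead of NamedTemporaryFile so the file can be reopened by the loader on every platform, and the directory is removed even if loading raises.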

@mthrok (Contributor, Author) commented Jan 28, 2021

Hi @rbracco

Thanks for trying it out and for the feedback. I am happy to hear that this addition is helpful for you.

For your issue,

If you still see the problem, please open an issue. I am more than happy to look at it.
I also want to polish this feature, but I am seeing test failures that happen sporadically (like this) and am trying to figure out the cause.

mthrok added a commit to mthrok/audio that referenced this pull request Feb 23, 2021
Ever since the file-like object support was added in pytorch#1158, the test
was occasionally failing in CI. This PR fixes this.
mthrok added a commit that referenced this pull request Feb 23, 2021
* Fix fileobj I/O undeterministic behavior

Ever since the file-like object support was added in #1158, the test
was occasionally failing in CI. This PR fixes this.
mthrok added a commit to mthrok/audio that referenced this pull request Feb 23, 2021
* Fix fileobj I/O undeterministic behavior

Ever since the file-like object support was added in pytorch#1158, the test
was occasionally failing in CI. This PR fixes this.
mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021
* Create distributed_rpc_profiling.rst

* Update recipes_index.rst

* Add files via upload

* Update recipes_index.rst

Development

Successfully merging this pull request may close these issues.

Add support for file-like object

4 participants