Closed
Description
Many exciting work items are planned for torchaudio.
- Provide support for large-scale training.
  - Support a large-scale training reference task using wav2vec on librivox, and offer a pre-trained version of the model.
  - Support the emergence of audio-specific transformer models by exploring which abstractions would be beneficial to provide.
- Extend support for speech recognition.
  - Investigate the addition of beam search and a 4-gram language model, see here and here, to reduce the word error rate in the existing pipeline.
  - ✅ Support in-memory codec encoding and decoding, see here, to support codec-based data augmentation.
  - ✅ Add the Kaldi pitch feature, see here, which is used in the audio community.
  - Implement a prototype of a WFST-based ASR model, using GTN or K2, see here.
  - Add RNN transducer loss, see here and follow-up, to train RNN transducer models efficiently.
- Provide a high-performance data loading and media decoding experience.
- Improve our codebase.
  - ✅ Create libtorchaudio by building the C++ extension outside of Python, see here.
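Since the beam search and language model item above is framed in terms of word error rate, here is a minimal sketch of how that metric is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical and not part of torchaudio; this is an illustration in plain Python, not the pipeline's actual evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; not a torchaudio API.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"),
    # measured against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A decoder change such as beam search with a language model would be judged by comparing this number before and after on the same test set.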
The goal of torchaudio is to accelerate research through novel, production-ready building blocks. As such, we would love to hear feedback on the plan, so make sure to reach out to us, @mthrok and @vincentqb!
cc internal