🚀 Feature
Given that there is a lack of small yet comprehensive audio tasks, I propose adding a speech MNIST dataset to torchaudio.
Motivation
In the audio domain, we often lack small toy scenarios that would be a good equivalent to the ubiquitous MNIST task.
A spoken-digit dataset and model could help sketch and try out audio ML ideas.
Pitch
Add either of the two datasets to torchaudio.datasets.
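
As a rough sketch of how such a dataset could follow the existing torchaudio.datasets conventions (e.g. torchaudio.datasets.YESNO): the class name, file layout, and return signature below are all assumptions for illustration, not an existing API.

```python
from pathlib import Path
from typing import Tuple

import torch
from torch.utils.data import Dataset

import torchaudio


class SPEECHMNIST(Dataset):
    """Hypothetical spoken-digit dataset (name and layout are assumptions).

    Assumes a flat directory of .wav files whose filename starts with the
    digit label, e.g. "7_speaker_12.wav". Each item is returned as
    (waveform, sample_rate, label), mirroring torchaudio.datasets.YESNO.
    """

    def __init__(self, root: str) -> None:
        self._files = sorted(Path(root).glob("*.wav"))

    def __getitem__(self, n: int) -> Tuple[torch.Tensor, int, int]:
        path = self._files[n]
        waveform, sample_rate = torchaudio.load(str(path))
        label = int(path.name.split("_")[0])  # leading digit encodes the class
        return waveform, sample_rate, label

    def __len__(self) -> int:
        return len(self._files)
```

A `download=True` flag in `__init__`, as offered by the other torchaudio datasets, would be a natural addition once a canonical download URL is settled.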
Additional context
Furthermore, it might be a good idea to also add a baseline model, either based on MelSpectrogram -> Conv2d or using the existing Wav2Letter; a sketch of the former follows below.
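
A minimal sketch of the MelSpectrogram -> Conv2d baseline, assuming mono waveforms of shape (batch, 1, time); the class name, layer sizes, and sample rate are illustrative choices, not a proposed API:

```python
import torch
from torch import nn

import torchaudio


class DigitBaseline(nn.Module):
    """Illustrative spoken-digit classifier: MelSpectrogram -> small Conv2d stack."""

    def __init__(self, n_classes: int = 10, sample_rate: int = 8000) -> None:
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate, n_mels=64
        )
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            # pool away frequency/time so variable-length clips map to a fixed size
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, 1, time) -> (batch, 1, n_mels, frames)
        x = self.melspec(waveform)
        x = self.conv(x)           # -> (batch, 32, 1, 1)
        return self.fc(x.flatten(1))  # -> (batch, n_classes) logits
```

The adaptive pooling is just one way to sidestep padding/length handling in a toy baseline; the Wav2Letter alternative would instead reuse torchaudio.models.Wav2Letter on the raw waveform.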