🚀 Feature
Given that there is a lack of small yet comprehensive audio tasks, I propose adding a speech MNIST dataset to torchaudio.
Motivation
In the audio domain, we often lack small toy scenarios that would be a good equivalent to the ubiquitous MNIST task.
A spoken-digit dataset and model could help sketch and try out audio ML ideas.
Pitch
Add either of the two datasets to torchaudio.datasets.
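
As a rough sketch of how such a dataset could follow the existing torchaudio.datasets conventions (e.g. torchaudio.datasets.YESNO): the class name, file layout, and return signature below are all assumptions for illustration, not an existing API.

```python
from pathlib import Path
from typing import Tuple

import torch
from torch.utils.data import Dataset

import torchaudio


class SPEECHMNIST(Dataset):
    """Hypothetical spoken-digit dataset (name and layout are assumptions).

    Assumes a flat directory of .wav files whose filename starts with the
    digit label, e.g. "7_speaker_12.wav". Each item is returned as
    (waveform, sample_rate, label), mirroring torchaudio.datasets.YESNO.
    """

    def __init__(self, root: str) -> None:
        self._files = sorted(Path(root).glob("*.wav"))

    def __getitem__(self, n: int) -> Tuple[torch.Tensor, int, int]:
        path = self._files[n]
        waveform, sample_rate = torchaudio.load(str(path))
        label = int(path.name.split("_")[0])  # leading digit encodes the class
        return waveform, sample_rate, label

    def __len__(self) -> int:
        return len(self._files)
```

A `download=True` flag in `__init__`, as offered by the other torchaudio datasets, would be a natural addition once a canonical download URL is settled.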
Additional context
Furthermore, it might be a good idea to also add a baseline model, either based on MelSpectrogram -> Conv2d or using the existing Wav2Letter; a sketch of the former follows below.
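
A minimal sketch of the MelSpectrogram -> Conv2d baseline, assuming mono waveforms of shape (batch, 1, time); the class name, layer sizes, and sample rate are illustrative choices, not a proposed API:

```python
import torch
from torch import nn

import torchaudio


class DigitBaseline(nn.Module):
    """Illustrative spoken-digit classifier: MelSpectrogram -> small Conv2d stack."""

    def __init__(self, n_classes: int = 10, sample_rate: int = 8000) -> None:
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate, n_mels=64
        )
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            # pool away frequency/time so variable-length clips map to a fixed size
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, 1, time) -> (batch, 1, n_mels, frames)
        x = self.melspec(waveform)
        x = self.conv(x)           # -> (batch, 32, 1, 1)
        return self.fc(x.flatten(1))  # -> (batch, n_classes) logits
```

The adaptive pooling is just one way to sidestep padding/length handling in a toy baseline; the Wav2Letter alternative would instead reuse torchaudio.models.Wav2Letter on the raw waveform.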