Add spoken digit dataset and baseline model #1090

@faroit

Description

🚀 Feature

Given the lack of small, comprehensive audio tasks, I propose adding a spoken digit ("speech MNIST") dataset to torchaudio.

Motivation

In the audio domain, we often lack small toy scenarios that would be a good equivalent to the ubiquitous MNIST task.
A spoken digit dataset and model would make it easy to sketch and try out audio ML ideas.

Pitch

Add either of the two candidate datasets to torchaudio.datasets.
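A minimal sketch of what such a dataset could look like, following the (waveform, sample_rate, label) convention of the existing torchaudio datasets. The class name, folder layout, and filename-encoded labels are assumptions for illustration, not a final API:

```python
import os

import torchaudio
from torch.utils.data import Dataset


class SpokenDigitDataset(Dataset):
    """Sketch of a spoken-digit dataset in the style of torchaudio.datasets.

    Assumes a flat folder of WAV files whose names start with the digit label,
    e.g. "7_speaker_12.wav"; the real dataset layout may differ.
    """

    def __init__(self, root: str):
        self.root = root
        self.files = sorted(f for f in os.listdir(root) if f.endswith(".wav"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, n: int):
        path = os.path.join(self.root, self.files[n])
        waveform, sample_rate = torchaudio.load(path)
        label = int(self.files[n].split("_")[0])  # digit encoded in the filename (assumption)
        return waveform, sample_rate, label
```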

Additional context

Furthermore, it might be a good idea to also add a baseline model, either based on a MelSpectrogram -> Conv2d pipeline or using the existing Wav2Letter model.
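For reference, a minimal sketch of the MelSpectrogram -> Conv2d baseline. The layer sizes, 10-class output, and 8 kHz sample rate are assumptions for illustration, not a proposed final architecture:

```python
import torch
import torch.nn as nn
import torchaudio


class MelConvBaseline(nn.Module):
    """Tiny log-mel spectrogram + Conv2d classifier for 10 spoken digits."""

    def __init__(self, sample_rate: int = 8000, n_mels: int = 64, n_classes: int = 10):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate, n_mels=n_mels
        )
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse freq/time so clip length does not matter
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, time) -> log-mel spectrogram: (batch, 1, n_mels, frames)
        x = self.to_db(self.melspec(waveform))
        x = self.conv(x)
        return self.fc(x.flatten(1))


# Example: a batch of four 1-second clips at 8 kHz -> logits of shape (4, 10)
logits = MelConvBaseline()(torch.randn(4, 1, 8000))
```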
