Which datasets should torchaudio have? #550

@vincentqb

Which new datasets should we offer and prioritize in torchaudio?

I want to follow up on #31 and a few of the recent PRs. Instead of aiming for an exhaustive list of datasets, we should focus on a few important, common, representative datasets that can serve as templates, so that users can easily implement datasets of their choosing. All datasets should already be free, accessible online, and in common use, with a license that permits linking to them.
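To make the "template" idea concrete, here is a minimal sketch of what a user-written dataset could look like. It is not torchaudio code: the class name `FolderAudioDataset`, the helper `write_wav`, and the `<label>_<id>.wav` file layout are all hypothetical, and it uses only the Python standard library so it runs without torch installed. A real implementation would more likely subclass `torch.utils.data.Dataset` and return tensors, but the map-style shape (`__len__` plus `__getitem__`) would be the same.

```python
import os
import struct
import wave
from typing import List, Tuple


def write_wav(path: str, samples: List[int], sample_rate: int = 16000) -> None:
    """Hypothetical helper: write 16-bit mono PCM samples to a WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<%dh" % len(samples), *samples))


class FolderAudioDataset:
    """Hypothetical template: one WAV file per item, label taken from the file name.

    Assumed layout (an illustration, not a torchaudio convention):
        <root>/<label>_<id>.wav
    """

    def __init__(self, root: str) -> None:
        self._root = root
        # Sort the file names so that indices are stable across runs.
        self._files = sorted(f for f in os.listdir(root) if f.endswith(".wav"))

    def __len__(self) -> int:
        return len(self._files)

    def __getitem__(self, index: int) -> Tuple[List[int], int, str]:
        name = self._files[index]
        label = name.split("_")[0]
        with wave.open(os.path.join(self._root, name), "rb") as f:
            sample_rate = f.getframerate()
            n_frames = f.getnframes()
            raw = f.readframes(n_frames)
        # Assumes 16-bit mono PCM, as produced by write_wav above.
        samples = list(struct.unpack("<%dh" % n_frames, raw))
        return samples, sample_rate, label
```

Each item is a `(samples, sample_rate, label)` tuple. Adapting the template to another dataset would mostly mean changing how the file list is built and how the label is parsed.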

torchaudio currently has:

  1. commonvoice
  2. librispeech
  3. ljspeech
  4. speechcommands
  5. vctk
  6. yesno

Current open proposals:
