Description
torchaudio has had only minimalistic tests for its dataset implementations. See here.
Recently we have improved our test utilities, and we can now generate synthetic data which emulates a subset of a dataset. See the YesNo and GTZAN examples.
We would like to do the same for the remaining datasets:
- VCTK
- LibriSpeech
- LJSpeech
- SpeechCommands
- CMUArctic
- CommonVoice
General Direction
- Check the dataset of interest and pick a subset of files (check their naming conventions, sampling rate, and number of channels).
- Following the approach of the existing test modules, create a new test module `test/datasets/XXX_test.py` and define your test class.
- Generate a pseudo dataset in the `setUpClass` method and create a list of expected data (see the sketch after this list).
- Traverse the directory with the Dataset implementation.
- Check that the files are traversed in the expected order and that the loaded data match.
- Check that the Dataset traversed the expected number of files.
- If the dataset has multiple operational modes, like `subset` in GTZAN, also add these as test methods.
- Once the new test is added, remove the original test and the associated assets:
  - `test/assets/ARCTIC/cmu_us_aew_arctic/etc/txt.done.data`
  - `test/assets/ARCTIC/cmu_us_aew_arctic/wav/arctic_a0024.wav`
  - `test/assets/CommonVoice/cv-corpus-4-2019-12-10/tt/clips/common_voice_tt_00000000.wav`
  - `test/assets/CommonVoice/cv-corpus-4-2019-12-10/tt/train.tsv`
  - `test/assets/LJSpeech-1.1/metadata.csv`
  - `test/assets/LJSpeech-1.1/wavs/LJ001-0001.wav`
  - `test/assets/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.flac`
  - `test/assets/LibriSpeech/dev-clean/1272/128104/1272-128104.trans.txt`
  - `test/assets/SpeechCommands/speech_commands_v0.02/go/0a9f9af7_nohash_0.wav`
  - `test/assets/VCTK-Corpus/txt/p224/p224_002.txt`
  - `test/assets/VCTK-Corpus/wav48/p224/p224_002.wav`
- Once the PR is ready, add @mthrok as a reviewer.
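As a rough illustration of the steps above, a test module for SPEECHCOMMANDS (one of the remaining datasets) could look like the sketch below. The `save_wav` and `normalize_wav` helpers are the test utilities referenced in the notes further down; their import path, the use of a plain `unittest.TestCase` with a temporary directory instead of the repository's test-case mixins, and the assumption that files are traversed in sorted order are simplifications made for illustration, not the actual implementation.

```python
import os
import tempfile
import unittest

import torch
from torchaudio.datasets import SPEECHCOMMANDS

# Assumed import path; adjust to wherever the shared test utilities live.
from common_utils import save_wav, normalize_wav


class TestSpeechCommands(unittest.TestCase):
    sample_rate = 16000  # Speech Commands clips are 16 kHz mono

    @classmethod
    def setUpClass(cls):
        # Generate a pseudo dataset under a temporary root directory, mirroring
        # the layout root/SpeechCommands/speech_commands_v0.02/<label>/<file>.wav
        cls.temp_dir = tempfile.TemporaryDirectory()
        cls.root_dir = cls.temp_dir.name
        dataset_dir = os.path.join(
            cls.root_dir, 'SpeechCommands', 'speech_commands_v0.02')
        cls.samples = []
        # Keep labels / speakers in sorted order so the expected list matches
        # the (assumed sorted) traversal order of the Dataset.
        for label in ['go', 'stop']:
            label_dir = os.path.join(dataset_dir, label)
            os.makedirs(label_dir)
            for speaker_id in ['0a9f9af7', '1b835b87']:
                for utterance_number in range(2):
                    filename = f'{speaker_id}_nohash_{utterance_number}.wav'
                    # The reference dataset is 16-bit PCM WAV, so synthesize int16 data.
                    data = torch.randint(
                        -32768, 32767, (1, cls.sample_rate // 100), dtype=torch.int16)
                    save_wav(os.path.join(label_dir, filename), data, cls.sample_rate)
                    # The Dataset is expected to return normalized float32 waveforms,
                    # so keep the normalized version as the expected data.
                    cls.samples.append((
                        normalize_wav(data), cls.sample_rate,
                        label, speaker_id, utterance_number))

    @classmethod
    def tearDownClass(cls):
        cls.temp_dir.cleanup()

    def test_speechcommands(self):
        dataset = SPEECHCOMMANDS(self.root_dir)
        num_samples = 0
        for i, (waveform, sample_rate, label, speaker_id, utterance_number) in enumerate(dataset):
            expected = self.samples[i]
            # Files must be traversed in the expected order and the loaded data must match.
            assert torch.allclose(waveform, expected[0], atol=1e-4)
            assert sample_rate == expected[1]
            assert label == expected[2]
            assert speaker_id == expected[3]
            assert utterance_number == expected[4]
            num_samples += 1
        # The Dataset must see exactly the number of files we generated.
        assert num_samples == len(self.samples)
```

The same structure carries over to the other datasets; only the directory layout, the file naming convention, and the fields returned by `__getitem__` change.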
Note
- It is highly recommended to use Anaconda.
- Please use a nightly build of PyTorch. https://pytorch.org/
- You can run the test with `pytest test/datasets/XXX_test.py`.
- PR example: Make GTZAN dataset sorted and use on-the-fly data in GTZAN test #819
- For simplicity, please use the `wav` format when saving synthetic data (`save_wav`), even if the reference dataset uses another format. (Decoding formats like `mp3` add complexity to the test logic, which we are trying to avoid.)
- When saving wave data with `save_wav`, the `dtype` of the Tensor makes a difference. If the reference dataset uses the WAV format, use the same bit depth (like `int16`). If the reference dataset uses a compressed format, like `mp3` or `flac`, use `float32` wav data.
- Data loaded with a Dataset implementation typically has normalized values (in `[-1.0, 1.0]`) of `float32` type, which is why `normalize_wav` is used to generate the reference data in the examples above.
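To make the last two notes concrete, here is a small sketch of how the saved data and the expected (reference) data relate, again assuming the `save_wav` / `normalize_wav` helpers behave as described above and that their import path matches the repository layout:

```python
import torch

# Assumed import path for the shared test utilities mentioned above.
from common_utils import save_wav, normalize_wav

sample_rate = 16000

# Reference dataset stored as 16-bit PCM WAV: synthesize int16 samples ...
data = torch.randint(-32768, 32767, (1, sample_rate // 100), dtype=torch.int16)
save_wav('sample.wav', data, sample_rate)
# ... but keep the normalized float32 version as the expected value, since the
# Dataset implementations return waveforms with values in [-1.0, 1.0].
expected = normalize_wav(data)
assert expected.dtype == torch.float32
assert expected.abs().max() <= 1.0

# Reference dataset stored in a compressed format (mp3 / flac): generate float32
# data directly, and still save it as WAV to keep the test logic simple.
data = 2 * torch.rand(1, sample_rate // 100, dtype=torch.float32) - 1
save_wav('sample_from_compressed_source.wav', data, sample_rate)
```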