From a269cce7fa88fa155205b40734567144a9772b49 Mon Sep 17 00:00:00 2001
From: Artyom Astafurov
Date: Wed, 1 Jul 2020 16:41:08 -0400
Subject: [PATCH 1/6] add Waveforms for Testing Purposes section

---
 test/README.md | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/test/README.md b/test/README.md
index 35a926c120..1e333fef41 100644
--- a/test/README.md
+++ b/test/README.md
@@ -41,6 +41,74 @@ The following test modules are defined for corresponding `torchaudio` module/fun
 - [assets/kaldi](./assets/kaldi): Contains Kaldi format matrix files used in [./test_compliance_kaldi.py](./test_compliance_kaldi.py).
 - [compliance](./compliance): Scripts used to generate above Kaldi matrix files.
 
+### Waveforms for Testing Purposes
+
+When testing transforms we often need waveforms of specific type (ex: pure tone, noise, or voice), with specific bitrate (ex. 8 or 16 kHz) and number of channels (ex. mono, stereo). Below are some tips on how to construct waveforms and guidance around existing audio files.
+
+#### Load a Waveform from a File
+
+```python
+filepath = common_utils.get_asset_path('filename.wav')
+waveform, sample_rate = scipy.io.wavfile.read(filepath)
+```
+
+*Note: Should you choose to contribute an audio file, please leave a comment in the issue or pull request, mentioning content source and licensing information. WAV files are preferred. Other formats should be used only when there is no alternative. (i.e. dataset implementation comes with hardcoded non-wav extension).*
+
+#### Pure Tone
+
+Code:
+
+```python
+    waveform = common_utils.get_sinusoid(
+        frequency=300,
+        sample_rate=16000,
+        duration=1,  # seconds
+        n_channels=1,
+        dtype="float32",
+        device="cpu",
+)
+```
+
+Files:
+
+* `sinewave.wav`
+* `100Hz_44100Hz_16bit_05sec.wav`
+* `440Hz_44100Hz_16bit_05sec.wav`
+
+#### Noise
+
+Code:
+
+```python
+tensor = common_utils.get_whitenoise()
+```
+
+Files:
+
+* `whitenoise.wav`
+* `whitenoise.mp3`
+* `whitenoise_1min.mp3`
+* `steam-train-whistle-daniel_simon.wav`
+* `steam-train-whistle-daniel_simon.mp3`
+
+#### Voice
+
+Files:
+
+* `CommonVoice/cv-corpus-4-2019-12-10/tt/clips/common_voice_tt_00000000.mp3`
+* `LibriSpeech/dev-clean/1272/128104/1272-128104-0000.flac`
+* `LJSpeech-1.1/wavs/LJ001-0001.wav`
+* `SpeechCommands/speech_commands_v0.02/go/0a9f9af7_nohash_0.wav`
+* `VCTK-Corpus/wav48/p224/p224_002.wav`
+* `waves_yesno/0_1_0_1_0_1_1_0.wav`
+* `vad-go-stereo-44100.wav`
+* `vad-go-mono-32000.wav`
+
+#### Other
+
+* `kaldi_file.wav`
+* `test.wav`
+* `kaldi_file_8000.wav`
 
 ## Adding test
 

From 8891ac47ddbcbde1a3ebb2ec88ec24e0f7d2bea2 Mon Sep 17 00:00:00 2001
From: Artyom Astafurov
Date: Wed, 1 Jul 2020 17:44:18 -0400
Subject: [PATCH 2/6] Update test/README.md

use wrapper function for scipy.io.wavfile.read

Co-authored-by: moto <855818+mthrok@users.noreply.github.com>
---
 test/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/README.md b/test/README.md
index 1e333fef41..8b39371cd9 100644
--- a/test/README.md
+++ b/test/README.md
@@ -49,7 +49,7 @@ When testing transforms we often need waveforms of specific type (ex: pure tone,
 
 ```python
 filepath = common_utils.get_asset_path('filename.wav')
-waveform, sample_rate = scipy.io.wavfile.read(filepath)
+waveform, sample_rate = common_utils.load_wav(path)
 ```
 
 *Note: Should you choose to contribute an audio file, please leave a comment in the issue or pull request, mentioning content source and licensing information. WAV files are preferred. Other formats should be used only when there is no alternative. (i.e. dataset implementation comes with hardcoded non-wav extension).*

From 9afbed3a22c28ac4bc7ef68d6b0ec6359a6f620a Mon Sep 17 00:00:00 2001
From: Artyom Astafurov
Date: Wed, 1 Jul 2020 17:51:01 -0400
Subject: [PATCH 3/6] remove un-used files from the doc

---
 test/README.md | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/test/README.md b/test/README.md
index 8b39371cd9..7cd7136f76 100644
--- a/test/README.md
+++ b/test/README.md
@@ -69,12 +69,6 @@ Code:
 )
 ```
 
-Files:
-
-* `sinewave.wav`
-* `100Hz_44100Hz_16bit_05sec.wav`
-* `440Hz_44100Hz_16bit_05sec.wav`
-
 #### Noise
 
 Code:
@@ -89,7 +83,6 @@ Files:
 * `whitenoise.mp3`
 * `whitenoise_1min.mp3`
 * `steam-train-whistle-daniel_simon.wav`
-* `steam-train-whistle-daniel_simon.mp3`
 
 #### Voice
 
@@ -107,7 +100,6 @@ Files:
 #### Other
 
 * `kaldi_file.wav`
-* `test.wav`
 * `kaldi_file_8000.wav`
 
 ## Adding test

From d61a97b93d880b72eac3dc69f2047b75121d3a01 Mon Sep 17 00:00:00 2001
From: Artyom Astafurov
Date: Mon, 6 Jul 2020 11:48:07 -0400
Subject: [PATCH 4/6] Update test/README.md

Rename variable

Co-authored-by: moto <855818+mthrok@users.noreply.github.com>
---
 test/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/README.md b/test/README.md
index 7cd7136f76..ba887b4bdf 100644
--- a/test/README.md
+++ b/test/README.md
@@ -49,7 +49,7 @@ When testing transforms we often need waveforms of specific type (ex: pure tone,
 
 ```python
 filepath = common_utils.get_asset_path('filename.wav')
-waveform, sample_rate = common_utils.load_wav(path)
+waveform, sample_rate = common_utils.load_wav(filepath)
 ```
 
 *Note: Should you choose to contribute an audio file, please leave a comment in the issue or pull request, mentioning content source and licensing information. WAV files are preferred. Other formats should be used only when there is no alternative. (i.e. dataset implementation comes with hardcoded non-wav extension).*

From 1b088aaf03742ea2e2ce66963fd6e7dfeb279c0a Mon Sep 17 00:00:00 2001
From: Artyom Astafurov
Date: Mon, 6 Jul 2020 11:51:09 -0400
Subject: [PATCH 5/6] fix indent; remove mentions of unused files

---
 test/README.md | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/test/README.md b/test/README.md
index ba887b4bdf..00b8bc249e 100644
--- a/test/README.md
+++ b/test/README.md
@@ -59,13 +59,13 @@ waveform, sample_rate = common_utils.load_wav(filepath)
 Code:
 
 ```python
-    waveform = common_utils.get_sinusoid(
-        frequency=300,
-        sample_rate=16000,
-        duration=1,  # seconds
-        n_channels=1,
-        dtype="float32",
-        device="cpu",
+waveform = common_utils.get_sinusoid(
+    frequency=300,
+    sample_rate=16000,
+    duration=1,  # seconds
+    n_channels=1,
+    dtype="float32",
+    device="cpu",
 )
 ```
 
@@ -97,11 +97,6 @@ Files:
 * `vad-go-stereo-44100.wav`
 * `vad-go-mono-32000.wav`
 
-#### Other
-
-* `kaldi_file.wav`
-* `kaldi_file_8000.wav`
-
 ## Adding test
 
 The following is the current practice of torchaudio test suite.

From d09fdc8df78ddb7c8b71530f119a701ace074e47 Mon Sep 17 00:00:00 2001
From: Artyom Astafurov
Date: Tue, 7 Jul 2020 10:23:49 -0400
Subject: [PATCH 6/6] remove whitenoise* files from README.md

---
 test/README.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/test/README.md b/test/README.md
index 00b8bc249e..7ac13c4169 100644
--- a/test/README.md
+++ b/test/README.md
@@ -79,9 +79,6 @@ tensor = common_utils.get_whitenoise()
 
 Files:
 
-* `whitenoise.wav`
-* `whitenoise.mp3`
-* `whitenoise_1min.mp3`
 * `steam-train-whistle-daniel_simon.wav`
 
 #### Voice