Add cmu_arctic dataset #710

jimchen90 · 2020-06-09T15:09:42Z

Add the CMU_ARCTIC dataset used here.

Related to #550
Fixes #512

vincentqb · 2020-06-09T21:20:32Z

torchaudio/datasets/cmu_arctic.py

+        for line in ft:
+            file_id, utterance = line.strip().split(" ", 2)[1:]
+            if fileid == file_id:
+                utterance = utterance[1:-3]


Can you add a quick comment to document this line?

# clean "(utterance )" to "utterance"

I will add the comment. Thanks.
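
For readers following this thread, here is a minimal sketch of what the slice does. It assumes each transcript line looks like `( arctic_a0001 "Some text." )`; that format is an assumption here and is not quoted in this PR. The helper name `parse_prompt_line` is hypothetical and is not part of the diff.

```python
from typing import Tuple


def parse_prompt_line(line: str) -> Tuple[str, str]:
    """Split one transcript line into (file_id, utterance).

    Assumed line format (not confirmed in this PR):
        ( arctic_a0001 "Some text." )
    """
    # Split on the first two spaces only, so spaces inside the utterance survive:
    # ['(', 'arctic_a0001', '"Some text." )'] -> drop the leading '('.
    file_id, utterance = line.strip().split(" ", 2)[1:]
    # Drop the opening double quote and the trailing '" )' (quote, space, parenthesis).
    return file_id, utterance[1:-3]


if __name__ == "__main__":
    print(parse_prompt_line('( arctic_a0001 "Author of the danger trail." )\n'))
```

The slice `[1:-3]` only works if every line ends with exactly `" )`, which is why a comment documenting it is useful.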

vincentqb · 2020-06-09T21:29:28Z

torchaudio/datasets/cmu_arctic.py

+
+class CMU_ARCTIC(Dataset):
+    """
+    Create a Dataset for CMU_Arctic. Each item is a tuple of the form:


nit: no need for underscore

I will change it. Thanks.

codecov · 2020-06-10T13:58:43Z

Codecov Report

Merging #710 into master will decrease coverage by 0.06%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master     #710      +/-   ##
==========================================
- Coverage   88.78%   88.72%   -0.07%     
==========================================
  Files          22       23       +1     
  Lines        2355     2404      +49     
==========================================
+ Hits         2091     2133      +42     
- Misses        264      271       +7

Impacted Files	Coverage Δ
torchaudio/datasets/cmuarctic.py	`85.41% <85.41%> (ø)`
torchaudio/datasets/__init__.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a466b3c...9e633dc.

jimchen90 · 2020-06-10T13:59:43Z

Updates:

- Change `cmu_arctic` to `cmuarctic` and `CMU_ARCTIC` to `CMUARCTIC`.
- Update the text read method by following [commonvoice](https://github.com/pytorch/audio/blob/master/torchaudio/datasets/commonvoice.py#L189).
- Change `cmu_us_jmk_arctic` to `jmk`, and so on.
- Add the comment `# clean "utterance" ) to utterance`.

jimchen90 · 2020-06-10T17:59:40Z

Updates:

- Change the utterance comment to `# remove space, double quote, and single parenthesis from utterance`.
- Move `[0]` inside `load_cmuarctic_item`.

vincentqb

LGTM! Let's wait on the tests to be done, and then please feel free to merge it :)

Ji Chen added 4 commits June 9, 2020 06:34

Add cmu_arctic dataset

77246c6

add dataset name

84c9c8d

update audio test file with whitenoise.wav file

5eae5e2

add test text file

d54fbf6

jimchen90 added the module: datasets label Jun 9, 2020

jimchen90 requested a review from vincentqb June 9, 2020 21:20

vincentqb suggested changes Jun 9, 2020

update text method and file name

d64e5eb

vincentqb reviewed Jun 10, 2020

torchaudio/datasets/cmuarctic.py (outdated, resolved)

vincentqb reviewed Jun 10, 2020

torchaudio/datasets/cmuarctic.py (outdated, resolved)

update comment

54b6019

vincentqb reviewed Jun 10, 2020

docs/source/datasets.rst (outdated, resolved)

change datasets order in doc

7a541bb

vincentqb approved these changes Jun 10, 2020

add line length

9e633dc

jimchen90 merged commit 55b5c80 into pytorch:master Jun 10, 2020
