@jimchen90 jimchen90 commented Jun 9, 2020

Add the CMU_ARCTIC dataset used here.

Related to #550
Fixes #512

for line in ft:
    file_id, utterance = line.strip().split(" ", 2)[1:]
    if fileid == file_id:
        utterance = utterance[1:-3]

Can you add a quick comment to document this line?

# clean "(utterance )" to "utterance"

@jimchen90 jimchen90 Jun 9, 2020

I will add the comment. Thanks.
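
For readers following the thread, the slice under review can be shown in isolation. This is a minimal sketch, assuming a prompt line shaped like `( file_id "utterance" )`; the helper name `parse_prompt_line` and the sample line are illustrative and not taken from the PR.

```python
from typing import Tuple


def parse_prompt_line(line: str) -> Tuple[str, str]:
    """Split one prompt line into (file_id, utterance).

    Hypothetical helper; assumes the layout `( file_id "utterance" )`.
    """
    # maxsplit=2 yields ['(', file_id, '"utterance" )']; drop the leading '('.
    file_id, utterance = line.strip().split(" ", 2)[1:]
    # Strip the opening double quote and the trailing '" )'.
    utterance = utterance[1:-3]
    return file_id, utterance


# Assumed sample line, for illustration only.
sample = '( arctic_a0001 "Author of the danger trail, Philip Steels, etc." )'
file_id, utterance = parse_prompt_line(sample)
print(file_id)
print(utterance)
```

The `[1:-3]` slice only works when every line ends in exactly `" )`, which is why a comment on that line helps.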


class CMU_ARCTIC(Dataset):
    """
    Create a Dataset for CMU_Arctic. Each item is a tuple of the form:

nit: no need for underscore


I will change it. Thanks.

codecov bot commented Jun 10, 2020

Codecov Report

Merging #710 into master will decrease coverage by 0.06%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master     #710      +/-   ##
==========================================
- Coverage   88.78%   88.72%   -0.07%     
==========================================
  Files          22       23       +1     
  Lines        2355     2404      +49     
==========================================
+ Hits         2091     2133      +42     
- Misses        264      271       +7     
Impacted Files                       Coverage Δ
torchaudio/datasets/cmuarctic.py     85.41% <85.41%> (ø)
torchaudio/datasets/__init__.py      100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a466b3c...9e633dc.

jimchen90 commented Jun 10, 2020

Updates:

  1. Change cmu_arctic to cmuarctic, CMU_ARCTIC to CMUARCTIC.
  2. Update the text read method by following [commonvoice](https://github.com/pytorch/audio/blob/master/torchaudio/datasets/commonvoice.py#L189).
  3. Change cmu_us_jmk_arctic to jmk and so on.
  4. Add comment # clean "utterance" ) to utterance.
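
The renaming in item 3 implies a mapping from a short speaker code back to the archive folder name. A minimal sketch of that mapping follows; the function name `speaker_to_folder` and the partial speaker tuple are assumptions for illustration, and only the `cmu_us_jmk_arctic` to `jmk` pair is stated in this PR.

```python
# Partial list for illustration only; not the full set the dataset accepts.
KNOWN_SPEAKERS = ("awb", "bdl", "clb", "jmk", "ksp", "rms", "slt")


def speaker_to_folder(speaker: str) -> str:
    """Expand a short speaker code such as 'jmk' to 'cmu_us_jmk_arctic'.

    Hypothetical helper; the naming pattern is inferred from the jmk example.
    """
    if speaker not in KNOWN_SPEAKERS:
        raise ValueError(
            f"Unknown speaker {speaker!r}; expected one of {KNOWN_SPEAKERS}"
        )
    return f"cmu_us_{speaker}_arctic"


print(speaker_to_folder("jmk"))
```

Validating the short code up front keeps a typo from turning into a confusing missing-file error later.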

@jimchen90

Updates:

  1. Change the utterance comment to `# remove space, double quote, and single parenthesis from utterance`.
  2. Move `[0]` inside `load_cmuarctic_item`.
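
The second update can be pictured as follows: a row-based reader yields each line as a one-element list, and the item loader indexes `[0]` itself so callers pass rows through unchanged. This is a sketch under stated assumptions, not the merged code. `load_item` is a hypothetical stand-in for `load_cmuarctic_item` that skips audio loading, and the tab delimiter with `csv.QUOTE_NONE` is an illustrative choice that keeps each line as a single field.

```python
import csv
import io
from typing import List, Tuple


def load_item(row: List[str]) -> Tuple[str, str]:
    """Hypothetical stand-in for load_cmuarctic_item (no audio loading)."""
    # Index the single field here, so callers hand over the row as-is.
    line = row[0]
    utterance_id, utterance = line.strip().split(" ", 2)[1:]
    # remove space, double quote, and single parenthesis from utterance
    utterance = utterance[1:-3]
    return utterance_id, utterance


# Assumed sample prompt lines, for illustration only.
text = io.StringIO(
    '( arctic_a0001 "Author of the danger trail, Philip Steels, etc." )\n'
    '( arctic_a0002 "Not at this particular case, Tom, apologized Whittemore." )\n'
)

# The lines contain no tabs and quoting is disabled, so each row has one field.
rows = list(csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE))
items = [load_item(row) for row in rows]
print(items[0])
```

Keeping the `[0]` inside the loader means the dataset's `__getitem__` does not need to know how rows are shaped.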

@vincentqb vincentqb left a comment

LGTM! Let's wait on the tests to be done, and then please feel free to merge it :)

@jimchen90 jimchen90 merged commit 55b5c80 into pytorch:master Jun 10, 2020