[Cherry-picked 0.10] Add pretrained weights from wav2vec2.0 and XLSR papers #1827
Conversation
)
WAV2VEC2_ASR_BASE_10M.__doc__ = """Build "base" wav2vec2 model with an extra linear module
Pre-trained on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset, and
Does this correspond to the Wav2Vec 2.0 Large | 10 minutes entry in the table? If so, should it be fine-tuned on LibriSpeech instead of Libri-Light?
Libri-Light is a subset of LibriSpeech, so both descriptions are correct, but Libri-Light is more accurate.
Here is the description from the wav2vec 2.0 paper.
We fine-tune on five labeled data settings: 960 hours of transcribed Librispeech, the train-clean-100 subset comprising 100 hours (100 hours labeled), as well as the Libri-light limited resource training subsets originally extracted from Librispeech, these are train-10h (10 hours labeled), train-1h (1 hour labeled), train-10min (10 min labeled).
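For reference, once landed these weights are exposed as pipeline bundles. A minimal usage sketch, assuming the `torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M` bundle name from this PR and a hypothetical input file:

```python
import torch
import torchaudio

# Bundle added in this PR; its docstring is the one under discussion.
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_10M
model = bundle.get_model()  # downloads the pretrained weights on first use

waveform, sample_rate = torchaudio.load("speech.wav")  # hypothetical input file
# Resample to the rate the bundle expects (16 kHz for wav2vec 2.0).
waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)  # per-frame logits over the bundle's labels
```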
WAV2VEC2_ASR_BASE_100H.__doc__ = """Build "base" wav2vec2 model with an extra linear module
Pre-trained and fine-tuned for ASR on 960 hours of
I think this is switched with the WAV2VEC2_ASR_BASE_960H doc below
Good catch! Thank you!
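The fix swaps the two descriptions. A sketch of the corrected wording (the exact phrasing in the final patch may differ):

```python
WAV2VEC2_ASR_BASE_100H.__doc__ = """Build "base" wav2vec2 model with an extra linear module

Pre-trained on 960 hours of *LibriSpeech*,
fine-tuned for ASR on 100 hours ("train-clean-100" subset).
"""

WAV2VEC2_ASR_BASE_960H.__doc__ = """Build "base" wav2vec2 model with an extra linear module

Pre-trained and fine-tuned for ASR on 960 hours of *LibriSpeech*.
"""
```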
)
WAV2VEC2_ASR_LARGE_LV60K_10M.__doc__ = """Build "large-lv60k" wav2vec2 model with an extra linear module
Pre-trained on 60,000 hours of *Libri-Light* [:footcite:`librilight`] dataset, and
From this table and your WAV2VEC2_ASR_LARGE_LV60K_100H doc below, I think this should be fine-tuned on LibriSpeech instead of Libri-Light
Thanks for spotting the error. I looked at the paper again and it turned out that LibriVox is the correct one.
The following is the relationship between these datasets.
- LibriVox: 60,000 hours audio
- LibriSpeech: 960 hours audio + transcript, subset of LibriVox
- Libri-Light (Limited Resource Training Set): subset of the LibriSpeech training subset
 
 
Add pretrained weights from https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models
- Wav2Vec 2.0 Base / Large / Large (LV-60)
- XLSR-53
Co-authored-by: Caroline Chen <[email protected]>
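Since XLSR-53 ships as pretrained-only weights (no ASR head), it is typically used for feature extraction. A minimal sketch, assuming the `torchaudio.pipelines.WAV2VEC2_XLSR53` bundle name and a hypothetical input file:

```python
import torch
import torchaudio

# Pretrained-only bundle (no fine-tuned ASR head), assumed name from this PR.
bundle = torchaudio.pipelines.WAV2VEC2_XLSR53
model = bundle.get_model()

waveform, sample_rate = torchaudio.load("speech.wav")  # hypothetical input file
waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    # Returns the intermediate transformer-layer outputs, one tensor per layer.
    features, _ = model.extract_features(waveform)
```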