@@ -155,7 +155,9 @@ def _get_labels():
 )
 WAV2VEC2_BASE.__doc__ = """wav2vec 2.0 model with "Base" configuration.

-Trained on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset. Not fine-tuned.
+Pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500").
+Not fine-tuned.

 Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2vec`].
 [`Source <https://github.com/pytorch/fairseq/tree/main/examples/wav2vec#pre-trained-models>`__]
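For orientation, here is a minimal sketch of how a pre-trained-only bundle such as WAV2VEC2_BASE is typically consumed. It assumes the bundle is exposed as `torchaudio.pipelines.WAV2VEC2_BASE` and that the returned model provides `extract_features()`, as in recent torchaudio releases; the import path and the file name below are illustrative placeholders, not part of this diff.

```python
import torch
import torchaudio

# Assumption: the bundle is exposed under torchaudio.pipelines (recent releases).
bundle = torchaudio.pipelines.WAV2VEC2_BASE

# The "Base" bundle is pre-trained only, so the model has no ASR head.
model = bundle.get_model()
model.eval()

# "speech.wav" is a placeholder; resample if its rate differs from the bundle's.
waveform, sample_rate = torchaudio.load("speech.wav")
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, int(bundle.sample_rate))

with torch.inference_mode():
    # One tensor of shape (batch, frames, feature_dim) per transformer layer.
    features, _ = model.extract_features(waveform)
```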
@@ -193,8 +195,10 @@ def _get_labels():
 )
 WAV2VEC2_ASR_BASE_10M.__doc__ = """Build "base" wav2vec2 model with an extra linear module

-Pre-trained on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset, and
-fine-tuned for ASR on 10 minutes of *Libri-Light* [:footcite:`librilight`] dataset.
+Pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and
+fine-tuned for ASR on 10 minutes of transcribed audio from *Libri-Light* dataset
+[:footcite:`librilight`] ("train-10min" subset).

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -234,9 +238,10 @@ def _get_labels():

 WAV2VEC2_ASR_BASE_100H.__doc__ = """Build "base" wav2vec2 model with an extra linear module

-Pre-trained on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset, and
-fine-tuned for ASR on 100 hours of *LibriSpeech* [:footcite:`librilight`] dataset
-(test-clean-100 subset).
+Pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and
+fine-tuned for ASR on 100 hours of transcribed audio from the same dataset
+("train-clean-100" subset).

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -275,8 +280,9 @@ def _get_labels():
 )
 WAV2VEC2_ASR_BASE_960H.__doc__ = """Build "base" wav2vec2 model with an extra linear module

-Pre-trained and fine-tuned for ASR on 960 hours of
-*LibriSpeech* [:footcite:`7178964`] dataset.
+Pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and
+fine-tuned for ASR on the same audio with the corresponding transcripts.

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
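The fine-tuned bundles additionally carry the character set of the extra linear (CTC) head, so their emissions can be turned into a transcript. Below is a rough sketch of greedy CTC decoding with WAV2VEC2_ASR_BASE_960H, under the same assumptions as the sketch above plus the assumption that the first label returned by `get_labels()` is the CTC blank and `"|"` marks word boundaries.

```python
import torch
import torchaudio

# Assumption: the bundle is exposed under torchaudio.pipelines (recent releases).
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model()
model.eval()
labels = bundle.get_labels()  # characters emitted by the extra linear module

waveform, sample_rate = torchaudio.load("speech.wav")  # placeholder path
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, int(bundle.sample_rate))

with torch.inference_mode():
    emission, _ = model(waveform)  # (batch, frames, num_labels)

# Greedy CTC decoding: best label per frame, collapse repeats, drop the blank
# (assumed to be labels[0]), and treat "|" as the word delimiter.
indices = emission[0].argmax(dim=-1).tolist()
transcript = []
prev = None
for i in indices:
    if i != prev and i != 0:
        transcript.append(labels[i])
    prev = i
print("".join(transcript).replace("|", " ").strip())
```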
@@ -315,7 +321,9 @@ def _get_labels():
 )
 WAV2VEC2_LARGE.__doc__ = """Build "large" wav2vec2 model.

-Trained on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset. Not fine-tuned.
+Pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500").
+Not fine-tuned.

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -354,8 +362,10 @@ def _get_labels():
 )
 WAV2VEC2_ASR_LARGE_10M.__doc__ = """Build "large" wav2vec2 model with an extra linear module

-Pre-trained on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset, and
-fine-tuned for ASR on 10 minutes of *Libri-Light* [:footcite:`librilight`] dataset.
+Pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and
+fine-tuned for ASR on 10 minutes of transcribed audio from *Libri-Light* dataset
+[:footcite:`librilight`] ("train-10min" subset).

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -394,9 +404,10 @@ def _get_labels():
 )
 WAV2VEC2_ASR_LARGE_100H.__doc__ = """Build "large" wav2vec2 model with an extra linear module

-Pre-trained on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset, and
-fine-tuned for ASR on 100 hours of *LibriSpeech* [:footcite:`librilight`] dataset
-(test-clean-100 subset).
+Pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and
+fine-tuned for ASR on 100 hours of transcribed audio from
+the same dataset ("train-clean-100" subset).

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -435,8 +446,9 @@ def _get_labels():
 )
 WAV2VEC2_ASR_LARGE_960H.__doc__ = """Build "large" wav2vec2 model with an extra linear module

-Pre-trained and fine-tuned for ASR on 960 hours of
-*LibriSpeech* [:footcite:`7178964`] dataset.
+Pre-trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500"), and
+fine-tuned for ASR on the same audio with the corresponding transcripts.

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -475,7 +487,9 @@ def _get_labels():
 )
 WAV2VEC2_LARGE_LV60K.__doc__ = """Build "large-lv60k" wav2vec2 model.

-Trained on 60,000 hours of *LibriLight* [:footcite:`librilight`] dataset. Not fine-tuned.
+Pre-trained on 60,000 hours of unlabeled audio from
+*Libri-Light* dataset [:footcite:`librilight`].
+Not fine-tuned.

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -514,8 +528,10 @@ def _get_labels():
 )
 WAV2VEC2_ASR_LARGE_LV60K_10M.__doc__ = """Build "large-lv60k" wav2vec2 model with an extra linear module

-Pre-trained on 60,000 hours of *Libri-Light* [:footcite:`librilight`] dataset, and
-fine-tuned for ASR on 10 minutes of *Libri-Light* [:footcite:`librilight`] dataset.
+Pre-trained on 60,000 hours of unlabeled audio from
+*Libri-Light* dataset [:footcite:`librilight`], and
+fine-tuned for ASR on 10 minutes of transcribed audio from
+the same dataset ("train-10min" subset).

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -554,9 +570,10 @@ def _get_labels():
 )
 WAV2VEC2_ASR_LARGE_LV60K_100H.__doc__ = """Build "large-lv60k" wav2vec2 model with an extra linear module

-Pre-trained on 60,000 hours of *Libri-Light* [:footcite:`librilight`] dataset, and
-fine-tuned for ASR on 100 hours of *LibriSpeech* [:footcite:`librilight`] dataset
-(test-clean-100 subset).
+Pre-trained on 60,000 hours of unlabeled audio from
+*Libri-Light* dataset [:footcite:`librilight`], and
+fine-tuned for ASR on 100 hours of transcribed audio from
+*LibriSpeech* dataset [:footcite:`7178964`] ("train-clean-100" subset).

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -595,8 +612,11 @@ def _get_labels():
 )
 WAV2VEC2_ASR_LARGE_LV60K_960H.__doc__ = """Build "large-lv60k" wav2vec2 model with an extra linear module

-Pre-trained on 60,000 hours of *Libri-Light* [:footcite:`librilight`] dataset, and
-fine-tuned for ASR on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset.
+Pre-trained on 60,000 hours of unlabeled audio from *Libri-Light*
+[:footcite:`librilight`] dataset, and
+fine-tuned for ASR on 960 hours of transcribed audio from
+*LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500").

 Originally published by the authors of *wav2vec 2.0*
 [:footcite:`baevski2020wav2vec`].
@@ -638,7 +658,7 @@ def _get_labels():
 Trained on 56,000 hours of multiple datasets (
 *Multilingual LibriSpeech* [:footcite:`Pratap_2020`],
 *CommonVoice* [:footcite:`ardila2020common`] and
-*BABEL* [:footcite:`Gales2014SpeechRA`])
+*BABEL* [:footcite:`Gales2014SpeechRA`]). Not fine-tuned.

 Originally published by the authors of
 *Unsupervised Cross-lingual Representation Learning for Speech Recognition*
@@ -678,7 +698,8 @@ def _get_labels():
 )
 HUBERT_BASE.__doc__ = """HuBERT model with "Base" configuration.

-Trained on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset. Not fine-tuned.
+Trained on 960 hours of unlabeled audio from *LibriSpeech* dataset [:footcite:`7178964`].
+Not fine-tuned.

 Originally published by the authors of *HuBERT* [:footcite:`hsu2021hubert`].
 [`Source <https://github.com/pytorch/fairseq/tree/main/examples/hubert#pre-trained-and-fine-tuned-asr-models>`__]
@@ -716,8 +737,11 @@ def _get_labels():
 )
 HUBERT_ASR_LARGE.__doc__ = """HuBERT model with "Large" configuration.

-Pre-trained on 60,000 hours of *Libri-Light* [:footcite:`librilight`] dataset, and
-fine-tuned for ASR on 960 hours of *LibriSpeech* [:footcite:`7178964`] dataset.
+Pre-trained on 60,000 hours of unlabeled audio from
+*Libri-Light* dataset [:footcite:`librilight`], and
+fine-tuned for ASR on 960 hours of transcribed audio from
+*LibriSpeech* dataset [:footcite:`7178964`]
+(the combination of "train-clean-100", "train-clean-360", and "train-other-500").

 Originally published by the authors of *HuBERT* [:footcite:`hsu2021hubert`].
 [`Source <https://github.com/pytorch/fairseq/tree/main/examples/hubert#pre-trained-and-fine-tuned-asr-models>`__]
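All of these constants follow the same bundle interface, so, under the same assumptions as the sketches above, switching architectures, for example to the HuBERT ASR model documented here, is only a matter of picking a different bundle:

```python
import torchaudio

# Same assumptions as above: bundles live in torchaudio.pipelines and expose
# get_model(), get_labels(), and sample_rate.
bundle = torchaudio.pipelines.HUBERT_ASR_LARGE
model = bundle.get_model()
labels = bundle.get_labels()
print(bundle.sample_rate, len(labels))
```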