This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Commit b64fbfa

Author: nayef211

Added datasets that roberta was trained on

1 parent 3303894 commit b64fbfa

File tree

1 file changed: +9 -0 lines changed


torchtext/models/roberta/bundler.py

Lines changed: 9 additions & 0 deletions
@@ -220,6 +220,10 @@ def encoderConf(self) -> RobertaEncoderConf:
     training on longer sequences; and dynamically changing the masking pattern applied
     to the training data.
 
+    The RoBERTa model was pretrained on the union of five datasets: BookCorpus,
+    English Wikipedia, CC-News, OpenWebText, and STORIES. Together these datasets
+    contain over 160GB of text.
+
     Originally published by the authors of RoBERTa under MIT License
     and redistributed with the same license.
     [`License <https://github.com/pytorch/fairseq/blob/main/LICENSE>`__,
@@ -262,6 +266,11 @@ def encoderConf(self) -> RobertaEncoderConf:
     training on longer sequences; and dynamically changing the masking pattern applied
     to the training data.
 
+    The RoBERTa model was pretrained on the union of five datasets: BookCorpus,
+    English Wikipedia, CC-News, OpenWebText, and STORIES. Together these datasets
+    contain over 160GB of text.
+
+
     Originally published by the authors of RoBERTa under MIT License
     and redistributed with the same license.
     [`License <https://github.com/pytorch/fairseq/blob/main/LICENSE>`__,
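For reference, the bundler edited here exposes pre-trained RoBERTa weights as model bundles. Below is a minimal usage sketch based on the torchtext 0.12-era public API (ROBERTA_BASE_ENCODER, get_model(), transform(), and torchtext.functional.to_tensor); it illustrates the bundle this docstring documents rather than anything added by this commit, and exact names may vary between releases.

import torch
import torchtext
from torchtext.functional import to_tensor

# Load the pre-trained RoBERTa base encoder bundle whose docstring is edited above.
bundle = torchtext.models.ROBERTA_BASE_ENCODER
model = bundle.get_model()
model.eval()

# The bundled transform converts raw strings into token-index sequences.
transform = bundle.transform()
batch = ["Hello world", "How are you!"]

# Pad the variable-length sequences into one tensor (RoBERTa's pad index is 1).
model_input = to_tensor(transform(batch), padding_value=1)

with torch.no_grad():
    features = model(model_input)  # contextual features, shape (batch, seq_len, 768)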

0 commit comments