remove xlmr transform class and instead use sequential for model transforms composition #1482
```diff
@@ -1,5 +1,4 @@
 from dataclasses import dataclass
-from functools import partial
 from urllib.parse import urljoin

 from typing import Optional, Callable, Dict, Union, Any
@@ -15,7 +14,7 @@
     RobertaModel,
 )

-from .transforms import get_xlmr_transform
+import torchtext.transforms as T

 from torchtext import _TEXT_BUCKET

@@ -156,10 +155,13 @@ def encoderConf(self) -> RobertaEncoderConf:
 XLMR_BASE_ENCODER = RobertaModelBundle(
     _path=urljoin(_TEXT_BUCKET, "xlmr.base.encoder.pt"),
     _encoder_conf=RobertaEncoderConf(vocab_size=250002),
-    transform=partial(get_xlmr_transform,
-                      vocab_path=urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"),
-                      spm_model_path=urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model"),
-                      )
+    transform=lambda: T.Sequential(
+        T.SentencePieceTokenizer(urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model")),
+        T.VocabTransform(load_state_dict_from_url(urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"))),
+        T.Truncate(510),
```
Review thread on the `T.Truncate(510)` line:

**Contributor:** I am curious as to why we are truncating to a length of 510 exactly? In the XLMR transform that is being removed, I noticed that the maximum sequence length was set to 514.

**Contributor:** I think 510 + the BOS and EOS is right. The 514 seems like an off-by-two.

**Contributor (Author):** Thanks for bringing this up. Right, I think the 514 added earlier was not correct. The final max length should be 512 including the special tokens. Let me also confirm with the author of XLM-RoBERTa that this is the right setting.

**Contributor:** Sounds like some tests would help.
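To make the 510 + BOS + EOS = 512 budget concrete, a check along the following lines could back this up. This is only a sketch (it is not part of the PR) and assumes `T.Truncate` and `T.AddToken` accept a plain Python list of token ids, as they do inside the composed transform:

```python
import torchtext.transforms as T

# Sketch of a possible test: 510 sub-word ids plus BOS (0) and EOS (2)
# must not exceed the 512 positions the encoder was trained with.
pipeline = T.Sequential(
    T.Truncate(510),
    T.AddToken(token=0, begin=True),
    T.AddToken(token=2, begin=False),
)

fake_ids = list(range(4, 2000))  # arbitrary "token ids", longer than the limit
out = pipeline(fake_ids)

assert len(out) == 512
assert out[0] == 0 and out[-1] == 2
```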
```diff
+        T.AddToken(token=0, begin=True),
+        T.AddToken(token=2, begin=False),
```

Review thread on lines +162 to +163 (the `T.AddToken` calls):
**Contributor:** Is it guaranteed that the BOS and EOS tokens will always sit at indices 0 and 2 of the vocabulary?

**Contributor:** Certainly plausible that they'd add, say, additional special tokens. They would be appended to the vocab though, so 0 and 2 should remain as BOS and EOS respectively. If they didn't, you'd break the "contract" with how RoBERTa was trained.

**Contributor (Author):** These are pre-trained models, and the transform complies with what the model was originally trained with. Users can certainly bring in their own flavors of SentencePiece model, vocab, etc., but then that transform is specific to the user's model and won't apply to the pre-trained weights.

**Contributor:** Got it, thanks for the clarification!
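As a quick sanity check of that contract, one could load the released vocab artifact and inspect indices 0 and 2 directly. This is only an illustrative sketch, assuming the file deserializes to a `torchtext.vocab.Vocab` (which is what `T.VocabTransform` receives above) and that `load_state_dict_from_url` is the `torch.hub` helper:

```python
from urllib.parse import urljoin

from torch.hub import load_state_dict_from_url
from torchtext import _TEXT_BUCKET

# Illustrative spot-check (not part of the PR): the hard-coded AddToken ids
# 0 and 2 should correspond to the BOS/EOS entries of the released vocab.
vocab = load_state_dict_from_url(urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"))
itos = vocab.get_itos()

print(itos[0], itos[2])  # expected: the BOS and EOS symbols, e.g. "<s>" and "</s>"
```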
||
```diff
+    )
 )

 XLMR_BASE_ENCODER.__doc__ = (
@@ -174,10 +176,13 @@ def encoderConf(self) -> RobertaEncoderConf:
 XLMR_LARGE_ENCODER = RobertaModelBundle(
     _path=urljoin(_TEXT_BUCKET, "xlmr.large.encoder.pt"),
     _encoder_conf=RobertaEncoderConf(vocab_size=250002, embedding_dim=1024, ffn_dimension=4096, num_attention_heads=16, num_encoder_layers=24),
-    transform=partial(get_xlmr_transform,
-                      vocab_path=urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"),
-                      spm_model_path=urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model"),
-                      )
+    transform=lambda: T.Sequential(
+        T.SentencePieceTokenizer(urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model")),
+        T.VocabTransform(load_state_dict_from_url(urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"))),
+        T.Truncate(510),
+        T.AddToken(token=0, begin=True),
+        T.AddToken(token=2, begin=False),
+    )
 )

 XLMR_LARGE_ENCODER.__doc__ = (
```
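After this change, calling the bundle's `transform` factory returns the composed `T.Sequential` pipeline, which maps raw text to model-ready token ids. A rough usage sketch, assuming the public `XLMR_BASE_ENCODER` bundle and `torchtext.functional.to_tensor` as documented in torchtext (details may differ slightly from the exact revision in this PR):

```python
import torchtext.functional as F
from torchtext.models import XLMR_BASE_ENCODER

# The factory builds the SentencePiece -> vocab -> truncate -> add-BOS/EOS
# pipeline lazily, downloading the assets on first use.
text_transform = XLMR_BASE_ENCODER.transform()
model = XLMR_BASE_ENCODER.get_model()

batch = ["Hello world", "XLM-R uses SentencePiece sub-words"]
token_ids = text_transform(batch)                      # lists of ids, each at most 512 long
model_input = F.to_tensor(token_ids, padding_value=1)  # 1 is the XLM-R pad id
output = model(model_input)                            # (batch, seq_len, embedding_dim)
```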
This file was deleted. (It defined `get_xlmr_transform`, whose import is removed in the diff above.)
Review thread on the multi-line `transform=lambda: T.Sequential(...)` assignment:

**Contributor:** Assigning a lambda is an anti-pattern: https://docs.quantifiedcode.com/python-anti-patterns/correctness/assigning_a_lambda_to_a_variable.html

**Contributor:** Hmm, reading this, it might not be: https://stackoverflow.com/a/27928036

**Contributor:** Yet I feel overwhelmed by the fact that this lambda spans multiple lines.

**Contributor (Author):** Hmm, interesting, I wasn't aware of this concern around lambda usage; thanks for sharing. I honestly thought it looked nicer to be able to define a lazy transform in place :). The alternative would be to define a private function that returns the composite transform.

**Contributor:** Yeah, I am looking at all the possible reasoning, but I am also becoming convinced that `lambda` is the simplest here.

**Contributor:** Another way would be to compose a no-op lambda with the rest of the transform with `partial`, though it isn't as easy to read.
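For reference, the named-function alternative mentioned above could look roughly like the following. This is only a sketch of that suggestion, reusing the assets and transforms from the diff; the function name `_xlmr_base_transform` is hypothetical, and `load_state_dict_from_url` is assumed to be the `torch.hub` helper used elsewhere in the bundler:

```python
from urllib.parse import urljoin

import torchtext.transforms as T
from torch.hub import load_state_dict_from_url
from torchtext import _TEXT_BUCKET


def _xlmr_base_transform() -> T.Sequential:
    """Hypothetical private factory: same lazy pipeline as the lambda in the diff."""
    return T.Sequential(
        T.SentencePieceTokenizer(urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model")),
        T.VocabTransform(load_state_dict_from_url(urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"))),
        T.Truncate(510),
        T.AddToken(token=0, begin=True),
        T.AddToken(token=2, begin=False),
    )


# The bundle would then take the function itself instead of a lambda, e.g.:
# XLMR_BASE_ENCODER = RobertaModelBundle(..., transform=_xlmr_base_transform)
```

A named factory also gives a clearer traceback and `repr` than `<lambda>` if transform construction ever fails, at the cost of moving the definition away from the bundle.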