This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Change transforms in RoBERTa into classes #2002

@joecummings

Description


Currently, transforms in the RobertaBundle are defined as anonymous lambda functions. These are not pickleable and cannot be imported for use anywhere else.
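The pickling problem can be reproduced with the standard `pickle` module alone. `pickle` stores functions and classes by reference to their importable qualified name. A lambda's name is just `<lambda>`, so that lookup fails, while an instance of a named, module-level class pickles fine. The sketch below uses a trivial whitespace-splitting transform as a hypothetical stand-in for the real torchtext pipeline:

```python
import pickle

# Hypothetical stand-in for a lambda-based transform, mirroring the current bundle style.
lambda_transform = lambda text: text.split()


class SplitTransform:
    """Named, module-level class: pickle records a reference to it by qualified name."""

    def __call__(self, text):
        return text.split()


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(is_picklable(lambda_transform))  # False: "<lambda>" cannot be looked up by name
print(is_picklable(SplitTransform()))  # True
```

The same naming property is what makes the class importable from other modules (`from some_module import SplitTransform`), which a lambda stored inside a bundle is not.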

Example proposal:

```python
lambda: T.Sequential(
    T.SentencePieceTokenizer(urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model")),
    T.VocabTransform(load_state_dict_from_url(urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"))),
    T.Truncate(510),
    T.AddToken(token=0, begin=True),
    T.AddToken(token=2, begin=False),
),
```

-->

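```python
from urllib.parse import urljoin

import torchtext.transforms as T
from torch.hub import load_state_dict_from_url  # assumption: the bundle may use torchtext's own download hook
from torchtext import _TEXT_BUCKET


class RobertaTransform:
    def __init__(self, truncate_length=510):
        # No trailing comma: it would turn self.transform into a 1-tuple.
        self.transform = T.Sequential(
            T.SentencePieceTokenizer(urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model")),
            T.VocabTransform(load_state_dict_from_url(urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"))),
            T.Truncate(truncate_length),
            T.AddToken(token=0, begin=True),   # prepend token id 0 (BOS)
            T.AddToken(token=2, begin=False),  # append token id 2 (EOS)
        )

    def __call__(self, text):
        # Return the result so callers actually receive the token ids.
        return self.transform(text)
```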
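To exercise the class-based pattern without torchtext or network access, here is a hypothetical pure-Python stand-in (`ToyRobertaStyleTransform`, the toy vocabulary, and whitespace tokenization are all assumptions, not torchtext APIs). It follows the same step order as the proposal (tokenize, map to ids, truncate, add BOS/EOS) and checks that a pickled and restored instance still honors its configured `truncate_length`:

```python
import pickle


class ToyRobertaStyleTransform:
    """Hypothetical stand-in for RobertaTransform: no torchtext, no downloads."""

    def __init__(self, vocab, truncate_length=510, bos_id=0, eos_id=2):
        self.vocab = vocab                      # hypothetical token -> id mapping
        self.truncate_length = truncate_length
        self.bos_id = bos_id
        self.eos_id = eos_id

    def __call__(self, text):
        tokens = text.split()                   # stand-in for SentencePieceTokenizer
        ids = [self.vocab[t] for t in tokens]   # stand-in for VocabTransform (unknown tokens raise KeyError)
        ids = ids[: self.truncate_length]       # Truncate
        return [self.bos_id] + ids + [self.eos_id]  # AddToken(begin=True) / AddToken(begin=False)


vocab = {"hello": 5, "world": 6, "again": 7}
transform = ToyRobertaStyleTransform(vocab, truncate_length=2)
restored = pickle.loads(pickle.dumps(transform))
print(restored("hello world again"))  # [0, 5, 6, 2]
```

Because the configuration lives in instance attributes rather than in a closure, it round-trips through `pickle` along with the object.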

Metadata


Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
