Steps to retire legacy code and release new building blocks in torchtext #985
Description
A new abstraction was described in the 0.5.0 release notes. We are now working on retiring legacy code in torchtext over the next few releases. This issue tracks the progress of that work. Here are the steps that users can expect:
Step 1: Retire legacy code in torchtext.data and torchtext.datasets
The following components will be retired from the source code soon. We added deprecation warning messages for them in the 0.7.0 release (link). Users will still be able to find them in torchtext.legacy, and the original constructors will raise an error when called.
- torchtext.data.field - RawField, Field, ReversibleField, SubwordField, NestedField, LabelField
- torchtext.data.iterator - BucketIterator, Iterator, BPTTIterator
- torchtext.data.dataset - Dataset, TabularDataset
- torchtext.data.example - Example
- torchtext.data.pipeline - Pipeline
- torchtext.data.batch - Batch
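The "constructor raises an error and points to the legacy location" behavior can be sketched in plain Python as follows. This is a minimal illustration, not torchtext's actual implementation: the `retired` decorator, the stand-in `Field` class, the choice of `RuntimeError`, and the message text are all assumptions made for the example.

```python
def retired(legacy_path):
    """Hypothetical class decorator: replace the constructor with one that
    raises and tells the caller where the legacy copy lives."""
    def wrap(cls):
        def __init__(self, *args, **kwargs):
            raise RuntimeError(
                f"{cls.__name__} has been retired; "
                f"use {legacy_path}.{cls.__name__} instead."
            )
        cls.__init__ = __init__
        return cls
    return wrap


@retired("torchtext.legacy.data")
class Field:
    """Stand-in for a retired class; NOT the real torchtext Field."""


try:
    Field()
except RuntimeError as err:
    # The error message names the legacy location to migrate to.
    message = str(err)
    print(message)
```

The point of the pattern is that old call sites fail loudly at construction time, with a message that names the replacement import path, rather than failing later in an unrelated place.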
At the same time, the datasets in torchtext.datasets are based on the legacy code above so they will be moved to the legacy folder:
- language_modeling - LanguageModelingDataset, WikiText2, WikiText103, PennTreebank
- nli - SNLI, MultiNLI, XNLI
- sst - SST
- translation - TranslationDataset, Multi30k, IWSLT, WMT14
- sequence_tagging - SequenceTaggingDataset, UDPOS, CoNLL2000Chunking
- trec - TREC
- imdb - IMDB
- babi - BABI20
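For code that has to run both before and after the move to the legacy folder, one option is to look a component up in several candidate modules in order. The helper below is a sketch under stated assumptions: `find_component` is a hypothetical name, and the module paths `torchtext.legacy.datasets` and `torchtext.datasets` are inferred from the folder names in this issue. It uses only the standard-library `importlib`, so it also runs (and reports nothing found) when torchtext is not installed.

```python
import importlib


def find_component(name, module_paths):
    """Return (module_path, attribute) from the first importable module
    in module_paths that defines `name`, or None if none does."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Module (or its parent package) is not available; try the next.
            continue
        if hasattr(module, name):
            return path, getattr(module, name)
    return None


# Assumed module paths: prefer the legacy location, then the original one.
found = find_component("IMDB", ["torchtext.legacy.datasets", "torchtext.datasets"])
if found is None:
    print("IMDB not found; torchtext may not be installed")
else:
    print(f"Using IMDB from {found[0]}")
```

The order of the candidate list matters: once rewritten datasets are released under the original module name, the class found there may have a different interface from the legacy one, so legacy-style code should look in the legacy location first.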
Step 2: Release the new datasets
A few of the legacy datasets above have been rewritten and are currently available in torchtext.experimental.datasets. They will be released to the core library:
- language_modeling - LanguageModelingDataset, WikiText2, WikiText103, PennTreebank, WMTNewsCrawl
- text_classification - AG_NEWS, SogouNews, DBpedia, YelpReviewPolarity, YelpReviewFull, YahooAnswers, AmazonReviewPolarity, AmazonReviewFull, IMDB
- sequence_tagging - UDPOS, CoNLL2000Chunking
- translation - Multi30k, IWSLT, WMT14
- question_answer - SQuAD1, SQuAD2
Step 3: Retire legacy vocab/vector and release the new data processing building blocks
We have also rewritten the vocabulary and word vectors as high-performance building blocks with JIT support. We will retire the following components:
- torchtext.vocab.Vocab
- torchtext.vocab.Vectors, along with GloVe, FastText, CharNGram
After this, the new vocabulary and vector building blocks in the experimental folder will be moved to the core library:
- torchtext.experimental.vectors
- torchtext.experimental.vocab
We also have some transforms that will be released to the core library:
- torchtext.experimental.transforms
We understand that this is an unusual period for the torchtext library, because we have to handle the legacy code and the new building blocks at the same time. We really appreciate the efforts of the OSS community. Users should treat the code in the three categories with the following expectations:
- legacy folder - we will accept bug fixes but not new features.
- torchtext main folder - officially supported via the stable release, with BC-breaking changes handled carefully.
- experimental folder - experimental components available via the nightly release channel. Users might experience BC-breaking changes without warning messages.