This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Description
Hello,
I generated a text file called openbookQA_train. The contents of this file are shown below:
<sos> The sun is responsible for <mcoption> (A) puppies learning new tricks <eos>
<sos> The sun is responsible for <mcoption> (B) children growing up and getting old <eos>
<sos> The sun is responsible for <mcoption> (C) flowers wilting in a vase <eos>
<sos> The sun is responsible for <mcoption> (D) plants sprouting, blooming and wilting <eos>
I am trying to use or define a torchtext Iterator to generate the input that I can pass into my Transformer.
I want each sample in next(iter(openbookQA_train)).text to be a sequence of integer token IDs obtained by tokenizing one line, from <sos> to <eos> inclusive (keeping those special tokens). For a line with fewer tokens than the bptt length, I want the sample to contain all of the tokens between <sos> and <eos>, with the remaining slots filled with the <pad> token up to the bptt length.
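To illustrate the behavior I am describing, here is a minimal plain-Python sketch of the padding step (the helper name and the bptt value are hypothetical, and this does not use torchtext itself):

```python
BPTT = 20  # assumed bptt length, just for illustration

def pad_to_bptt(line, bptt=BPTT, pad_token="<pad>"):
    """Whitespace-tokenize one line and pad (or truncate) it to bptt tokens."""
    tokens = line.split()   # keeps <sos>, <mcoption>, <eos> as ordinary tokens
    tokens = tokens[:bptt]  # truncate if the line is longer than bptt
    return tokens + [pad_token] * (bptt - len(tokens))

line = ("<sos> The sun is responsible for <mcoption> "
        "(D) plants sprouting, blooming and wilting <eos>")
padded = pad_to_bptt(line)  # 14 real tokens followed by 6 "<pad>" tokens
```

If I understand the torchtext API correctly, the same effect should be achievable with a Field constructed with fix_length=bptt and pad_token="<pad>", combined with a Dataset and an Iterator, but I have not verified this.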
How can I achieve this objective?
Thank you,