Problem with StopIteration on dataset when creating vocabulary

## ❓ Questions and Help

**Description**

Hey folks, I was hoping someone could tell me a better way to deal with this issue. I am getting a `StopIteration` error on the dataset, and I am not clear on how to get around it. Below is a minimal example that reproduces the error. I am using `torchtext` 0.10.0.

In the real code, I am pulling the AG_NEWS dataset into the `train_iter` variable, building a vocabulary from that `train_iter` dataset, and then trying to process batches of the same dataset using a `DataLoader` with a collate function.

The problem seems to be that I iterate through `train_iter` once in order to build the vocabulary with the `yield_tokens` function. When I then call `next(iter(train_iter))`, the iterator has already reached its end. Is there a way to copy `train_iter` so that I can build the vocabulary from the copy? I can probably write some hacky code to work around this, but I wanted to see if there is a better or more appropriate way.

```python
from torchtext.datasets import AG_NEWS
from torchtext.data.utils import get_tokenizer

from typing import Optional, Tuple

import torchtext
import torch
from torchtext.vocab import Vocab, build_vocab_from_iterator
import numpy as np


def yield_tokens(data_iter):
    for _, text in data_iter:
        yield tokenizer(text)

tokenizer = get_tokenizer('basic_english')
train_iter, test_iter = AG_NEWS()

vocab = build_vocab_from_iterator(yield_tokens(train_iter),
                                  specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])

print(next(iter(train_iter)))
```

The error message generated is:
```
Exception has occurred: StopIteration
exception: no description

    print(next(iter(train_iter)))
```
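
For reference, the behavior described above can be reproduced without `torchtext` at all. The following is a minimal sketch, assuming `train_iter` behaves like an ordinary single-pass Python iterator, which is what the error suggests. `make_train_iter` and its two sample rows are hypothetical stand-ins for `AG_NEWS`, and `str.lower().split()` stands in for the real tokenizer. It also shows two possible workarounds: creating a fresh iterator for the second pass, or materializing the data into a list first.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[int, str]


def make_train_iter() -> Iterator[Sample]:
    """Hypothetical stand-in for AG_NEWS: a single-pass iterator of (label, text)."""
    samples = [(3, "Wall St. rallies"), (4, "New chip unveiled")]
    return iter(samples)


def yield_tokens(data_iter: Iterable[Sample]) -> Iterator[List[str]]:
    # Simplified tokenizer (lowercase + whitespace split), used only for illustration.
    for _, text in data_iter:
        yield text.lower().split()


# Reproduce the problem: the first pass consumes the iterator.
train_iter = make_train_iter()
tokens = list(yield_tokens(train_iter))
# iter() on an iterator returns the same exhausted object, so nothing is left.
exhausted = next(iter(train_iter), None) is None

# Workaround A: build a fresh iterator for the second pass.
first_a = next(iter(make_train_iter()))

# Workaround B: materialize once into a list, which can be iterated repeatedly.
train_list = list(make_train_iter())
tokens_b = list(yield_tokens(train_list))
first_b = next(iter(train_list))
```

In the real code, workaround A would presumably mean calling `AG_NEWS()` a second time after building the vocabulary, and workaround B would mean `list(train_iter)` before the first pass, at the cost of holding the whole dataset in memory. Both are suggestions under the single-pass assumption, not confirmed library guidance.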
