Custom torchtext.data.Dataset takes a long time to generate the dataset

## ❓ Questions and Help

**Description**
I wrote a custom `data.Dataset` for multilabel classification. When processing the data, I found that building the train and test sets with it is very slow (about 1.5 s per example). Is this normal, or is something wrong with my custom dataset?
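For reference, a figure like the one above can be obtained with a small timing helper. This is only a minimal sketch using the standard library: `time_per_item` and `build_one` are hypothetical names, and `build_one` is a stand-in for whatever call builds a single example, not the actual torchtext code.

```python
import time


def time_per_item(build_one, items):
    """Return the mean wall-clock seconds spent in build_one per item."""
    start = time.perf_counter()
    for item in items:
        build_one(item)
    elapsed = time.perf_counter() - start
    # Guard against division by zero when items is empty.
    return elapsed / max(len(items), 1)


# Hypothetical stand-in for the per-example work; in the real case this
# would wrap the call that builds one example from a (text, labels) pair.
def build_one(pair):
    txt, labels = pair
    return [t.lower() for t in txt], [float(x) for x in labels]


# Toy input: 100 copies of one (text, labels) pair.
sample = [(["Some document text"], [1, 0, 1])] * 100
mean_seconds = time_per_item(build_one, sample)
print(f"{mean_seconds:.6f} s per example")
```

Timing a small slice of the data this way makes it easier to tell whether the cost is in the per-example step or somewhere else.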

The custom `data.Dataset` for multilabel classification is as follows:
```
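from torchtext import data
from tqdm import tqdm


class TextMultiLabelDataset(data.Dataset):
    def __init__(self, text, text_field, label_field, lbls=None, n_labels=None, **kwargs):
        # Pair each column name with its torchtext Field object.
        fields = [('text', text_field), ('label', label_field)]
        # for l in lbl_cols:
        # fields.append((l, label_field))

        # Without labels (test set), all-zero placeholder labels are used.
        # n_labels is an assumed extra parameter giving the placeholder length.
        is_test = lbls is None
        if is_test and n_labels is None:
            raise ValueError("n_labels is required when lbls is None")

        examples = []
        for i, txt in enumerate(tqdm(text)):
            # Use the real labels when available, otherwise zero placeholders.
            l = [0.0] * n_labels if is_test else lbls[i]
            examples.append(data.Example.fromlist([txt, l], fields))

        super(TextMultiLabelDataset, self).__init__(examples, fields, **kwargs)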
```
Here `text` is a list of lists of strings, one inner list per document, and `lbls` is a list of binary label lists, one per document. (The total number of labels is about 20,000.)

Examples of `text`:
```
[["There are few factors more important to the mechanisms of evolution than stress. The stress response has formed as a result of natural selection..."], ["A 46-year-old female patient presenting with unspecific lower back pain, diffuse abdominal pain, and slightly elevated body temperature"], ...]
```
Examples of `lbls`:
```
[[1 1 1 1 0 0 0 1 0 ...], [1 0 1 0 1 1 1 1 ...], ...]
```
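
To make the shapes concrete, here is a toy sketch that builds inputs with the same structure. The sizes, the document strings, and the random labels are all made up for illustration; the real data has far more documents and about 20,000 labels.

```python
import random

n_docs = 3     # toy size; the real data has many more documents
n_labels = 8   # toy size; the real label set has about 20000 entries

random.seed(0)

# text: one inner list per document, each holding a single string.
text = [[f"Document {i} text goes here."] for i in range(n_docs)]

# lbls: one binary vector of length n_labels per document.
lbls = [[random.randint(0, 1) for _ in range(n_labels)] for _ in range(n_docs)]

print(len(text), len(lbls), len(lbls[0]))
```

Both lists have one entry per document, so `text[i]` and `lbls[i]` describe the same document.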
