This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Description
I'm hitting a strange issue when I read in a dataset from a single file and split it into train, dev, and test sets. If I read it in using TabularDataset, then split the data and build the vocab on the first split, I get KeyErrors. However, if I split the dataset files before reading them in, no such errors occur.
Dataset where I've been running into this issue: https://github.com/t-davidson/hate-speech-and-offensive-language/tree/master/data
To Reproduce
- Read in data
- Split the data (using TabularDataset.split) into n sets
- Build your vocab on the training set
- Iterate over dev/test set
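
For anyone trying to reason about the steps above, here is a minimal pure-Python sketch of the same flow: split, build a vocab on the training split only, then look up tokens from another split. It does not use torchtext and is not torchtext's implementation. The helper names (split_rows, build_vocab, numericalize) and the unk_id parameter are made up for illustration. It only shows one general way a KeyError can appear when a vocab built on one split is applied to another; whether that is what happens in this issue is not established.

```python
import random


def split_rows(rows, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle rows and cut them into train/dev/test by the given ratios."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * ratios[0])
    n_dev = int(len(rows) * ratios[1])
    # Whatever is left after train and dev goes to test.
    return rows[:n_train], rows[n_train:n_train + n_dev], rows[n_train + n_dev:]


def build_vocab(rows):
    """Map each whitespace token seen in rows to an id (ids start at 1)."""
    vocab = {}
    for text in rows:
        for token in text.split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def numericalize(text, vocab, unk_id=None):
    """Look up token ids.

    With unk_id=None the lookup is strict and raises KeyError on a token
    that was not in the rows the vocab was built from. Otherwise unseen
    tokens fall back to unk_id.
    """
    if unk_id is None:
        return [vocab[token] for token in text.split()]
    return [vocab.get(token, unk_id) for token in text.split()]


if __name__ == "__main__":
    vocab = build_vocab(["a b", "b c"])          # vocab built on "train" only
    print(numericalize("a d", vocab, unk_id=0))  # prints [1, 0]
    try:
        numericalize("a d", vocab)               # strict lookup, "d" is unseen
    except KeyError as err:
        print("KeyError:", err)
```

In this sketch the strict lookup fails only for values that never appeared in the split the vocab was built on, so comparing the tokens and labels in each split against the training vocab may help narrow down where the error comes from.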
Environment
- PyTorch Version (e.g., 1.0): 1.20
- OS (e.g., Linux): OSX
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.7