A faster and up to date implementation is in my other repo
Batched implementation of Hierarchical Attention Networks for Document Classification paper
- Pytorch (>= 0.2)
- Spacy (for tokenizing)
- Gensim (for building word vectors)
- tqdm (for fancy graphics)
prepare_data.pytransforms gzip files as found on Julian McAuley Amazon product data page to lists of(user,item,review,rating)tuples and builds word vectors if--create-emboption is specified.main.pytrains a Hierarchical Model.Data.pyholds data managing objects.Nets.pyholds networks.beer2json.pyis an helper script if you happen to have the ratebeer/beeradvocate datasets.
The whole dataset is used to create word embeddings which can be an issue.