This repository was archived by the owner on Feb 8, 2018. It is now read-only.

cedias / HAN-pytorch Public archive

(Deprecated) Hierarchical Attention Networks for Document Classification (https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf) - in PyTorch

Deprecated code

A faster, up-to-date implementation is in my other repo.

HAN-pytorch

Batched implementation of the paper Hierarchical Attention Networks for Document Classification.

Requirements

PyTorch (>= 0.2)
spaCy (for tokenizing)
Gensim (for building word vectors)
tqdm (for progress bars)

Scripts:

prepare_data.py transforms the gzip files found on Julian McAuley's Amazon product data page into lists of (user, item, review, rating) tuples, and builds word vectors if the --create-emb option is specified.
main.py trains a hierarchical attention model.
Data.py holds data managing objects.
Nets.py holds networks.
beer2json.py is a helper script if you happen to have the ratebeer/beeradvocate datasets.
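
To give an idea of what the data preparation step produces, here is a minimal sketch of reading a gzipped review file into (user, item, review, rating) tuples. It is not the code in prepare_data.py. The function name iter_reviews is hypothetical, and the sketch assumes one strict JSON object per line with the fields reviewerID, asin, reviewText and overall; files in a looser format would need a different parser.

```python
import gzip
import json


def iter_reviews(path):
    """Yield (user, item, review, rating) tuples from a gzipped review file.

    Assumption: each non-empty line is a strict JSON object carrying the
    keys "reviewerID", "asin", "reviewText" and "overall".
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            yield (
                record["reviewerID"],
                record["asin"],
                record["reviewText"],
                float(record["overall"]),
            )
```

Calling list(iter_reviews("reviews.json.gz")) on such a file (the file name is only an example) gives a list of tuples in the order described above.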

Note:

The whole dataset is used to create the word embeddings, so text from any held-out split also influences them. This can be an issue when reporting test results.
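
One way around this, sketched here under assumptions and not taken from this repository, is to split the reviews first and fit anything text-dependent on the training part only. The helpers split_reviews and build_vocab are hypothetical names, and whitespace tokenization stands in for spaCy; the same training-only texts would then be handed to whatever builds the word vectors.

```python
import random
from collections import Counter


def split_reviews(reviews, test_fraction=0.2, seed=0):
    """Shuffle (user, item, review, rating) tuples and return (train, test)."""
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def build_vocab(train_reviews, min_count=1):
    """Collect tokens seen at least min_count times in the training reviews."""
    counts = Counter(
        token
        for _user, _item, text, _rating in train_reviews
        for token in text.lower().split()  # stand-in for a real tokenizer
    )
    return {token for token, count in counts.items() if count >= min_count}
```

With this ordering, words that occur only in held-out reviews never enter the vocabulary, so they cannot shape the embeddings.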