This repository has been created to enhance collaboration and document the results of the project in the lecture "Neural Networks and Sequence to Sequence Learning". The aim is to implement image captioning, primarily using the PyTorch machine translation framework JoeyNMT. As a baseline model, we implement the approach of Xu et al. (2015).
You can let our best-performing model generate captions for any image here.
Just like Xu et al. (2015), we use an encoder network to extract features from images. The feature vector is then used to initialize an LSTM decoder, which unrolls a generated caption. At each step, an attention mechanism is applied to the feature vector. The attention mechanism is illustrated below using a real example, which shows how our implementation attends to different areas of the image during unrolling.
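
To make the attention step concrete, here is a minimal pure-Python sketch of soft attention for a single decoding step. It is not the code used in this repository: the function names, the toy feature values, and the dot-product scoring are assumptions made for illustration (Xu et al. score each region with a small neural network instead). It only shows the general pattern of scoring image regions against the decoder state, normalizing the scores with a softmax, and averaging the region features with those weights.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_features, decoder_state):
    """Return (weights, context) for one decoding step.

    region_features: list of feature vectors, one per image region
    decoder_state:   current decoder hidden state (same dimension here)
    """
    # Assumption: dot-product scoring stands in for a learned scoring network.
    scores = [
        sum(f * h for f, h in zip(region, decoder_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # The context vector is the attention-weighted average of the regions.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three regions with two-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = soft_attention(regions, state)
print(weights, context)
```

Regions whose features align with the decoder state receive larger weights, which is what the attention visualization depicts as brighter image areas.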

- Make sure to install the dependencies listed in `requirements.txt`.
- In order to work with our implementation, load the Flickr8k dataset from https://github.com/goodwillyoga/Flickr8k_dataset.
- Place the files `Flickr8k.token.txt`, `Flickr8k.trainImages.txt`, `Flickr8k.devImages.txt`, `Flickr8k.testImages.txt`, the `ExpertAnnotations.txt` file as well as the folder containing all images in a `data` folder in the project root.
- Adapt the location and name of the above-mentioned files in `train.py`, if necessary.
- Create a .yaml file in the `param` folder in the project root. You should give this file a meaningful name. Define in the file all parameters of the experiment you want to execute. An example .yaml file with explanations can be found in the `param` folder of this repository.
- Start training: `python train.py model_name`. Set `model_name` to the name given to the .yaml file containing the desired training parameters.
- During training, the loss and BLEU score evaluations on the training data will be stored inside a `runs` folder, named according to the model name given before. These data points can easily be visualized using TensorBoard. The trained model is stored as a `.pth` file in the folder `saved_models`.
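
Before starting a training run, it can help to confirm that the dataset files are where the steps above expect them. The following is a small sketch, not part of this repository: the helper name `missing_data_files` is hypothetical, and it assumes the default `data` folder in the project root. The image folder is not checked because its name is not fixed here.

```python
from pathlib import Path

# File names as listed in the setup steps above.
REQUIRED_FILES = [
    "Flickr8k.token.txt",
    "Flickr8k.trainImages.txt",
    "Flickr8k.devImages.txt",
    "Flickr8k.testImages.txt",
    "ExpertAnnotations.txt",
]


def missing_data_files(data_dir="data"):
    """Return the required file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data folder:", ", ".join(missing))
    else:
        print("All annotation and split files found.")
```

If you adapted the file locations in `train.py`, pass the matching directory to the helper instead of relying on the default.
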
- Make sure the `.pth` file of the model you want to evaluate exists in the `saved_models` folder.
- In `eval.py`, change the `model_name` to the name of a .yaml file containing the same entries as the training file and additionally the entry `load_model`, set to the path of the `.pth` file. Example: `load_model: 'saved_models/best_model.pth'`
- Start evaluating with `python eval.py`. Evaluation will be done using the test split and results will be printed to the console.
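
Since the evaluation file repeats the training entries and adds `load_model`, one way to produce it is to copy the training configuration text and append that entry. The sketch below is only an illustration under assumptions: the helper `with_load_model` is hypothetical, it treats `load_model` as a top-level key on its own line, and the `epochs` entry in the example is a made-up placeholder rather than a parameter known to exist in this project.

```python
def with_load_model(config_text, model_path):
    """Return config_text with a single top-level load_model entry appended."""
    # Drop any existing load_model line so the result stays unambiguous.
    lines = [
        line for line in config_text.splitlines()
        if not line.startswith("load_model:")
    ]
    lines.append(f"load_model: '{model_path}'")
    return "\n".join(lines) + "\n"


# Hypothetical training configuration; real entries come from your .yaml file.
train_config = "epochs: 10\n"
eval_config = with_load_model(train_config, "saved_models/best_model.pth")
print(eval_config)
```

Write the returned text to a new .yaml file in the `param` folder and point `model_name` in `eval.py` at that file.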
Our best-performing model's weights can be downloaded here.
