To download the MF2 dataset into the `data/` folder, follow these steps:

- Make sure Git LFS is installed. Then run:

  ```bash
  git lfs install
  (cd data && git clone https://huggingface.co/datasets/sardinelab/MF2)
  ```
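If the clone succeeds, the dataset should be available under `data/MF2` (the directory name follows from the repository name). A quick sanity check:

```bash
ls data/MF2
```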
It is recommended to use a virtual environment to manage dependencies. You can create one using `venv`:

```bash
python -m venv mf2-env
source mf2-env/bin/activate
pip install -r requirements.txt
```

Make sure FFmpeg is installed and the module is loaded:

```bash
module load ffmpeg
```

If your system does not use modules, you can install FFmpeg via a package manager:

```bash
sudo apt install ffmpeg
```
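Either way, you can confirm that FFmpeg is available on your `PATH` before running inference:

```bash
ffmpeg -version
```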
To run inference with an open-weight model, use the following command:

```bash
bash run_open_inference.sh <model> <output_dir> <modality> <prompt_template>
```

- `<model>`: Name of the model as defined in `models/registry.py`.
- `<output_dir>`: Directory path where results will be saved.
- `<modality>`: Input type to be received during evaluation, with options: `video_only`, `transcripts_only`, `video_and_transcripts`, `video_and_synopsis`, `video_transcripts_and_synopsis`, `statement_only`, `synopsis_only`.
- `<prompt_template>`: Prompt style to be used during evaluation, with options: `explanation`, `explanation_free`, `direct`, `direct_free` (recommended for best results with open-weight models).
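For example, an invocation might look like the following; the model name and output directory here are only placeholders, and the model must correspond to an entry in `models/registry.py`:

```bash
bash run_open_inference.sh qwen2_vl outputs/open_run video_and_transcripts direct_free
```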
To run inference with a closed model, use the following command:

```bash
bash run_closed_inference.sh <model> <output_dir> <modality> <prompt_template> <api_key>
```

- `<model>`: Name of the model, e.g. `gpt-4o` or `gemini-2.5-pro-preview-03-25`.
- `<output_dir>`: Directory path where results will be saved.
- `<modality>`: Input type to be received during evaluation, with options: `video_only`, `transcripts_only`, `video_and_transcripts`, `video_and_synopsis`, `video_transcripts_and_synopsis`, `statement_only`, `synopsis_only`.
- `<prompt_template>`: Prompt style to be used during evaluation, with options: `explanation`, `explanation_free` (recommended for best results with closed models), `direct`, `direct_free`.
- `<api_key>`: The key used to access the closed model.
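For example, assuming the API key is stored in an environment variable (the variable name and output directory below are only illustrative):

```bash
bash run_closed_inference.sh gpt-4o outputs/gpt4o_run video_and_transcripts explanation_free "$OPENAI_API_KEY"
```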
After running the model, parse the outputs using the following command:
```bash
python parse_model_outputs.py --output_dir <output_dir> --strategy <strategy>
```

- `<output_dir>`: Directory containing the results from running the model.
- `<strategy>`: Parsing strategy to apply, with options: `strict`, `first-occurrence`, `last-occurrence`, `strict-w-fallback-first-occurrence`.
- For open-weight models, `first-occurrence` and `last-occurrence` can be used largely interchangeably, since most models output only the answer when the `direct_free` prompt is selected.
- For closed models, `last-occurrence` is recommended, since the `explanation_free` prompt leads the model to output an explanation before the final result.
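For instance, following the recommendation above, outputs from the hypothetical closed-model run shown earlier could be parsed with:

```bash
python parse_model_outputs.py --output_dir outputs/gpt4o_run --strategy last-occurrence
```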
For questions or support, contact [email protected].
This dataset is released under the CC-BY-NC-SA 4.0 license and is provided for non-commercial use only.