Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)
video-captioning automatic-annotation video-grounding vision-language large-scale-pretraining video-language-pretrainng video-language-model
Updated Sep 2, 2025 - Python