Wednesday Sep 06, 2023
ICCV 2023 - Verbs in Action: Improving verb understanding in video-language models
In this episode, we discuss "Verbs in Action: Improving verb understanding in video-language models" by Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, and Cordelia Schmid. The paper proposes a Verb-Focused Contrastive (VFC) framework to address the weak verb understanding of video-language models. The framework uses pre-trained large language models (LLMs) to generate hard negative captions that change only the verb while keeping the rest of the context intact. The method achieves state-of-the-art zero-shot results on three downstream tasks: video-text matching, video question-answering, and video classification.
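To make the hard-negative idea concrete, here is a minimal sketch of a contrastive loss in which each video is scored against the captions in its batch plus one verb-swapped caption. It is not the authors' implementation: the function names, the InfoNCE-style formulation, the single hard negative per video, and the temperature value are all assumptions chosen for illustration, and the embeddings are assumed to come from some video and text encoders that are not shown.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def contrastive_loss_with_hard_negatives(video_emb, caption_emb, hard_neg_emb,
                                         temperature=0.07):
    """InfoNCE-style video-to-text loss with one extra hard negative per video.

    video_emb:    (B, D) video embeddings
    caption_emb:  (B, D) embeddings of the matching captions
    hard_neg_emb: (B, D) embeddings of captions whose verb was swapped
                  (hypothetical input; e.g. "opening" replaced by "closing")
    temperature:  assumed value, not taken from the paper
    """
    v = l2_normalize(video_emb)
    c = l2_normalize(caption_emb)
    h = l2_normalize(hard_neg_emb)

    # Similarity of every video to every caption in the batch: (B, B).
    # The diagonal holds the positive pairs; off-diagonals are easy negatives.
    batch_sims = v @ c.T
    # Similarity of each video to its own verb-swapped caption: (B, 1).
    hard_sims = np.sum(v * h, axis=1, keepdims=True)

    # Each row has B + 1 candidates; the positive sits at column i for row i.
    logits = np.concatenate([batch_sims, hard_sims], axis=1) / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    idx = np.arange(v.shape[0])
    return float(-log_probs[idx, idx].mean())
```

In this sketch, a verb-swapped caption that the text encoder cannot tell apart from the true caption raises the loss, which is the pressure that pushes a model to attend to the verb rather than only to the nouns and scene context.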