Wednesday Sep 06, 2023

ICCV 2023 - Verbs in Action: Improving verb understanding in video-language models

In this episode, we discuss Verbs in Action: Improving verb understanding in video-language models by Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, and Cordelia Schmid. The paper proposes a Verb-Focused Contrastive (VFC) framework to address the limited verb understanding of video-language models. The framework uses pre-trained large language models (LLMs) to generate hard negative captions that change only the verb while keeping the rest of the caption intact. The method achieves state-of-the-art zero-shot performance on three downstream tasks: video-text matching, video question-answering, and video classification.
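The core idea of contrasting each video against captions that differ only in the verb can be illustrated with a standard InfoNCE-style video-to-text loss that adds extra per-video negatives. This is a minimal sketch under stated assumptions, not the authors' implementation. Embeddings are plain Python lists. The names `make_verb_negatives` and `verb_contrastive_loss` are hypothetical. The LLM that proposes replacement verbs is stood in for by a fixed list supplied by the caller. The temperature of 0.07 is only an illustrative default.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def make_verb_negatives(caption: str, verb: str, alternatives: List[str]) -> List[str]:
    """Hypothetical stand-in for the LLM step: swap only the verb, keep every other word."""
    words = caption.split()
    return [" ".join(alt if w == verb else w for w in words) for alt in alternatives]


def _normalize(v: Vector) -> List[float]:
    # L2-normalize so the dot product becomes cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def verb_contrastive_loss(
    video_embs: List[Vector],
    caption_embs: List[Vector],
    hard_negative_embs: List[List[Vector]],
    temperature: float = 0.07,  # illustrative value, an assumption
) -> float:
    """Video-to-text InfoNCE: for video i the positive is caption i; negatives are the
    other captions in the batch plus the verb-swapped captions generated for video i."""
    losses = []
    for i, video in enumerate(video_embs):
        v = _normalize(video)
        candidates = [_normalize(c) for c in caption_embs]
        candidates += [_normalize(h) for h in hard_negative_embs[i]]
        logits = [_dot(v, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching caption (index i).
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)
```

For example, `make_verb_negatives("a man opens a door", "opens", ["closes", "paints"])` yields `["a man closes a door", "a man paints a door"]`. Once those captions are embedded, adding them as negatives can only raise the loss, so the model is pushed to separate captions that share all their context and differ only in the action.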
