Sunday Aug 27, 2023
arXiv preprint - EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
In this episode, we discuss EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding by Karttikeya Mangalam, Raiymbek Akshulakov, and Jitendra Malik. The paper presents EgoSchema, a benchmark dataset and evaluation metric for assessing the long-form video language understanding capabilities of vision and language systems. The dataset consists of over 5,000 multiple-choice question-answer pairs drawn from 250 hours of real video data; each question asks the system to select the correct answer from five options based on a three-minute video clip. The authors highlight that existing video understanding datasets lack long temporal structure, and they show that state-of-the-art video and language models remain limited in long-term video understanding.