
Friday Feb 14, 2025
arXiv paper - VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
In this episode, we discuss VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection by Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu. The paper introduces VideoEspresso, a large-scale VideoQA dataset whose question-answer pairs preserve essential spatial details and temporal coherence and come with multimodal annotations of intermediate reasoning steps. Its construction pipeline uses a semantic-aware method to reduce redundant frames, then uses GPT-4o to generate the QA pairs and the Chain-of-Thought annotations, which makes the dataset more scalable than manual annotation while supporting more complex reasoning. The authors also propose a Hybrid LVLMs Collaboration framework, which pairs a frame selector with a two-stage instruction fine-tuned reasoning LVLM. On the proposed 14-task benchmark, it outperforms existing baselines on most tasks.