Friday May 26, 2023
CVPR 2023 - StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
In this episode we discuss StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos by Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, and Allan D. Jepson. The paper introduces StepFormer, a self-supervised model that discovers and localizes key steps in instructional videos without any human supervision. Traditional methods require video-level human annotations, which do not scale to large datasets. StepFormer instead uses automatically generated subtitles as its only source of supervision: it is trained against the sequence of text narrations with an order-aware loss function that filters out irrelevant phrases. The model outperforms all previous unsupervised and weakly supervised approaches on step detection and localization, and it exhibits an emergent ability to solve zero-shot multi-step localization.
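To make "order-aware, filters out irrelevant phrases" more concrete, here is a generic sketch. It is not StepFormer's actual loss. It is a hypothetical DTW-style dynamic program (the names `ordered_alignment_with_drop` and `drop_threshold` are illustrative assumptions) that aligns an ordered list of step embeddings with an ordered list of narration embeddings. Matches must preserve temporal order, and a narration whose similarity to every step falls below the threshold is dropped instead of being forced into a match.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ordered_alignment_with_drop(
    steps: List[Vector],
    narrations: List[Vector],
    drop_threshold: float = 0.5,  # hypothetical: similarity needed to keep a match
) -> Tuple[float, List[Tuple[int, int]]]:
    """Hypothetical sketch: best order-preserving matching of narrations to steps.

    A match (narration i, step j) earns cosine similarity minus drop_threshold.
    A narration may be dropped (earns 0) and a step may go unmatched (earns 0),
    so low-similarity narrations are filtered out instead of matched.
    Returns (total gain, list of (narration_index, step_index) pairs).
    """
    n, k = len(narrations), len(steps)
    # score[i][j]: best gain aligning the first i narrations with the first j steps.
    score = [[0.0] * (k + 1) for _ in range(n + 1)]
    back = [[""] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        back[i][0] = "drop"
    for j in range(1, k + 1):
        back[0][j] = "skip"

    for i in range(1, n + 1):
        for j in range(1, k + 1):
            gain = cosine(narrations[i - 1], steps[j - 1]) - drop_threshold
            options = [
                (score[i - 1][j - 1] + gain, "match"),
                (score[i - 1][j], "drop"),  # narration i treated as irrelevant
                (score[i][j - 1], "skip"),  # step j receives no narration
            ]
            score[i][j], back[i][j] = max(options, key=lambda t: t[0])

    # Backtrack from the full prefix to recover the ordered matches.
    pairs: List[Tuple[int, int]] = []
    i, j = n, k
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "drop":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][k], pairs


if __name__ == "__main__":
    steps = [[1.0, 0.0], [0.0, 1.0]]
    # The middle narration resembles no step, so the alignment drops it.
    narrations = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
    print(ordered_alignment_with_drop(steps, narrations))
```

Because matches must be monotonic in both sequences, swapping the order of the narrations changes which steps can be matched. That ordering constraint is what separates this kind of objective from a plain nearest-neighbour matching.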