Friday May 26, 2023

CVPR 2023 - StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

In this episode we discuss StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos by Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, and Allan D. Jepson. The paper introduces StepFormer, a self-supervised model that discovers and localizes key-steps in instructional videos with no human supervision. Traditional methods require video-level human annotations, which do not scale to large datasets. StepFormer instead uses automatically generated subtitles as its only source of supervision: it is trained on the resulting sequence of text narrations with an order-aware loss function that filters out irrelevant phrases. The model outperforms all previous unsupervised and weakly supervised approaches on step detection and localization, and it shows an emergent ability to perform zero-shot multi-step localization.
