Friday May 26, 2023

CVPR 2023 - StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

In this episode we discuss StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos by Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, and Allan D. Jepson. The paper introduces StepFormer, a self-supervised model that discovers and localizes key-steps in instructional videos without human supervision. Previous methods require video-level human annotations, which do not scale to large datasets. StepFormer is trained with automatically generated subtitles as its only source of supervision: the model is supervised with a sequence of text narrations through an order-aware loss function that filters out irrelevant phrases. The authors report that the model outperforms all previous unsupervised and weakly-supervised approaches on step detection and localization, and that it shows an emergent ability to perform zero-shot multi-step localization.
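
To make the idea of order-aware supervision that ignores irrelevant phrases more concrete, here is a minimal sketch in Python. It is not the loss from the paper: it is a simplified dynamic-programming alignment written for illustration, and the function name `ordered_match`, the cost matrix, and the rule that every step must be matched while narrations can be skipped for free are all assumptions made for this example.

```python
import math


def ordered_match(cost):
    """Align steps to narrations in order, skipping narrations that do not fit.

    cost[i][j] is a (hypothetical) dissimilarity between step i and narration j.
    Every step is matched to exactly one narration, matches keep the original
    order, and unmatched narrations are dropped at no cost.
    Returns (total_cost, [(step_index, narration_index), ...]).
    """
    k = len(cost)
    n = len(cost[0]) if k else 0
    if k > n:
        raise ValueError("need at least as many narrations as steps")

    # dp[i][j]: best cost of matching the first i steps within the first j narrations.
    dp = [[0.0] * (n + 1)] + [[math.inf] * (n + 1) for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, n + 1):
            skip = dp[i][j - 1]                          # drop narration j-1
            take = dp[i - 1][j - 1] + cost[i - 1][j - 1]  # match step i-1 to narration j-1
            dp[i][j] = min(skip, take)

    # Backtrack to recover which narration each step was matched to.
    pairs = []
    i, j = k, n
    while i > 0:
        if dp[i][j] == dp[i][j - 1]:
            j -= 1  # narration j-1 was dropped
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return dp[k][n], pairs


# Two steps, four narrations; narrations 1 and 3 match nothing well.
example_cost = [
    [0.1, 0.9, 0.8, 0.7],
    [0.9, 0.8, 0.2, 0.9],
]
total, pairs = ordered_match(example_cost)
print(pairs)
```

In this toy example the two poorly matching narrations are left out of the alignment, which is the filtering behaviour the summary describes. In a training setting the total cost would be computed from learned video and text embeddings rather than hand-written numbers.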

Copyright 2023 All rights reserved.