Monday May 15, 2023

CVPR 2023 - Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

In this episode we discuss Query-Dependent Video Representation for Moment Retrieval and Highlight Detection by WonJun Moon, Sangeek Hyun, SangUk Park, Dongchan Park, and Jae-Pil Heo. The paper introduces Query-Dependent DETR (QD-DETR), a detection transformer designed for video moment retrieval and highlight detection (MR/HD). According to the authors, previous transformer-based models do not fully exploit the given query and so neglect the relevance between the text query and the video content. QD-DETR addresses this by adding cross-attention layers that inject the context of the text query into the video representation, and by manipulating video-query pairs to produce irrelevant pairs. The paper also presents an input-adaptive saliency predictor that adaptively defines the criterion for saliency scores for each video-query pair. QD-DETR is reported to outperform state-of-the-art methods on the QVHighlights, TVSum, and Charades-STA datasets.
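
For listeners who want a feel for the two ideas in the summary, here is a minimal sketch and not the authors' implementation. It assumes that video clips act as attention queries and text tokens act as keys and values, it omits the learned projection matrices a real transformer layer would have, and it assumes irrelevant pairs can be formed by shifting the text queries within a batch. The names cross_attend and make_negative_pairs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(video_clips, text_tokens):
    """Mix text-query context into each clip feature.

    Assumption: clips are the attention queries, text tokens are the
    keys and values. Learned projections are left out for brevity.
    """
    dim = len(text_tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for clip in video_clips:
        # Scaled dot-product score between this clip and every text token.
        scores = [
            sum(c * t for c, t in zip(clip, token)) / scale
            for token in text_tokens
        ]
        weights = softmax(scores)
        # Weighted sum of text tokens gives the query context for this clip.
        context = [
            sum(w * token[d] for w, token in zip(weights, text_tokens))
            for d in range(dim)
        ]
        # Residual connection: keep the clip feature and add the context.
        attended.append([c + x for c, x in zip(clip, context)])
    return attended


def make_negative_pairs(videos, queries):
    """Pair each video with another video's query (cyclic shift).

    Assumption: this is one simple way to build irrelevant pairs;
    it needs at least two items to produce a genuine mismatch.
    """
    shifted = queries[1:] + queries[:1]
    return list(zip(videos, shifted))


if __name__ == "__main__":
    clips = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[0.5, 0.5], [1.0, -1.0]]
    print(cross_attend(clips, tokens))
    print(make_negative_pairs(["video_a", "video_b"], ["query_a", "query_b"]))
```

In this sketch each clip feature ends up carrying a blend of the text tokens it matches best, which is the sense in which the representation becomes query-dependent. The shifted pairs stand in for video-query combinations that should receive low saliency.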
