Wednesday May 17, 2023

CVPR 2023 - Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel

In this episode we discuss Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel by Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, and Shih-Fu Chang. The paper tackles multi-channel video-language retrieval, in which a model must combine information from several sources, such as the video itself and its accompanying text. The authors frame the problem as a principled model design space with two axes: how to represent videos (continuous feature vectors or discrete text tokens) and how to fuse video and text information (a multimodal transformer or a pretrained contrastive text model). Evaluating the resulting four combinations on five video-language datasets shows that discrete text tokens paired with a pretrained contrastive text model perform best, even outperforming state-of-the-art models on some datasets. The authors attribute this to text tokens capturing the key visual information while aligning naturally with strong text retrieval models.
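The best-performing combination (video content expressed as discrete text tokens, then matched with a contrastive text model) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: `toy_encode` is a hypothetical hashed bag-of-words stand-in for a real pretrained contrastive text encoder, `video_tokens` stands in for whatever text a captioner or tagger might extract from the frames, and fusion is plain string concatenation.

```python
import math
import zlib
from typing import Dict, List


def toy_encode(text: str, dim: int = 256) -> List[float]:
    """Hypothetical stand-in for a pretrained contrastive text encoder.

    Hashes each lowercase word into one of `dim` buckets (bag of words) and
    L2-normalizes the result, so a dot product equals cosine similarity.
    zlib.crc32 is used because Python's built-in hash() is salted per process.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a: List[float], b: List[float]) -> float:
    # Both inputs are unit-normalized (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(video_tokens: List[str], query: str,
                    candidates: List[str]) -> List[str]:
    """Fuse the video channel (as text tokens) with the query text, then rank
    candidate texts by similarity in the shared text embedding space."""
    fused = " ".join(video_tokens) + " " + query
    anchor = toy_encode(fused)
    scores: Dict[str, float] = {c: cosine(anchor, toy_encode(c)) for c in candidates}
    # sorted() is stable, so tied candidates keep their input order.
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


if __name__ == "__main__":
    tokens = ["a", "man", "slices", "tomato", "on", "cutting", "board"]
    print(rank_candidates(tokens, "what is the man cutting",
                          ["tomato", "guitar", "car engine"]))
```

With a real contrastive encoder in place of `toy_encode`, the ranking step would stay the same: embed the fused text once, embed each candidate, and sort by cosine similarity.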


Copyright 2023 All rights reserved.
