Monday Jul 31, 2023

CVPR 2023 - LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

In this episode, we discuss "LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling" by Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, and Lijuan Wang. The paper presents LAVENDER, a unified video-language framework that uses Masked Language Modeling (MLM) as the common interface for all pre-training and downstream tasks. This unification simplifies the model architecture: only a lightweight MLM head is needed on top of the multimodal encoder. Despite that simplicity, the experimental results show that LAVENDER achieves competitive performance on various video-language benchmarks.
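To make the idea of MLM as a common interface more concrete, here is a minimal Python sketch of how the text side of several tasks might be phrased as a fill-in-the-mask problem. It is an illustration under stated assumptions, not the authors' implementation: the function names `to_mlm_text` and `pick_answer`, the task labels, and the exact templates are hypothetical, and the video encoder and the MLM head itself are left out entirely.

```python
from typing import Dict, Optional

MASK = "[MASK]"  # placeholder token that the MLM head is asked to fill in


def to_mlm_text(task: str, text: str, candidate: Optional[str] = None) -> str:
    """Build the text input for a task, phrased as masked-token prediction.

    The templates below are illustrative assumptions, not the paper's exact ones.
    """
    if task == "matching":
        # Video-text matching: the masked slot would be filled with "true"/"false".
        return f"{text} {MASK}"
    if task == "open_qa":
        # Open-ended QA: the masked slot would be filled with the answer word.
        return f"{text} {MASK}"
    if task == "multiple_choice_qa":
        # One input per answer candidate; the masked slot is "true"/"false".
        if candidate is None:
            raise ValueError("multiple_choice_qa needs a candidate answer")
        return f"{text} {candidate} {MASK}"
    if task == "captioning":
        # `text` is the caption generated so far; the next word goes in the mask.
        return f"{text} {MASK}".strip()
    raise ValueError(f"unknown task: {task}")


def pick_answer(scores: Dict[str, float]) -> str:
    """Return the vocabulary word with the highest score at the masked slot."""
    return max(scores, key=scores.get)
```

In this sketch every task ends with the same step: the model scores vocabulary words at the masked position and `pick_answer` selects the best one, which is why a single shared head can stand in for separate task-specific heads.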
