Saturday May 06, 2023

CVPR 2023 - Align and Attend: Multimodal Summarization with Dual Contrastive Losses

In this episode we discuss "Align and Attend: Multimodal Summarization with Dual Contrastive Losses" by Bo He, Jun Wang, Jielin Qiu, Trung Bui, Abhinav Shrivastava, and Zhaowen Wang. Bo He, Jun Wang, and Abhinav Shrivastava are with the University of Maryland, College Park; Jielin Qiu is with Carnegie Mellon University; Trung Bui and Zhaowen Wang are with Adobe Research.

The paper proposes Align and Attend Multimodal Summarization (A2Summ), an approach for extracting important information from multiple modalities to create reliable summaries. It introduces a unified transformer-based model that aligns and attends to the multimodal input. It also addresses two otherwise ignored aspects: the temporal correspondence between different modalities and the intrinsic correlation between different samples. The proposed model achieves state-of-the-art performance on standard video summarization and multimodal summarization datasets. The authors also introduce a new large-scale multimodal summarization dataset called BLiSS.
