Thursday May 25, 2023
CVPR 2023 - Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
In this episode we discuss Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training by Filip Radenovic, Abhimanyu Dubey, Abhishek Kadian, Todor Mihaylov, Simon Vandenhende, Yash Patel, Yi Wen, Vignesh Ramanathan, Dhruv Mahajan. The paper presents improvements to the contrastive pre-training pipeline for vision-language models used in zero-shot recognition. The authors propose a filtering strategy called CAT to reduce dataset size, an approach called Concept Distillation to leverage strong unimodal representations, and a modification of the traditional contrastive alignment objective that uses importance sampling to up-weight hard negatives without adding complexity. Their Distilled and Hard-negative Training (DiHT) approach improves performance on 20 of 29 tasks in a zero-shot benchmark and narrows the gap between zero-shot and few-shot performance in linear probing. Demo code is available on GitHub.
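For listeners who want a concrete picture of the hard-negative idea discussed above, here is a minimal sketch (not the authors' released code) of a CLIP-style contrastive loss where harder negatives receive larger importance weights. The function name and the hardness parameter beta are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of an importance-weighted contrastive loss, assuming a
# CLIP-style setup with L2-normalized image and text embeddings.
import torch
import torch.nn.functional as F

def hard_negative_contrastive_loss(image_emb, text_emb, temperature=0.07, beta=0.5):
    """Image-to-text InfoNCE loss with importance-weighted negatives.

    Negatives that are more similar to the anchor (harder negatives) get
    weight exp(beta * similarity); weights are renormalized so the loss
    keeps roughly the same scale. beta is a hypothetical hardness knob;
    beta = 0 recovers the standard uniform contrastive loss.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    sim = image_emb @ text_emb.t() / temperature            # (B, B) similarity logits
    batch_size = sim.size(0)
    pos_mask = torch.eye(batch_size, dtype=torch.bool, device=sim.device)

    # Importance weights for negatives: harder negatives are up-weighted,
    # positives keep weight 1.
    neg_weights = torch.exp(beta * sim.detach())
    neg_weights = neg_weights.masked_fill(pos_mask, 0.0)
    neg_weights = neg_weights * (batch_size - 1) / neg_weights.sum(dim=1, keepdim=True)
    weights = torch.where(pos_mask, torch.ones_like(sim), neg_weights)

    # Weighted softmax cross-entropy: -log p(positive) with re-weighted negatives.
    log_prob = sim - torch.log((weights * sim.exp()).sum(dim=1, keepdim=True))
    return -log_prob[pos_mask].mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(hard_negative_contrastive_loss(img, txt).item())
```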