Thursday May 25, 2023

CVPR 2023 - Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training

In this episode we discuss Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training by Filip Radenovic, Abhimanyu Dubey, Abhishek Kadian, Todor Mihaylov, Simon Vandenhende, Yash Patel, Yi Wen, Vignesh Ramanathan, and Dhruv Mahajan. The paper presents improvements to the contrastive pre-training pipeline for vision-language models used in zero-shot recognition. The authors propose three changes: a filtering strategy called CAT (complexity, action, and text-spotting) that reduces dataset size, an approach called Concept Distillation that leverages strong pre-trained unimodal representations, and a modification of the traditional contrastive alignment objective that uses importance sampling to up-weight hard negatives without adding training complexity. Their Distilled and Hard-negative Training (DiHT) approach improves performance on 20 of 29 tasks in a zero-shot benchmark and helps bridge the gap between zero-shot and few-shot performance with linear probing. Demo code is available on GitHub.
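Of the three changes, the hard-negative importance sampling is the most concrete algorithmic piece. Below is a minimal sketch of an InfoNCE-style contrastive loss whose negatives are re-weighted toward harder (more similar) examples, written in PyTorch. The function name hard_negative_nce, the beta parameter, and the exact weighting scheme are illustrative assumptions, not the authors' exact DiHT objective.

# Sketch (not the authors' exact DiHT loss) of an InfoNCE-style contrastive
# objective where negatives are re-weighted by an importance-sampling term,
# so harder (more similar) negatives contribute more to the denominator.
import torch
import torch.nn.functional as F

def hard_negative_nce(image_emb, text_emb, temperature=0.07, beta=0.5):
    """Image-to-text contrastive loss with up-weighted hard negatives.

    image_emb, text_emb: (N, D) L2-normalized embeddings of paired samples.
    beta: controls how sharply hard negatives are up-weighted
          (beta = 0 recovers the standard uniform-negative InfoNCE loss).
    """
    logits = image_emb @ text_emb.t() / temperature          # (N, N) similarities
    n = logits.size(0)
    pos_mask = torch.eye(n, dtype=torch.bool, device=logits.device)

    # Importance weights over negatives: proportional to exp(beta * similarity),
    # normalized so each row's negative weights sum to (n - 1).
    with torch.no_grad():
        w = torch.exp(beta * logits)
        w = w.masked_fill(pos_mask, 0.0)
        w = (n - 1) * w / w.sum(dim=1, keepdim=True)

    exp_logits = torch.exp(logits)
    pos = exp_logits[pos_mask]                                # (N,) positive pairs
    neg = (w * exp_logits).sum(dim=1)                         # weighted negatives
    return -torch.log(pos / (pos + neg)).mean()

# Usage sketch: symmetric image->text and text->image losses, CLIP-style.
if __name__ == "__main__":
    img = F.normalize(torch.randn(8, 512), dim=-1)
    txt = F.normalize(torch.randn(8, 512), dim=-1)
    loss = 0.5 * (hard_negative_nce(img, txt) + hard_negative_nce(txt, img))
    print(loss.item())

With beta set to zero every negative receives the same weight and the loss reduces to the usual contrastive objective, which is why this reweighting adds no extra model complexity, only a change to how the denominator is computed.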
