Tuesday Oct 17, 2023

ICCV 2023 - Sigmoid Loss for Language Image Pre-Training

In this episode we discuss Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. The paper introduces a pairwise sigmoid loss for Language-Image Pre-training (SigLIP), which treats each image-text pair as an independent binary classification and therefore does not require a global normalization over all pairwise similarities, making it easier to scale up the batch size. By combining SigLIP with Locked-image Tuning, the authors achieve high ImageNet zero-shot accuracy in just two days of training. The authors also study the impact of batch size and find that performance saturates around a batch size of 32k.
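To make the idea concrete, here is a minimal NumPy sketch of the pairwise sigmoid loss described above. The temperature `t` and bias `b` are learnable parameters in the paper; here they are fixed at the paper's initialization values (t = 10, b = -10) for illustration, and the function name is our own.

```python
import numpy as np

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss sketch: every image-text pair in the batch
    is scored as an independent binary classification, with matching
    pairs (the diagonal) labeled +1 and all other pairs labeled -1."""
    n = img_emb.shape[0]
    # L2-normalize both sets of embeddings
    img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt_emb = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img_emb @ txt_emb.T * t + b   # [n, n] pairwise similarities
    labels = 2.0 * np.eye(n) - 1.0         # +1 on the diagonal, -1 elsewhere
    # -log sigmoid(x) = log(1 + exp(-x)), computed stably with logaddexp
    return np.sum(np.logaddexp(0.0, -labels * logits)) / n
```

Because no softmax normalization runs across the batch, each device can compute its local block of the `[n, n]` logit matrix independently, which is what makes large batch sizes cheap in the paper's setting.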

