
Friday May 12, 2023
CVPR 2023 - Learning to Name Classes for Vision and Language Models
In this episode we discuss "Learning to Name Classes for Vision and Language Models" by Sarah Parisot, Yongxin Yang, and Steven McDonagh. Large-scale vision and language models can achieve impressive zero-shot recognition performance, but they face two challenges: sensitivity to the handcrafted class names that define queries, and difficulty adapting to new, smaller datasets. The paper proposes learning optimal word embeddings for each class as a function of the visual content. This retains zero-shot capabilities for new classes, lets models adapt to new datasets, and allows potentially erroneous or ambiguous class names to be adjusted. The authors show that the approach yields significant performance gains in multiple scenarios and provides insights into model biases and labeling errors.
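To make the core idea concrete, here is a minimal sketch of learning per-class word embeddings while everything else stays frozen. It is not the authors' implementation: a fixed linear projection (`text_proj`) stands in for a real text encoder, image features are assumed to be precomputed, and every name and parameter here (`learn_class_embeddings`, `scale`, `lr`, `steps`) is hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def learn_class_embeddings(image_feats, labels, init_word_embs, text_proj,
                           lr=0.1, steps=200, scale=1.0):
    """Tune one word embedding per class; the encoders stay frozen.

    image_feats    : (N, d) precomputed image features (frozen)
    labels         : (N,) integer class labels
    init_word_embs : (C, d_w) embeddings of the handcrafted class names
    text_proj      : (d_w, d) frozen stand-in for the text encoder (assumption)
    """
    E = init_word_embs.copy()            # the only trainable parameters
    n = image_feats.shape[0]
    onehot = np.eye(E.shape[0])[labels]
    losses = []
    for _ in range(steps):
        T = E @ text_proj                        # (C, d) class text features
        logits = scale * image_feats @ T.T       # (N, C) image-text similarity
        probs = softmax(logits)
        losses.append(-np.log(probs[np.arange(n), labels] + 1e-12).mean())
        dlogits = (probs - onehot) / n           # gradient of cross-entropy
        dT = scale * dlogits.T @ image_feats     # (C, d)
        E -= lr * dT @ text_proj.T               # update word embeddings only
    return E, losses
```

Because only `E` is updated, class names that were never trained keep their original embeddings, which is how this toy setup mirrors the idea of retaining zero-shot behavior for new classes.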