
Saturday May 06, 2023
CVPR 2023 - Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
In this episode we discuss Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP by Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, and Diana Marculescu.

Affiliations: Feng Liang and Diana Marculescu are affiliated with The University of Texas at Austin. Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Peizhao Zhang, and Peter Vajda are affiliated with Meta Reality Labs. Hang Zhang is affiliated with Cruise.

The paper proposes a method to improve open-vocabulary semantic segmentation, the task of segmenting an image into semantic regions according to text descriptions that may not have been seen during training. The prevailing two-stage approach first generates class-agnostic mask proposals and then uses a pre-trained vision-language model such as CLIP to classify each masked region. The authors identify the pre-trained CLIP model as the bottleneck of this approach, because it does not perform well on masked images. To address this, they fine-tune CLIP on masked image regions paired with text descriptions, which they collect by mining an existing image-caption dataset. They also introduce "mask prompt tuning", which makes use of the "blank" areas in masked images. The authors report a significant improvement over the previous state of the art on the ADE20K-150 dataset.
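For listeners who want a concrete picture of the two-stage approach described above, here is a minimal Python sketch. It is not the authors' code. The function names (mask_image, classify_regions) and the toy encoders are hypothetical stand-ins for a mask proposal network and for CLIP's image and text encoders. The sketch only shows the control flow: blank out everything outside a proposed mask, embed the masked image, and pick the class name whose text embedding is most similar.

```python
import math

def mask_image(image, mask, fill=0.0):
    """Keep pixels inside the mask; set everything else to `fill` (the "blank" area)."""
    return [[px if m else fill for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify_regions(image, masks, class_names, encode_image, encode_text):
    """Stage 2: label each class-agnostic mask with the best-matching class name."""
    text_embs = [encode_text(name) for name in class_names]
    labels = []
    for mask in masks:
        emb = encode_image(mask_image(image, mask))
        scores = [cosine(emb, t) for t in text_embs]
        labels.append(class_names[scores.index(max(scores))])
    return labels

# Toy stand-ins (assumptions for illustration only, not CLIP):
# the "image encoder" sums bright and dark pixels into a 2-d vector,
# and the "text encoder" is a lookup table in the same 2-d space.
def toy_encode_image(img):
    pixels = [px for row in img for px in row]
    bright = sum(px for px in pixels if px > 0.5)
    dark = sum(px for px in pixels if 0.0 < px <= 0.5)
    return [bright, dark]

def toy_encode_text(name):
    return {"sky": [1.0, 0.0], "road": [0.0, 1.0]}[name]

# A 2x2 grayscale "image": bright top row, dark bottom row.
image = [[0.9, 0.9],
         [0.2, 0.2]]
# Stage 1 output would be class-agnostic masks; here they are written by hand.
masks = [[[1, 1], [0, 0]],
         [[0, 0], [1, 1]]]

labels = classify_regions(image, masks, ["sky", "road"],
                          toy_encode_image, toy_encode_text)
print(labels)
```

In the setup the episode describes, the image encoder would be a CLIP model fine-tuned on masked regions rather than the off-the-shelf one, since the blanked-out pixels are exactly what the pre-trained model handles poorly. Mask prompt tuning is not modeled in this sketch.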