Friday May 12, 2023

CVPR 2023 - Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

In this episode we discuss Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models by Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, and Shalini De Mello. The paper presents ODISE, a model that unifies pre-trained text-to-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. The approach leverages the frozen internal representations of both models and outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation. On the ADE20K dataset it achieves 23.4 PQ and 30.0 mIoU, absolute improvements of 8.3 PQ and 7.9 mIoU over the previous state of the art. The code and models are open-sourced.
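To put the quoted ADE20K numbers in context, the short sketch below works backwards from the reported scores and absolute gains to the prior scores they imply, and expresses each gain as a percentage. The prior scores are derived from the figures in the summary, not quoted from the paper, and the function names are illustrative only.

```python
# Figures quoted in the episode summary (ADE20K, open-vocabulary setting).
reported = {"PQ": 23.4, "mIoU": 30.0}   # ODISE scores
abs_gain = {"PQ": 8.3, "mIoU": 7.9}     # absolute improvement over the prior best


def implied_baseline(score, gain):
    """Prior score implied by a reported score and its absolute gain."""
    return round(score - gain, 1)


def relative_gain(score, gain):
    """Absolute gain expressed as a percentage of the implied prior score."""
    return round(100 * gain / (score - gain), 1)


for metric in reported:
    base = implied_baseline(reported[metric], abs_gain[metric])
    rel = relative_gain(reported[metric], abs_gain[metric])
    print(f"{metric}: implied prior score {base}, relative gain {rel}%")
```

Under these assumptions, the gain in PQ is a bit more than half of the implied prior score, and the gain in mIoU is roughly a third.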

