Friday May 19, 2023

CVPR 2023 - Masked Image Modeling with Local Multi-Scale Reconstruction

In this episode we discuss Masked Image Modeling with Local Multi-Scale Reconstruction by Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, and Kai Han. Masked Image Modeling (MIM) has achieved outstanding success in self-supervised representation learning, but it comes with a huge computational burden and a slow learning process. To address this, the paper applies the MIM reconstruction task at multiple local layers of the encoder, both lower and upper, so that each of them is guided explicitly instead of only the top layer. The approach also facilitates multi-scale semantic understanding: lower layers reconstruct fine-scale supervision signals and upper layers reconstruct coarse-scale ones. The resulting models achieve comparable or better performance on classification, detection, and segmentation tasks than existing MIM models, with a significantly lower pre-training burden. Code is available for both MindSpore and PyTorch.
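For listeners who want a concrete picture, the idea of summing reconstruction losses over several layers at different scales can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: the function names (`patch_means`, `random_mask`, `coarsen_mask`, `local_multiscale_loss`) are hypothetical, per-patch pixel means stand in for whatever supervision signals the paper actually uses, and the per-layer predictions are passed in directly rather than produced by a real encoder and decoder.

```python
import random


def patch_means(image, patch):
    """Average each non-overlapping patch x patch block of a 2D image.

    Assumption: a per-patch mean is used as a stand-in supervision signal.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] for i in range(patch) for j in range(patch))
            / patch ** 2
            for c in range(0, cols, patch)
        ]
        for r in range(0, rows, patch)
    ]


def random_mask(n_rows, n_cols, ratio, rng):
    """Mark int(ratio * n_rows * n_cols) randomly chosen grid cells as masked."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    masked = set(rng.sample(cells, int(ratio * len(cells))))
    return [[(r, c) in masked for c in range(n_cols)] for r in range(n_rows)]


def coarsen_mask(mask, factor):
    """A coarse cell counts as masked if any fine cell inside it is masked."""
    n_rows, n_cols = len(mask), len(mask[0])
    return [
        [
            any(mask[r + i][c + j] for i in range(factor) for j in range(factor))
            for c in range(0, n_cols, factor)
        ]
        for r in range(0, n_rows, factor)
    ]


def masked_mse(pred, target, mask):
    """Mean squared error computed over masked cells only."""
    errors = [
        (pred[r][c] - target[r][c]) ** 2
        for r in range(len(target))
        for c in range(len(target[0]))
        if mask[r][c]
    ]
    return sum(errors) / len(errors) if errors else 0.0


def local_multiscale_loss(image, fine_mask, base_patch, layer_scales, predictions):
    """Sum one masked reconstruction loss per supervised layer.

    layer_scales[i] is the coarsening factor for layer i (1 = finest scale);
    predictions[i] is that layer's hypothetical decoder output on its own grid.
    """
    total = 0.0
    for factor, pred in zip(layer_scales, predictions):
        target = patch_means(image, base_patch * factor)
        mask = coarsen_mask(fine_mask, factor)
        total += masked_mse(pred, target, mask)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    fine_mask = random_mask(4, 4, 0.75, rng)  # 4x4 grid of 2x2 patches
    # Placeholder predictions: all zeros for a "lower" and an "upper" layer.
    preds = [[[0.0] * 4 for _ in range(4)], [[0.0] * 2 for _ in range(2)]]
    print(local_multiscale_loss(image, fine_mask, 2, [1, 2], preds))
```

In this toy setup the "lower" layer is scored against the fine 4x4 grid of targets and the "upper" layer against the coarse 2x2 grid, which mirrors the fine-to-coarse assignment described in the episode summary.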

