Sunday May 28, 2023

CVPR 2023 - Stare at What You See: Masked Image Modeling without Reconstruction

In this episode, we discuss "Stare at What You See: Masked Image Modeling without Reconstruction" by Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, and Jiebo Luo. The paper proposes MaskAlign, a new approach to Masked Image Modeling (MIM). The authors argue that the features a powerful teacher model extracts from an intact image already contain rich semantic correlations across regions, so the student does not need to reconstruct the masked regions. Instead, MaskAlign trains the student so that its features for the visible patches are consistent with the features the teacher extracts from the intact image, and it uses a Dynamic Alignment (DA) module to handle the input inconsistency between the two models. The authors report state-of-the-art performance with higher efficiency, and the code is available on GitHub.
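To make the idea concrete, here is a minimal NumPy sketch of aligning student features for visible patches with teacher features from the intact image. It is not the authors' implementation: the random feature arrays stand in for real encoders, the cosine-distance loss, the 0.7 mask ratio, and the feature sizes are assumptions chosen for illustration, and the single projection matrix `proj` is only a hypothetical stand-in for the Dynamic Alignment module, whose actual design is not described in these notes.

```python
import numpy as np


def random_visible_indices(num_patches, mask_ratio, rng):
    """Pick which patch positions the student gets to see (sorted)."""
    num_visible = max(1, int(round(num_patches * (1.0 - mask_ratio))))
    return np.sort(rng.permutation(num_patches)[:num_visible])


def l2_normalize(x, eps=1e-8):
    """Scale each feature vector to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def alignment_loss(student_visible, teacher_full, visible_idx, proj):
    """Mean cosine distance between student and teacher features.

    student_visible: (num_visible, student_dim) features of visible patches.
    teacher_full:    (num_patches, teacher_dim) features of the intact image.
    proj:            (student_dim, teacher_dim) hypothetical stand-in for
                     the Dynamic Alignment module (an assumption).
    """
    # The teacher saw every patch; compare only at the visible positions.
    teacher_visible = teacher_full[visible_idx]
    aligned = student_visible @ proj
    cos = np.sum(l2_normalize(aligned) * l2_normalize(teacher_visible), axis=-1)
    return float(np.mean(1.0 - cos))


rng = np.random.default_rng(0)
num_patches, student_dim, teacher_dim = 196, 64, 32  # illustrative sizes
visible_idx = random_visible_indices(num_patches, mask_ratio=0.7, rng=rng)

# Random placeholders for encoder outputs (assumption: no real models here).
teacher_full = rng.normal(size=(num_patches, teacher_dim))
student_visible = rng.normal(size=(len(visible_idx), student_dim))
proj = rng.normal(size=(student_dim, teacher_dim)) * 0.1

loss = alignment_loss(student_visible, teacher_full, visible_idx, proj)
print("visible patches:", len(visible_idx))
print("alignment loss:", loss)
```

Note that nothing in the loss touches pixels or masked positions: the student is scored only on how well its visible-patch features match the teacher's features at the same positions, which is the "without reconstruction" part of the title.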
