Sunday May 28, 2023
CVPR 2023 - Stare at What You See: Masked Image Modeling without Reconstruction
In this episode we discuss "Stare at What You See: Masked Image Modeling without Reconstruction" by Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, and Jiebo Luo. The paper proposes a new approach to Masked Image Modeling (MIM) called MaskAlign. The authors argue that the features extracted by powerful teacher models already encode rich semantic correlations across regions of an intact image, so the student does not need to reconstruct the masked regions. Instead, MaskAlign trains the student to make its features for the visible patches consistent with the features the teacher extracts from the intact image. Because the student sees only the visible patches while the teacher sees the whole image, the two models receive inconsistent inputs, and a Dynamic Alignment (DA) module is used to address this. The authors report state-of-the-art performance with higher efficiency, and the code is available on GitHub.
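To make the idea of aligning without reconstructing concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the use of cosine distance as the consistency measure, and the `adapter` argument standing in for the Dynamic Alignment module are all assumptions made for illustration, since the episode description does not specify these details.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def sample_visible_indices(num_patches, mask_ratio, rng):
    """Pick which patch positions stay visible to the student."""
    num_visible = max(1, round(num_patches * (1.0 - mask_ratio)))
    return sorted(rng.sample(range(num_patches), num_visible))


def alignment_loss(student_visible_feats, teacher_full_feats, visible_idx, adapter=None):
    """Average distance between student and teacher features at visible positions.

    student_visible_feats: one vector per visible patch (student saw only these).
    teacher_full_feats: one vector per patch of the intact image.
    adapter: hypothetical stand-in for the DA module; identity if omitted.
    """
    if adapter is None:
        adapter = lambda feat: feat  # assumption: no alignment transform
    distances = [
        cosine_distance(adapter(feat), teacher_full_feats[idx])
        for feat, idx in zip(student_visible_feats, visible_idx)
    ]
    return sum(distances) / len(distances)


# Toy usage: 16 patches, 8-dimensional features, 75% of patches masked.
rng = random.Random(0)
teacher_feats = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
visible = sample_visible_indices(16, 0.75, rng)

# A "perfect" student that reproduces the teacher's features at visible positions.
student_feats = [teacher_feats[i] for i in visible]
loss = alignment_loss(student_feats, teacher_feats, visible)
```

Note that nothing in the loss touches the masked positions: the student is never asked to predict pixels or features for patches it did not see, which is the sense in which the method works "without reconstruction". In a real setup the teacher would be a frozen pretrained model and the student a trainable encoder.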