Thursday Jun 01, 2023
CVPR 2023 - OmniMAE: Single Model Masked Pretraining on Images and Videos
In this episode we discuss "OmniMAE: Single Model Masked Pretraining on Images and Videos" by Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra (FAIR, Meta AI). The paper shows that a common architecture can be used to train a single unified model for multiple visual modalities, namely images and videos, using masked autoencoding. The proposed vision transformer learns visual representations that are comparable to or better than single-modality representations on both image and video benchmarks, without requiring any labeled data. The model can also be trained efficiently by dropping a large proportion of image and video patches during pretraining. It achieves new state-of-the-art performance on the ImageNet and Something Something-v2 video benchmarks.
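To make the "dropping a large proportion of patches" idea concrete, here is a minimal, hypothetical sketch of MAE-style random masking (not the authors' code): an image or video clip is split into patch tokens, only a small visible subset is kept for the encoder, and the binary mask records which tokens must later be reconstructed. The function name, shapes, and mask ratio below are illustrative assumptions.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.9):
    """patches: (batch, num_patches, dim) token embeddings from an image or video clip."""
    b, n, d = patches.shape
    n_keep = max(1, int(n * (1.0 - mask_ratio)))

    # Random permutation per sample; keep the first n_keep token indices.
    noise = torch.rand(b, n)
    ids_shuffle = torch.argsort(noise, dim=1)
    ids_keep = ids_shuffle[:, :n_keep]

    # Gather only the visible tokens; these are all the encoder ever sees.
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))

    # Binary mask (1 = dropped) used later for the pixel reconstruction loss.
    mask = torch.ones(b, n)
    mask.scatter_(1, ids_keep, 0.0)
    return visible, mask, ids_shuffle

# Example: a video clip flattened into spatio-temporal patches (assumed sizes).
tokens = torch.randn(2, 1568, 768)            # e.g. 8 frames x 14 x 14 patches
visible, mask, _ = random_masking(tokens, mask_ratio=0.95)
print(visible.shape)                          # torch.Size([2, 78, 768])
```

Because the encoder processes only the visible tokens (here 5% of them), the cost of each pretraining step drops sharply, which is what makes joint image-and-video pretraining practical.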