Saturday Dec 02, 2023

arXiv preprint - MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

In this episode we discuss MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, and Oncel Tuzel. The paper introduces MobileCLIP, a new family of efficient image-text models optimized for mobile devices. The models are trained with a novel multi-modal reinforced training method that improves accuracy without increasing on-device computational cost. MobileCLIP achieves better latency-accuracy trade-offs on zero-shot classification and retrieval tasks, outperforming existing models in both speed and accuracy. The reinforced training method also improves learning efficiency by 10x to 1000x, and its effectiveness is demonstrated by gains for a CLIP model with a ViT-B/16 image backbone across 38 benchmarks.
