Friday Dec 08, 2023
arXiv preprint - OneLLM: One Framework to Align All Modalities with Language
In this episode, we discuss "OneLLM: One Framework to Align All Modalities with Language" by Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, and Xiangyu Yue. The paper introduces OneLLM, a multimodal large language model that aligns eight modalities to language within a single framework. It relies on a new image projection module and a universal projection module for multimodal alignment, which lets the model progressively align additional modalities. OneLLM performs strongly on a range of multimodal tasks across 25 benchmarks. The authors also curate a multimodal instruction dataset of 2 million items, and the resources are available online.