Tuesday Aug 20, 2024
arXiv preprint - JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
In this episode, we discuss JPEG-LM: LLMs as Image Generators with Canonical Codec Representations by Xiaochuang Han, Marjan Ghazvininejad, Pang Wei Koh, and Yulia Tsvetkov. The paper proposes generating images and videos by modeling them as compressed files in standard codecs such as JPEG and AVC/H.264. Rather than modeling raw pixels or relying on vector quantization, the authors use the Llama architecture to output the compressed file bytes directly, which they report is both simpler and more effective. The method achieves a significant reduction in FID and is particularly strong at generating long-tail visual elements, which suggests it could be integrated into multimodal systems with little extra machinery.