Tuesday Jul 25, 2023
arXiv preprint - Retentive Network: A Successor to Transformer for Large Language Models
In this episode we discuss Retentive Network: A Successor to Transformer for Large Language Models by Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei. The paper introduces RetNet (Retentive Network) as a successor to the Transformer architecture for large language models. RetNet is built on a retention mechanism that supports three computation paradigms: a parallel form for efficient training, a recurrent form for low-cost O(1)-per-token inference, and a chunkwise recurrent form for efficient long-sequence modeling with linear complexity. Experimental results show that RetNet achieves favorable scaling, parallel training, low-cost deployment, and efficient inference, making it a promising candidate architecture for large language models.
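To make the "three paradigms, one output" idea concrete, here is a minimal sketch in plain Python of a single, simplified retention head. It assumes the decay-masked formulation described in the paper (parallel: (QK^T ⊙ D)V with D[n][m] = gamma^(n-m) for n >= m, else 0; recurrent: S_n = gamma * S_{n-1} + k_n^T v_n, out_n = q_n S_n) and deliberately omits the positional rotation, multi-scale decay heads, normalization, and gating that the full RetNet uses. The function names and the `chunk` parameter are hypothetical, chosen for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retention_parallel(q: Matrix, k: Matrix, v: Matrix, gamma: float) -> Matrix:
    """Parallel form: out = (Q K^T * D) V, D[n][m] = gamma**(n - m) if n >= m else 0."""
    d_v = len(v[0])
    out = []
    for n in range(len(q)):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal mask: only positions m <= n contribute
            w = dot(q[n], k[m]) * gamma ** (n - m)
            for j in range(d_v):
                row[j] += w * v[m][j]
        out.append(row)
    return out


def retention_recurrent(q: Matrix, k: Matrix, v: Matrix, gamma: float) -> Matrix:
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, out_n = q_n S_n."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed d_k x d_v state, O(1) per token
    out = []
    for qn, kn, vn in zip(q, k, v):
        state = [[gamma * state[i][j] + kn[i] * vn[j] for j in range(d_v)] for i in range(d_k)]
        out.append([sum(qn[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def retention_chunkwise(q: Matrix, k: Matrix, v: Matrix, gamma: float, chunk: int) -> Matrix:
    """Chunkwise form: parallel inside each chunk, recurrent state carried across chunks."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # summary of all previous chunks
    out = []
    for start in range(0, len(q), chunk):
        qc, kc, vc = q[start:start + chunk], k[start:start + chunk], v[start:start + chunk]
        b = len(qc)  # last chunk may be shorter than `chunk`
        inner = retention_parallel(qc, kc, vc, gamma)  # intra-chunk part
        for t in range(b):
            # Cross-chunk part: the state ends t + 1 steps before this query.
            cross = [sum(qc[t][i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
            out.append([inner[t][j] + gamma ** (t + 1) * cross[j] for j in range(d_v)])
        # Decay the old state across the whole chunk, then add this chunk's keys/values.
        state = [[gamma ** b * state[i][j]
                  + sum(gamma ** (b - 1 - s) * kc[s][i] * vc[s][j] for s in range(b))
                  for j in range(d_v)] for i in range(d_k)]
    return out


if __name__ == "__main__":
    # Small deterministic inputs; all three forms should agree up to float rounding.
    T, d_k, d_v = 6, 2, 2
    q = [[(n + i) % 3 - 1.0 for i in range(d_k)] for n in range(T)]
    k = [[(2 * n + i) % 3 - 1.0 for i in range(d_k)] for n in range(T)]
    v = [[float(n - j) for j in range(d_v)] for n in range(T)]
    par = retention_parallel(q, k, v, gamma=0.9)
    rec = retention_recurrent(q, k, v, gamma=0.9)
    chk = retention_chunkwise(q, k, v, gamma=0.9, chunk=4)
    diff_rec = max(abs(a - b) for x, y in zip(par, rec) for a, b in zip(x, y))
    diff_chk = max(abs(a - b) for x, y in zip(par, chk) for a, b in zip(x, y))
    print("parallel vs recurrent:", diff_rec, "parallel vs chunkwise:", diff_chk)
```

The design point the sketch illustrates: the parallel form does quadratic work but is easy to batch during training, the recurrent form keeps only a fixed-size state per step for cheap decoding, and the chunkwise form mixes the two so long sequences cost time linear in their length.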