Monday Jul 10, 2023

arXiv preprint - LongNet: Scaling Transformers to 1,000,000,000 Tokens

In this episode we discuss LongNet: Scaling Transformers to 1,000,000,000 Tokens by Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, and Furu Wei. The paper introduces LongNet, a Transformer variant that addresses the challenge of scaling sequence length in large language models. LongNet uses dilated attention, which expands the attentive field exponentially as the distance between tokens grows. This gives it linear computational complexity, a logarithmic dependency between any two tokens, and the ability to serve as a distributed trainer for extremely long sequences. Experimental results show that LongNet performs well on both long-sequence modeling and general language tasks, opening up the possibility of modeling very long sequences, such as treating an entire corpus or even the entire Internet as a single sequence.
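
To make the idea of dilated attention more concrete, here is a minimal, non-causal sketch of the sparsity pattern only: which (query, key) pairs get computed. It does not compute softmax attention, the weighting used to mix patterns, or the distributed training scheme. Following the paper's description, the sequence is split into segments of length w, and inside each segment only every r-th position is kept; several (w, r) pairs are combined, with w and r growing geometrically. The function names and the choice w0 = 4, alpha = 2 are hypothetical illustrations, not values taken from the paper.

```python
def dilated_pattern(n, w, r, offset=0):
    """(query, key) pairs covered by one dilated pattern (non-causal sketch).

    The sequence [0, n) is split into segments of length w. Inside each
    segment only every r-th position (starting at `offset`) is kept, and the
    kept positions attend densely to one another.
    """
    pairs = set()
    for start in range(0, n, w):
        end = min(start + w, n)
        kept = range(start + offset, end, r)
        for q in kept:
            for k in kept:
                pairs.add((q, k))
    return pairs


def geometric_configs(n, w0=4, alpha=2):
    """Hypothetical (w, r) schedule: both grow by `alpha` until w exceeds n."""
    configs = []
    w, r = w0, 1
    while w <= n:
        configs.append((w, r))
        w *= alpha
        r *= alpha
    return configs


def mixed_pattern(n, configs):
    """Union of several dilated patterns, one per (w, r) pair."""
    pairs = set()
    for w, r in configs:
        pairs |= dilated_pattern(n, w, r)
    return pairs


if __name__ == "__main__":
    n = 256
    sparse = mixed_pattern(n, geometric_configs(n))
    # The sparse pair count stays below 8 * n, versus n * n for dense attention.
    print(len(sparse), n * n)
```

The arithmetic behind the linear cost is as follows. One pattern touches (n/w) segments with (w/r)^2 pairs each, which is n·w/r^2 pairs. With w = w0·r, that becomes n·w0/r. Summing over r = 1, 2, 4, ... gives less than 2·w0·n, so the total grows linearly in n. Meanwhile, the widest, most dilated pattern still links distant tokens directly. For example, with n = 64 the (64, 16) pattern pairs token 0 with token 48.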

Copyright 2023 All rights reserved.