Saturday Sep 02, 2023

arXiv Preprint - LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

In this episode we discuss LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models by Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, and Sinong Wang. The paper introduces LM-Infinite as a solution to length generalization in Large Language Models (LLMs), that is, handling inputs longer than the sequences seen during training. Existing methods for handling longer sequences are resource-intensive and time-consuming, which motivates a simpler approach. LM-Infinite combines a Λ-shaped attention mask with a distance limit and requires no parameter updates or learning. According to the paper, the technique generates fluent text and performs well on downstream tasks even when input sequences exceed the training length.
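For listeners who want a concrete picture of the two ingredients, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names `lambda_mask` and `clipped_distance` and the parameters `n_global`, `n_local`, and `d_max` are hypothetical. It assumes the Λ shape means each token attends to a few tokens at the very start of the sequence plus a window of the most recent tokens, and that the distance limit caps the relative distance used by attention.

```python
def lambda_mask(seq_len, n_global, n_local):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumption: the Lambda shape is a causal mask that keeps only the first
    n_global keys and the n_local most recent keys for each query.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i               # never attend to future tokens
            is_global = j < n_global      # left arm of the Lambda: starting tokens
            is_local = i - j < n_local    # right arm of the Lambda: recent tokens
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def clipped_distance(i, j, d_max):
    """Relative distance between query i and key j, capped at d_max (assumed form of the distance limit)."""
    return min(i - j, d_max)


if __name__ == "__main__":
    # Render a small mask: "x" marks an allowed query/key pair, "." a blocked one.
    for row in lambda_mask(seq_len=8, n_global=2, n_local=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Because both pieces only change which positions are visible and how far apart they appear, a sketch like this touches no model weights, which matches the paper's claim that no parameter updates are needed.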
