Monday Jun 17, 2024
arxiv preprint - Transformers need glasses! Information over-squashing in language tasks
In this episode, we discuss Transformers need glasses! Information over-squashing in language tasks by Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João G. M. Araújo, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković. The paper explores how information propagates in decoder-only Transformers, revealing a phenomenon where different input sequences can result in nearly identical final token representations. This issue, worsened by low-precision floating-point formats, impairs the model’s ability to distinguish between these sequences, leading to errors in specific tasks. The authors provide theoretical and empirical evidence of this problem and suggest simple solutions to mitigate it.
Comments (0)
To leave or reply to comments, please download free Podbean or
No Comments
To leave or reply to comments,
please download free Podbean App.