Friday Apr 05, 2024
arxiv preprint - Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
In this episode, we discuss Mixture-of-Depths: Dynamically allocating compute in transformer-based language models by David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro. The study presents a method for transformers that allows for the dynamic allocation of computational resources within sequences by limiting the number of tokens processed at each layer using a top-k routing mechanism. This approach maintains a fixed tensor size and a static computation graph, which differs from other conditional computation strategies. The resulting model operates with fewer computations per forward pass and provides up to a 50% faster sampling rate post-training, while still matching the performance of baseline models with the same computational budget and training duration.
Comments (0)
To leave or reply to comments, please download free Podbean or
No Comments
To leave or reply to comments,
please download free Podbean App.