Thursday Sep 14, 2023
arxiv Preprint - eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
In this episode we discuss eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models by Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal. The paper proposes eDKM, a memory-efficient implementation of weight clustering for large language models (LLMs). LLMs have high performance but require compression for storage-limited devices. The eDKM technique reduces the memory footprint of Differentiable KMeans Clustering (DKM) by orders of magnitudes, allowing for efficient LLM compression with good accuracy.
Comments (0)
To leave or reply to comments, please download free Podbean or
No Comments
To leave or reply to comments,
please download free Podbean App.