Thursday Sep 14, 2023

arXiv Preprint - eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

In this episode, we discuss eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models by Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal. The paper proposes eDKM, a memory-efficient implementation of weight clustering for large language models (LLMs). LLMs perform well but must be compressed to fit on storage-limited devices. eDKM reduces the memory footprint of Differentiable KMeans Clustering (DKM) by orders of magnitude, allowing efficient LLM compression with good accuracy.
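For listeners who want a feel for what weight clustering involves, here is a minimal sketch of soft (attention-style) k-means over a weight vector, the general idea behind differentiable clustering. It is not the paper's implementation and shows none of eDKM's memory-saving techniques; the function name, temperature, cluster count, and quantile initialization are all assumptions made for illustration. It does show where the memory goes: the soft assignment map has one entry per (weight, centroid) pair.

```python
import numpy as np

def soft_kmeans_cluster(weights, n_clusters=4, temperature=0.05, n_iters=20):
    """Hypothetical helper: soft k-means over a flat weight vector."""
    w = weights.reshape(-1, 1)  # shape (N, 1)
    # Assumption: start centroids at evenly spaced quantiles of the weights.
    centroids = np.quantile(w, np.linspace(0.0, 1.0, n_clusters)).reshape(1, -1)
    for _ in range(n_iters):
        # Softmax over negative distances gives a soft assignment per weight.
        logits = -np.abs(w - centroids) / temperature  # shape (N, K)
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)        # the N x K map
        # Each centroid becomes the attention-weighted mean of the weights.
        centroids = (attn * w).sum(axis=0, keepdims=True) / (
            attn.sum(axis=0, keepdims=True) + 1e-12
        )
    return centroids.ravel(), attn

rng = np.random.default_rng(0)
weights = rng.normal(size=1024)
centroids, attn = soft_kmeans_cluster(weights)

# Compress: store only a centroid index per weight plus the small codebook.
indices = attn.argmax(axis=1)
quantized = centroids[indices]
```

With 1024 weights and 4 centroids the map is only 1024 x 4, but it grows with the product of the weight count and the centroid count, which hints at why a naive version becomes costly at LLM scale.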
