EE Seminar: Optimal Quantization for Matrix Multiplication
(The talk will be given in English)
Speaker: Prof. Or Ordentlich
School of Computer Science and Engineering, Hebrew University of Jerusalem
011 hall, Electrical Engineering-Kitot Building
Monday, March 24th, 2025
13:00 - 14:00
Abstract
The main building block of large language models is matrix multiplication, which is often bottlenecked by the speed of loading the matrices from memory. A possible solution is to trade accuracy for speed by storing the matrices in low precision (“quantizing” them). In recent years, a number of quantization algorithms with increasingly better performance have been proposed (e.g., SmoothQuant, Brain compression, GPTQ, QuIP, QuIP#, QuaRot, SpinQuant). In this work, we prove an information-theoretic lower bound on the achievable accuracy of computing a matrix product as a function of the compression rate (number of bits per matrix entry). We also construct a quantizer (based on nested lattices) that achieves this lower bound. Applying our nested lattice scheme to quantize the weights, KV-cache, and activations of Llama-3-8B to 4 bits yields smaller perplexity than state-of-the-art quantization schemes.
Based on joint work with Yury Polyanskiy, and with Semyon Savkin and Eitan Porat
https://arxiv.org/pdf/2410.13780
https://arxiv.org/pdf/2502.09720
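To make the accuracy versus rate trade-off in the abstract concrete, here is a minimal Python sketch. It uses a plain uniform scalar quantizer as a stand-in, not the nested lattice quantizer from the talk. The function names, the Gaussian test matrices, and the 32x32 size are all assumptions chosen for illustration.

```python
import random


def quantize(matrix, bits):
    """Uniform scalar quantizer (illustrative baseline only).

    Rounds each entry to one of 2**bits evenly spaced levels spanning the
    matrix's own [min, max] range. This is NOT the nested lattice scheme.
    """
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    return [[lo + round((x - lo) / step) * step for x in row] for row in matrix]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def relative_error(exact, approx):
    """Frobenius-norm error of approx, relative to the norm of exact."""
    num = sum((e - p) ** 2
              for row_e, row_p in zip(exact, approx)
              for e, p in zip(row_e, row_p))
    den = sum(e ** 2 for row_e in exact for e in row_e)
    return (num / den) ** 0.5


def product_error(bits, n=32, seed=0):
    """Relative error of A @ B when both factors are stored at `bits` bits.

    A and B are hypothetical n-by-n matrices with i.i.d. standard Gaussian
    entries, standing in for weights and activations.
    """
    rng = random.Random(seed)
    a = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    b = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    exact = matmul(a, b)
    approx = matmul(quantize(a, bits), quantize(b, bits))
    return relative_error(exact, approx)


if __name__ == "__main__":
    for bits in (2, 4, 8):
        print(bits, "bits per entry -> relative error", product_error(bits))
```

Running the script prints the relative error of the product at 2, 4, and 8 bits per entry. The error shrinks as the rate grows, which is the trade-off that the lower bound in the talk characterizes.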
Short Bio
Or Ordentlich is an associate professor in the School of Computer Science and Engineering at the Hebrew University of Jerusalem. His research focuses on information theory and its application to modern problems in communication, compression and data science. Or received the B.Sc. (cum laude), M.Sc. (summa cum laude), and Ph.D. degrees from Tel Aviv University, Israel, in 2010, 2011, and 2016, respectively, all in electrical engineering. From 2015 to 2017 he was a postdoctoral fellow in the Laboratory for Information and Decision Systems at the Massachusetts Institute of Technology (MIT), and in the Department of Electrical and Computer Engineering at Boston University. He has been serving as an associate editor for Signal Processing and Source Coding in the IEEE Transactions on Information Theory since 2021.
Attending the seminar grants attendance credit, subject to entering your full name and ID number on the attendance form that will be circulated in the hall during the seminar.