EE Seminar: Optimal Quantization for Matrix Multiplication

March 24, 2025, 13:00
Hall 011, Electrical Engineering-Kitot Building

(The talk will be given in English)

Speaker: Prof. Or Ordentlich

School of Computer Science and Engineering, Hebrew University of Jerusalem

Hall 011, Electrical Engineering-Kitot Building

Monday, March 24th, 2025

13:00 - 14:00

Optimal Quantization for Matrix Multiplication

Abstract

The main building block of large language models is matrix multiplication, which is often bottlenecked by the speed of loading the matrices from memory. A possible solution is to trade accuracy for speed by storing the matrices in low precision (“quantizing” them). In recent years, a number of quantization algorithms with increasingly better performance have been proposed (e.g., SmoothQuant, Brain compression, GPTQ, QuIP, QuIP#, QuaRot, SpinQuant). In this work, we prove an information-theoretic lower bound on the achievable accuracy of computing a matrix product as a function of the compression rate (the number of bits per matrix entry). We also construct a quantizer, based on nested lattices, that achieves this lower bound. Applying our nested-lattice scheme to quantize the weights, KV cache, and activations of Llama-3-8B to 4 bits yields lower perplexity than state-of-the-art quantization schemes.
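To give a feel for what "a quantizer based on nested lattices" means, here is a rough Python sketch of a textbook Voronoi (nested-lattice) code. It is not the scheme from the talk or the linked papers. The fine lattice is D_n (integer vectors with an even coordinate sum, decoded with the Conway-Sloane round-then-fix-parity rule), the coarse lattice is q·D_n, and each length-n block is stored as an integer label in {0, ..., q-1}^n, i.e. log2(q) bits per entry. The lattice choice, the block length n = 8, the scale `beta = q / 7` and all function names are hypothetical choices made only for this illustration.

```python
import numpy as np


def dn_basis(n):
    """Columns form a basis of D_n = {z in Z^n : sum(z) is even}."""
    G = np.zeros((n, n))
    for i in range(n - 1):
        G[i, i], G[i + 1, i] = 1.0, -1.0      # e_i - e_{i+1}
    G[n - 2, n - 1] = G[n - 1, n - 1] = 1.0   # e_{n-2} + e_{n-1}
    return G


def nearest_dn(x):
    """Closest point of D_n to x (Conway-Sloane rule: round, then fix parity)."""
    f = np.round(x)
    if int(f.sum()) % 2 == 0:
        return f
    err = x - f
    k = int(np.argmax(np.abs(err)))            # worst-rounded coordinate
    f[k] += 1.0 if err[k] > 0 else -1.0        # round it the other way
    return f


def encode(x, q, Ginv):
    """Fine-lattice point of x, reduced modulo the coarse lattice q*D_n.
    The label lives in {0, ..., q-1}^n, i.e. log2(q) bits per entry."""
    c = nearest_dn(x)
    u = np.rint(Ginv @ c).astype(int)          # integer coordinates of c in basis G
    return np.mod(u, q)


def decode(label, q, G):
    """Representative of the coset G@label + q*D_n inside the Voronoi cell of q*D_n."""
    c = G @ label
    return c - q * nearest_dn(c / q)


def quantize_blocks(M, q, beta, n=8):
    """Quantize every length-n block of each row of M; return the reconstruction."""
    G = dn_basis(n)
    Ginv = np.linalg.inv(G)
    out = np.empty(M.shape)
    for r in range(M.shape[0]):
        for s in range(0, M.shape[1], n):
            label = encode(beta * M[r, s:s + n], q, Ginv)
            out[r, s:s + n] = decode(label, q, G) / beta
    return out


def relative_product_error(q, seed=0):
    """Relative MSE of A_hat @ B_hat versus A @ B for random Gaussian A, B."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((32, 64))
    B = rng.standard_normal((64, 16))
    beta = q / 7.0                             # hypothetical scale: keeps overload rare for N(0, 1) entries
    A_hat = quantize_blocks(A, q, beta)        # quantize rows of A
    B_hat = quantize_blocks(B.T, q, beta).T    # quantize columns of B
    exact = A @ B
    return np.mean((A_hat @ B_hat - exact) ** 2) / np.mean(exact ** 2)


if __name__ == "__main__":
    for q in (4, 16):
        bits = np.log2(q)
        print(f"q={q} ({bits:.0f} bits/entry): relative MSE {relative_product_error(q):.4f}")
```

Raising q from 4 to 16 (2 to 4 bits per entry) shrinks the error of the approximate product. A block whose scaled value falls outside the Voronoi cell of q·D_n (an overload) is decoded to a distant point, which is why the choice of `beta` matters; how the actual schemes handle scaling and overload is not covered by this sketch.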

Based on joint work with Yury Polyanskiy, and with Semyon Savkin and Eitan Porat.

https://arxiv.org/pdf/2410.13780

https://arxiv.org/pdf/2502.09720 

Short Bio

Or Ordentlich is an associate professor in the School of Computer Science and Engineering at the Hebrew University of Jerusalem. His research focuses on information theory and its applications to modern problems in communication, compression, and data science. Or received the B.Sc. (cum laude), M.Sc. (summa cum laude), and Ph.D. degrees from Tel Aviv University, Israel, in 2010, 2011, and 2016, respectively, all in electrical engineering. From 2015 to 2017, he was a postdoctoral fellow at the Laboratory for Information and Decision Systems at the Massachusetts Institute of Technology (MIT) and at the Department of Electrical and Computer Engineering at Boston University. Since 2021, he has served as an associate editor for Signal Processing and Source Coding for the IEEE Transactions on Information Theory.

Attending the seminar grants attendance credit, based on writing your full name and ID number on the attendance sheet that will be passed around the hall during the seminar.
