Seminar of the Iby and Aladar Fleischman Faculty of Engineering

EE Seminar: Sparsity and over-parameterization in deep neural networks: Theory and Applications

27 May 2026, 15:00

Room 011, Kitot (Classrooms) Building, Electrical Engineering

Electrical Engineering Systems Seminar

Speaker: Yehonathan Refael

Ph.D. student under the supervision of Dr. Wasim Huleihel and Dr. Ofir Lindenbaum

Wednesday, 27th May 2026, at 15:00

Room 011, Kitot Building, Faculty of Engineering

Sparsity and over-parameterization in deep neural networks: Theory and Applications

Abstract

In recent years, deep learning has revolutionized numerous fields, driven by a persistent and accelerating trend toward massive scale. Modern deep neural networks, particularly large language models and high-resolution computer vision architectures, are routinely designed with billions of parameters. This architectural paradigm places contemporary deep learning firmly within a highly over-parameterized regime, where the number of trainable parameters vastly exceeds the number of available training samples. Under classical statistical learning frameworks, such as those governed by the Vapnik-Chervonenkis dimension and the traditional bias-variance tradeoff, models with such extreme capacity should inevitably memorize stochastic noise, resulting in catastrophic generalization failures on unseen data.

However, empirical evidence consistently contradicts this classical intuition. Highly over-parameterized models routinely exhibit "benign overfitting," wherein they possess the representational power to perfectly interpolate training data—even data corrupted with randomized labels—yet still maintain extraordinary predictive accuracy on test datasets. This paradox indicates that the success of deep neural networks cannot be explained by worst-case learnability theories; rather, it demands a profound reexamination of the optimization dynamics that govern how these models traverse their complex loss landscapes.

Despite these favorable generalization properties, the massive scale of these models introduces profound practical challenges. Arguably the most painful bottleneck in large neural networks is their prohibitive training and adaptation cost. As parameter counts soar, the computational overhead, memory footprint, and energy consumption required for both pre-training and fine-tuning become increasingly unsustainable. The need to store massive optimizer states and gradients makes adapting foundation models to downstream tasks computationally formidable, and calls for a fundamental shift from brute-force scaling to more efficient paradigms that alleviate these costs without sacrificing expressive power.

The research presented in this talk establishes that the extreme redundancy provided by over-parameterization does not lead to chaotic, high-complexity solutions. Instead, it systematically induces highly structured, low-rank, and sparse properties within the network's weights and gradients. By explicitly leveraging these theoretically grounded properties, we develop algorithms that dramatically accelerate training, reduce memory requirements, enhance model generalization, and rigorously characterize inherent privacy and security vulnerabilities. Ultimately, these theoretical foundations provide the necessary tools to scale deep learning systems beyond current hardware limitations, ensuring the next generation of artificial intelligence is fundamentally more efficient, generalizable, and secure.

This seminar counts as a hearing (attendance) seminar for M.Sc./Ph.D. students.

Registration takes place at the beginning of the seminar by scanning the barcode for Moodle (please log in to Moodle beforehand, not via the app).
