Why do Overparameterized Neural Networks Generalize?
SINGLA, KAVITA
2024/2025
Abstract
Classical statistical theory, grounded in the bias-variance tradeoff, predicts that as model complexity increases, a model that perfectly interpolates the training data should overfit. Yet modern deep neural networks defy this prediction: overparameterized models achieve near-zero training error as expected, but they still generalize remarkably well. This thesis traces how our understanding of generalization has evolved from classical frameworks such as VC (Vapnik-Chervonenkis) dimension, PAC (Probably Approximately Correct) learning, and explicit regularization to modern explanations such as implicit regularization, flat minima, the Neural Tangent Kernel (NTK), PAC-Bayes theory, the information bottleneck, and the double-descent phenomenon, and it attempts to bridge the gap between them. By synthesizing theoretical and empirical insights, the thesis investigates how classical measures of model capacity fail to capture the geometric and dynamical features of deep learning. It further discusses the practical and experimental challenges faced by industry-scale models, emphasizing the constraints on modern deep learning research. The thesis concludes by outlining open theoretical challenges and suggesting future work toward a unified generalization theory for deep learning.
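To make the double-descent claim in the abstract concrete, the sketch below (not part of the thesis; the random-feature setup, sample sizes, and feature counts are illustrative assumptions) fits minimum-norm least squares on random ReLU features of synthetic data. As the number of features p crosses the number of training points, training error reaches zero and test error typically spikes, then decreases again in the heavily overparameterized regime, reproducing the qualitative double-descent curve.

```python
# A minimal double-descent sketch; the setup below is an illustrative assumption,
# not an experiment from the thesis. Minimum-norm least squares on random ReLU
# features: test error typically peaks near the interpolation threshold
# (p close to n_train) and falls again as the model becomes heavily overparameterized.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20
w_star = rng.normal(size=d)                       # fixed "teacher" direction

def make_data(n):
    X = rng.normal(size=(n, d))
    y = np.sin(X @ w_star / np.sqrt(d)) + 0.1 * rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

for p in (10, 50, 90, 100, 110, 200, 500, 2000):  # number of random ReLU features
    W = rng.normal(size=(d, p)) / np.sqrt(d)      # fixed random first layer
    phi_tr = np.maximum(X_tr @ W, 0.0)
    phi_te = np.maximum(X_te @ W, 0.0)
    # lstsq returns the minimum-norm solution when p > n_train, mimicking the
    # implicit bias of gradient descent started from zero initialization.
    beta, *_ = np.linalg.lstsq(phi_tr, y_tr, rcond=None)
    train_mse = np.mean((phi_tr @ beta - y_tr) ** 2)
    test_mse = np.mean((phi_te @ beta - y_te) ** 2)
    print(f"p={p:5d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

The minimum-norm solution returned by np.linalg.lstsq stands in for the implicit regularization that gradient-based training is believed to provide in deep networks, which is why test error can improve even after the training data are interpolated exactly.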
| File | Access | Size | Format |
|---|---|---|---|
| singla_kavita.pdf | open access | 644.54 kB | Adobe PDF |
https://hdl.handle.net/20.500.12608/97713