Why do Overparameterized Neural Networks Generalize?

SINGLA, KAVITA
2024/2025

Abstract

Classical statistical theory predicts that as model complexity increases, a model risks perfectly interpolating the training data and therefore overfitting; this prediction rests on the foundational bias-variance tradeoff. Yet modern deep neural networks defy it: overparameterized networks achieve near-zero training error, as expected, but still generalize impressively. This thesis traces the evolution of generalization theory from classical frameworks such as VC (Vapnik-Chervonenkis) dimension, PAC (Probably Approximately Correct) learning, and explicit regularization to modern explanations such as implicit regularization, flat minima, NTK (Neural Tangent Kernel) theory, PAC-Bayes theory, the information bottleneck, and the double-descent phenomenon, and attempts to bridge the gap between them. By synthesizing theoretical and empirical insights, the thesis investigates why classical measures of model capacity fail to capture the geometric and dynamical features of deep learning. It also discusses the practical and experimental challenges posed by industry-scale models, emphasizing the constraints faced by modern deep learning research. The thesis concludes by outlining open theoretical challenges and suggesting future work toward a unified generalization theory for deep learning.
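The double-descent behaviour summarized in the abstract can be illustrated with a small, self-contained experiment. The following is a minimal sketch, not taken from the thesis: it fits a minimum-norm least-squares model on random ReLU features and sweeps the number of features past the interpolation threshold. The feature map, dataset, and hyperparameters are illustrative assumptions.

# Minimal double-descent sketch (illustrative; not the thesis's experiment).
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Ground-truth function; labels get additive noise below.
    return np.sin(2 * np.pi * x)

n_train, n_test, noise = 30, 500, 0.1
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + noise * rng.standard_normal(n_train)
x_test = rng.uniform(-1, 1, n_test)
y_test = target(x_test)

def relu_features(x, W, b):
    # Random ReLU feature map: phi(x) = max(0, x*W + b), one column per feature.
    return np.maximum(0.0, np.outer(x, W) + b)

for n_features in [5, 10, 20, 30, 40, 60, 100, 300, 1000]:
    W = rng.standard_normal(n_features)
    b = rng.standard_normal(n_features)
    Phi_train = relu_features(x_train, W, b)
    Phi_test = relu_features(x_test, W, b)

    # Minimum-norm least squares: in the overparameterized regime
    # (n_features > n_train) this interpolates the training data.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)

    train_mse = np.mean((Phi_train @ coef - y_train) ** 2)
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"features={n_features:5d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")

Running the sweep typically shows test error rising as the feature count approaches the interpolation threshold (n_features close to n_train) and falling again as the model becomes heavily overparameterized, mirroring the double-descent curve the thesis discusses.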
Keywords: Overparameterization, Double descent, PAC-Bayes bounds, Learning curves, VC dimension
File: singla_kavita.pdf (open access, Adobe PDF, 644.54 kB)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/97713