Information Theoretic Analysis of Deep Neural Networks

BERGAMIN, ELEONORA
2023/2024

Abstract

We tackle the task of learning the target function realized by a deep, non-linear neural network, with the complete set of network parameters being trained. Our investigation is conducted in a regime where the number of samples, the input dimension, and the network width are all large. The networks under study operate in a teacher-student framework, in which a student network with an identical architecture is trained on data generated by a teacher network. Our main goal is an information-theoretic analysis of deep neural networks, building on established results for two-layer networks. Recent conjectures, followed by partial rigorous proofs, indicate that two-layer networks can be reduced to simpler one-layer networks, commonly referred to as generalized linear models. Remarkably, fundamental information-theoretic quantities, such as the mutual information between the training data and the teacher network's weights and the Bayes-optimal generalization error, are well understood for such simplified networks. Our strategy is therefore to extend this reduction through a recursive argument: the network is progressively simplified by replacing its last two layers with an equivalent one-layer neural network, and the recursion continues until an equivalent one-layer model for the entire network is obtained. This recursive approach is expected to provide a comprehensive understanding of the network's behavior and performance.
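
As a concrete illustration of the teacher-student setup described in the abstract, the following Python sketch generates labelled data from a two-layer teacher network in the high-dimensional regime. The dimensions, the ReLU activation, and the sign readout are illustrative assumptions, not the exact model analysed in the thesis.

import numpy as np

# Minimal sketch of the teacher-student data-generation process (assumed setup).
rng = np.random.default_rng(0)
d, k, n = 500, 100, 2000                            # input dimension, hidden width, samples (illustrative)

W_teacher = rng.normal(size=(k, d)) / np.sqrt(d)    # first-layer teacher weights
a_teacher = rng.normal(size=k) / np.sqrt(k)         # second-layer (readout) weights

X = rng.normal(size=(n, d))                         # i.i.d. Gaussian inputs
hidden = np.maximum(X @ W_teacher.T, 0.0)           # non-linear hidden layer (ReLU, assumed)
y = np.sign(hidden @ a_teacher)                     # teacher-generated labels

# A student with the same architecture is trained on (X, y); the analysis
# concerns the mutual information between (X, y) and W_teacher, and the
# Bayes-optimal generalization error, as n, d and k grow large together.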
Keywords: Machine Learning, Disordered Systems, Neural Networks, Information Theory, Statistical Physics


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/66538