Information Theoretic Analysis of Deep Neural Networks
BERGAMIN, ELEONORA
2023/2024
Abstract
We tackle the task of learning an objective function that characterizes a deep, nonlinear neural network, focusing on training the complete set of network parameters. Our investigation is conducted in a scenario where the number of samples, the input dimension, and the network width are all notably large. The neural networks under study operate in a teacher-student framework, where the data generated by the teacher network are classified by a student network with an identical architecture. Our main goal is to carry out an information-theoretic analysis of deep neural networks, building upon established results on two-layer networks. Recent conjectures, followed by partial rigorous proofs, show that two-layer networks can be reduced to simpler one-layer networks, commonly referred to as generalized linear models. Remarkably, fundamental information-theoretic quantities such as the mutual information between the training data and the teacher network weights, as well as the Bayes-optimal generalization error, are well understood for such simplified networks. Consequently, our strategy extends this reduction using a recursive argument: we progressively simplify the network by replacing its last two layers with an equivalent one-layer neural network. The recursion continues until we identify an equivalent one-layer model for the entire network. This recursive approach is expected to provide a comprehensive understanding of the network's behavior and performance.
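The teacher-student setup described above can be sketched as follows. This is a minimal illustrative example, not code from the thesis: the choice of Gaussian weights, tanh activation, and a sign readout are assumptions standing in for whatever priors and activations the work actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: n samples, d inputs, k hidden units.
# The regime studied has all three large and of comparable order.
n, d, k = 1000, 50, 20

# Teacher weights drawn i.i.d. Gaussian (an assumed prior),
# scaled so pre-activations stay of order one.
W1 = rng.normal(size=(k, d)) / np.sqrt(d)
w2 = rng.normal(size=k) / np.sqrt(k)

def teacher(X):
    # Two-layer teacher: sign readout of a tanh hidden layer.
    return np.sign(np.tanh(X @ W1.T) @ w2)

# The student only sees the dataset (X, y) and must infer
# weights of the same architecture (Bayes-optimal inference).
X = rng.normal(size=(n, d))
y = teacher(X)
```

The reduction the abstract refers to would replace the two-layer map above by an equivalent generalized linear model, i.e. a single effective layer whose information-theoretic quantities (mutual information, Bayes-optimal generalization error) are already characterized.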

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/66538