Information Theoretic Analysis of Deep Neural Networks

BERGAMIN, ELEONORA
2023/2024

Abstract

We tackle the task of learning the target function realized by a deep, non-linear neural network, with the complete set of network parameters being trained. Our investigation is conducted in a regime where the number of samples, the input dimension, and the network width are all large. The networks under study operate in a teacher-student framework, in which a student network with an identical architecture is trained on data generated by a teacher network. Our main goal is an information-theoretic analysis of deep neural networks, building on established results for two-layer networks. Recent conjectures, followed by partial rigorous proofs, indicate that two-layer networks can be reduced to simpler one-layer networks, commonly referred to as generalized linear models. Remarkably, fundamental information-theoretic quantities, such as the mutual information between the training data and the teacher network's weights and the Bayes-optimal generalization error, are well understood for such simplified networks. Our strategy is therefore to extend this reduction through a recursive argument: the network is progressively simplified by replacing its last two layers with an equivalent one-layer neural network, and the recursion continues until an equivalent one-layer model for the entire network is obtained. This recursive approach is expected to provide a comprehensive understanding of the network's behavior and performance.
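
As a concrete illustration of the teacher-student setup described in the abstract, the following Python sketch generates labelled data from a two-layer teacher network in the high-dimensional regime. The dimensions, the ReLU activation, and the sign readout are illustrative assumptions, not the exact model analysed in the thesis.

import numpy as np

# Minimal sketch of the teacher-student data-generation process (assumed setup).
rng = np.random.default_rng(0)
d, k, n = 500, 100, 2000                            # input dimension, hidden width, samples (illustrative)

W_teacher = rng.normal(size=(k, d)) / np.sqrt(d)    # first-layer teacher weights
a_teacher = rng.normal(size=k) / np.sqrt(k)         # second-layer (readout) weights

X = rng.normal(size=(n, d))                         # i.i.d. Gaussian inputs
hidden = np.maximum(X @ W_teacher.T, 0.0)           # non-linear hidden layer (ReLU, assumed)
y = np.sign(hidden @ a_teacher)                     # teacher-generated labels

# A student with the same architecture is trained on (X, y); the analysis
# concerns the mutual information between (X, y) and W_teacher, and the
# Bayes-optimal generalization error, as n, d and k grow large together.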
Keywords: Machine Learning, Disordered Systems, Neural Networks, Information Theory, Statistical Physics


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/66538